Seismotectonics Considered Artificial Neural Network Earthquake Prediction in Northeast Seismic Region of China

It is well known that earthquakes are regional events, strongly controlled by local geological structures and circumstances. Reducing the research area can reduce the influence of irrelevant seismotectonics. A new sub-region dividing scheme that considers the influence of seismotectonics was applied to the artificial neural network (ANN) earthquake prediction model in the northeast seismic region of China (NSRC). The improved set of input parameters and the prediction time duration are also discussed in this work. The new dividing scheme improved the prediction accuracy for different prediction time frames. Three different research regions were analyzed as earthquake data sources for the ANN model under different prediction time frames. The results show that: (1) dividing the research region into smaller sub-regions can improve the prediction accuracies in NSRC, (2) larger research regions need shorter prediction durations to obtain better performance, (3) different areas have different sets of input parameters in NSRC, and (4) the dividing scheme that considers the seismotectonics frame of the region yields better results.


INTRODUCTION
Earthquakes are among the most terrifying natural disasters and have claimed hundreds of thousands of lives as well as destroyed millions of buildings. Statistically, 15 earthquake events have each caused more than 20,000,000 deaths throughout history. Although other natural disasters such as hurricanes [1], typhoons [2], landslides [3], or volcanic eruptions [4] can be predicted well, earthquake prediction remains difficult, and scientists continue to search for methods to do it. There have been numerous efforts using different applications during the past few decades [5]. The earliest earthquake prediction parameters date back more than 70 years [6]. Unfortunately, progress in this field is very slow, and until recently there were still no successful methods that satisfy the requirements described by Allen (1982) [7], which included predictions of when, where, how big and how probable an impending earthquake would be.
The reason for this failure is that the occurrence of earthquakes is a very complex process influenced by a large number of factors whose effects are still not exactly understood. Traditional statistical and mathematical methods cannot easily analyze such complex processes and overcome the distorted control force due to time-delay [8]. Artificial intelligence techniques seem a feasible way to address this issue, owing to their powerful ability to process complex and non-linear variable data. Artificial intelligence earthquake prediction models have been developed and continue to attract attention. Among these techniques, the ANN stands out due to its strong non-linear fitting capability, which can map any complex non-linear relationship [9]. The ANN is also simple to use and easy to implement on a computer. Its resilience, large memory capacity, strong non-linear mapping ability and self-learning capability enable the neural network to process a large amount of complex non-linear data.
The first ANN model for earthquake prediction was adapted from a financial market forecasting model [10]. This model includes three inputs: time, intensity and location. These three inputs are known as the three basic elements of an earthquake. This model correctly predicted two earthquakes using major groups 1° in longitude in the Azores with a range of ±5-6 months [10]. Three different ANN models, using a novel set of seismicity indicators, were applied to forecast earthquake magnitude for southern California and the San Francisco Bay. These ANN models achieved acceptable results for earthquakes of magnitude between 6.0 and 7.5 [11]. What is remarkable about that work is that it compared the prediction accuracies of a recurrent neural network, a radial basis function neural network and a Levenberg-Marquardt back-propagation neural network. The recurrent neural network yielded the best prediction result, so the authors then tried to predict the time and location of earthquakes in southern California with a recurrent neural network [12]. A probabilistic neural network, also a type of ANN, was used to forecast earthquakes in southern California and presented good prediction accuracies for earthquakes of magnitudes between 4.5 and 6.0 [12]. The radial basis function neural network yielded more accurate and effective prediction results than the adaptive neural fuzzy method, another artificial intelligence method, in South Iran [13]. ANN earthquake prediction models have been used in many other regions such as Greece [14], the northern Red Sea area [15], Chile [16,17] and the East Anatolian fault region [18].
There have been many efforts in past years to improve the prediction accuracy of ANN earthquake prediction models. Many kinds of ANN methods were compared to find a rational method for building the earthquake model [5,12,13,19]. Seismicity indicators were analyzed to choose rational input parameters [11,16] and to take monitoring data into account [14,18,20]. The best set of seismicity indicators for predicting earthquakes was also discussed [16]. The prediction range and period were considered [12]. These studies indicated that there are two methods to improve earthquake prediction: (1) reducing the prediction time frame from one month to twenty-two days and (2) dividing the seismic regions into smaller areas and performing a parametric study. However, they simply divided the seismic region into regular smaller areas according to coordinates, without considering the seismotectonic frame of the region. This may have destroyed the integrity of the earthquake data and blurred the relationship between earthquakes and tectonic units, because an earthquake is a geological event that is controlled by local geo-structure and has an inner relationship with the tectonic unit [21].
This work is based on three previous works [11,16,17] and uses a Recurrent Neural Network (RNN) to predict earthquakes in the northeast seismic region of China (NSRC in brief). This work applies feature selection techniques to obtain a better set of features for NSRC from 16 seismicity indicators, which were discussed by Panakkat & Adeli (2007) [11] and Reyes et al. (2013) [17], as the ANN's input parameters, following the method of Martínez-Álvarez et al. (2013) [16]. The accuracies of the earthquake predictions are improved by dividing the research region into rectangular sub-regions and tectonic units. The earthquake prediction accuracies resulting from the two different dividing schemes are discussed. The results show that: (1) dividing the research region into sub-regions can improve the prediction accuracies in NSRC, (2) bigger research regions need shorter prediction durations to obtain better performance, (3) different areas have different improved sets of input parameters in NSRC, and (4) the dividing scheme that considers the seismotectonics frame of the region yields better results than the others. These results indicate that the occurrence of earthquakes relies on the geological structure, so the regional seismotectonics frame must also be considered when processing earthquake data.

BACKGROUND INFORMATION
This section serves to introduce some background information about this work, such as the research region (NSRC), the smaller research region divided by regular coordinates and by seismotectonics frame, and the data source of this research.

NSRC
The northeast seismic region of China consists of Heilongjiang Province, Jilin Province and parts of the Inner Mongolia Autonomous Region and Liaoning Province. Considering tectonic unit boundaries, earthquake positioning error and earthquake catalog integrity, the NSRC also includes small parts of Russia, Korea and Mongolia, as shown in Fig. (1). Historical records contain no M 7.0 earthquake events in this region, and its seismic activity was relatively weak in the past, in both intensity and number of events. A few studies have been made regarding its activity and potential seismic risks in the future [22]. However, since 2013, 7 events of magnitude M 5.0 to M 5.9 have occurred in this region. This indicates that the NSRC is entering an earthquake-active period influenced by the intensified subduction of the western Pacific plate, which was manifested by the Tohoku M 9.0 event, 2011 and Okhotsk Sea M8.  3)) of NSRC also show a significant enhancement in seismic activities in 2013. Statistics show that the majority of M ≥ 5.0 earthquakes in the NSRC had magnitudes of M 5.0-M 5.9; only 4 events exceeded M 6.0, and the largest was about M 6¾. Although earthquakes with magnitudes of M 5.0-M 5.9 may not directly cause deaths, they can cause suffering as well as huge financial loss. For example, the 2013 Songyuan M 5.8 earthquake sequence affected 60,000 residents, seriously damaged 16,000 houses, somewhat damaged 40,000 buildings, collapsed 310 houses and caused an economic loss of 2 billion RMB (about 300 million US dollars). Furthermore, these earthquakes generate social problems such as rumor or panic. An earthquake of this magnitude also has a higher probability of occurrence than an M 6.0 one. We therefore focus on predicting earthquakes with a magnitude of M 5.0-M 5.9 in the NSRC.

Research Region Dividing
Dividing the seismic region into smaller areas can improve the prediction accuracies of the ANN earthquake prediction model [12]. In this work, the NSRC was divided both into rectangular research regions and into tectonic units. The NSRC has relatively low seismicity intensity and frequency. In order to ensure sufficient earthquake data, the region was divided into regular 2° × 2° rectangles, and four of them were randomly selected as research areas. These are named REC-1 (between 121°E and 123°E longitude and 47°N and 49°N latitude), REC-2 (between 129°E and 131°E longitude and 47°N and 49°N latitude), REC-3 (between 118°E and 120°E longitude and 44°N and 46°N latitude) and REC-4 (between 123°E and 125°E longitude and 44°N and 46°N latitude). Four tectonic units were also randomly selected as research areas. These are named TEC-1 (Daxinganling uplift block), TEC-2 (central depression of Songliao basin), TEC-3 (Jiamusi uplift block) and TEC-4 (Southern uplift of Songliao basin) (Fig. (1)). The four rectangular regions and four tectonic units were studied as research areas, and the prediction accuracies are discussed in Section 5.
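The rectangular scheme described above can be illustrated with a short sketch that assigns an epicenter to one of the four rectangles. The bounds are taken from the text; the function name `assign_rectangle` and the choice of inclusive lower and exclusive upper bounds are assumptions made for illustration only.

```python
from typing import Optional

# Bounds of the four 2° × 2° rectangles as listed in the text:
# (lon_min, lon_max, lat_min, lat_max), degrees East / degrees North.
REC_REGIONS = {
    "REC-1": (121.0, 123.0, 47.0, 49.0),
    "REC-2": (129.0, 131.0, 47.0, 49.0),
    "REC-3": (118.0, 120.0, 44.0, 46.0),
    "REC-4": (123.0, 125.0, 44.0, 46.0),
}


def assign_rectangle(lon: float, lat: float) -> Optional[str]:
    """Return the name of the rectangle containing the epicenter, or None.

    Assumption: lower bounds are inclusive and upper bounds exclusive;
    the paper does not state how boundary events are treated.
    """
    for name, (lon_min, lon_max, lat_min, lat_max) in REC_REGIONS.items():
        if lon_min <= lon < lon_max and lat_min <= lat < lat_max:
            return name
    return None
```

Assigning events to the tectonic units TEC-1 to TEC-4 would instead require the unit boundaries as polygons, which are shown only in Fig. (1) and are therefore not reproduced here.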

Data Source
The earthquakes in NSRC can be divided into two categories based on focal depth: shallow earthquakes (focal depth < 60 km) and deep earthquakes (focal depth > 300 km). In this work, only shallow earthquakes are considered. The main seismicity catalog used in this work was derived from two sources. The data for moderate to strong earthquakes (magnitude equal to or larger than M 4.7) were derived from the following: China Earthquake Catalogue (Monitoring and Forecasting Division of Earthquake Administration of China, 2010), Modern China Earthquake Catalogue (Earthquake Disaster Prevention Division of Earthquake Administration of China, 1999), Northeast Historical Earthquake Series (Seismological Press, 1992) and China Earthquake Details Catalogue (Earthquake Analysis and Prediction Center of Earthquake Administration of China, 1970-2013). The data for small earthquakes (magnitude smaller than M 4.7) were derived from instrumental records in the China Earthquake Details Catalogue (Earthquake Analysis and Prediction Center of Earthquake Administration of China, 1970-2013), with reference to the earthquake catalogs of the Earthquake Administrations of Heilongjiang Province, Jilin Province, Liaoning Province and Inner Mongolia Autonomous Region.

METHODOLOGY
In this section, the earthquake prediction model, the methods of determining the best set of input parameters and the calculation results assessment are briefly introduced.

Earthquake Prediction Model
Martínez-Álvarez et al. (2013) [16] compared the prediction accuracy of the artificial neural network with Naive Bayes (NB), K-nearest neighbors (KNN) and support vector machines (SVM). The comparison showed that the ANN performed better than any other classifier. This work therefore builds the earthquake prediction model with an ANN. Many researchers have discussed and described the concept of ANN in detail [8,23,24]. We will not introduce the concept of ANN here, but focus on presenting how to build the ANN earthquake prediction model. The model is composed of three layers: the input layer, the hidden layer and the output layer. The input layer has 7 input indicators, the hidden layer has 15 neurons and the output layer has 2 output parameters. Firstly, there are 16 candidate parameters for the input set, derived from two previous research papers [12,17]: Input = [x_1, x_2, x_3, x_4, x_5, x_6, x_7, T, M_mean, dE^{1/2}, , a, M, μ, c, ]. The first 7 parameters came from Panakkat & Adeli (2009) [12], and the next 9 parameters were obtained from Reyes et al. (2013) [17]. For detailed information about these parameters, please refer to these research papers. The best input set for every research region, which includes 7 parameters, was obtained with the Weka software (see Section 3.2). The output includes 2 parameters: one is the maximum magnitude M_max observed in the next 30 days, and the other is the time t (measured in days) that elapses until the maximum earthquake occurs. The Weka software was used to build and run the model described above.
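To make the 7-15-2 layout concrete, the following is a minimal sketch of a single forward pass through a network of that shape. It is not the Weka model used in this work: the sigmoid hidden activation, the linear output layer, the random initialisation and the names `init_layer` and `forward` are all assumptions chosen for illustration, and no training step is shown.

```python
import math
import random

# Layer sizes stated in the text: 7 inputs, 15 hidden neurons, 2 outputs.
N_INPUT, N_HIDDEN, N_OUTPUT = 7, 15, 2


def init_layer(n_in, n_out, rng):
    """Create n_out rows of n_in weights plus a trailing bias term.

    Assumption: uniform random initialisation in [-0.5, 0.5].
    """
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def forward(x, hidden_w, output_w):
    """Forward pass: sigmoid hidden layer, linear output layer (assumed)."""
    if len(x) != N_INPUT:
        raise ValueError("expected %d input indicators" % N_INPUT)
    hidden = []
    for row in hidden_w:
        # zip stops at len(x), so the trailing bias row[-1] is added once.
        z = row[-1] + sum(w * v for w, v in zip(row, x))
        hidden.append(1.0 / (1.0 + math.exp(-z)))
    # Two outputs, standing for M_max and the time t in days.
    return [row[-1] + sum(w * h for w, h in zip(row, hidden))
            for row in output_w]
```

With untrained weights the two outputs are meaningless numbers; the sketch only shows how 7 selected indicators map through 15 hidden neurons to the pair (M_max, t).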

Better Set of Features
Martínez-Álvarez et al. (2013) [16] applied the Weka software to measure the information gain associated with each feature with respect to the class, and discussed the significance of choosing rational input parameters. That was the first time the Weka software had been applied to earthquake prediction [16]. We adopted this method in our work to choose the best input parameters for every research region (Table 1). The details of this method are given in [16].
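The idea of ranking features by information gain can be sketched as follows. This is an illustration of the general technique rather than of Weka's implementation: it assumes that every feature has already been discretized into a small number of categories, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(class) - H(class | feature) for one discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels, k=7):
    """Return the names of the k features with the highest information gain.

    columns maps a feature name to its list of discretized values;
    k=7 mirrors the 7 input indicators kept per region in the text.
    """
    gains = {name: information_gain(vals, labels)
             for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]
```

A feature that separates the classes perfectly receives a gain equal to the class entropy, while a feature that is independent of the class receives a gain of zero, which is why different regions can end up with different best sets.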

Evaluating Method
The assessment methods are modified from past research [11,16,17,19]. First of all, several important parameters have to be introduced:
1. Double true (DT). The number of times that an impending earthquake was properly predicted in both magnitude and time.
2. Magnitude true (MT). The number of times that the ANN model properly predicted an earthquake only in magnitude.
3. Time true (TT). The number of times that the ANN model properly predicted an earthquake only in time.
4. Double false (DF). The number of times that the prediction result was wrong in both magnitude and time.
The rate of perfect prediction result (denoted by P_DT), the rate of proper magnitude prediction (denoted by P_M), the rate of proper time prediction (denoted by P_T), and the rate of entirely false prediction result (denoted by P_DF) are calculated from these four counts. Additionally, the chi-square statistic test was applied to check the prediction accuracy difference (denoted by P_C) between the different research region definition criteria. For details of the chi-square statistic test, refer to Moor (1976) [25].
Finally, these parameters are comprehensively compared to find out which research region definition criteria yield better prediction accuracy.
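The evaluation quantities can be sketched in a few lines. Two assumptions are made loudly here, because the original equations are not reproduced in the text: each rate is taken as the corresponding count divided by the total number of predictions, and the chi-square comparison is taken as a Pearson test on a 2 × 2 table of correct versus incorrect predictions for two region schemes, without continuity correction. The function names are hypothetical.

```python
import math


def prediction_rates(dt, mt, tt, df):
    """Rates P_DT, P_M, P_T, P_DF from the four counts.

    Assumption: each rate is count / (DT + MT + TT + DF).
    """
    total = dt + mt + tt + df
    if total == 0:
        raise ValueError("no predictions to evaluate")
    return {
        "P_DT": dt / total,
        "P_M": mt / total,
        "P_T": tt / total,
        "P_DF": df / total,
    }


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the table [[a, b], [c, d]].

    Rows could be two region schemes, columns correct / incorrect
    predictions (an assumed layout). Returns (statistic, p_value)
    for 1 degree of freedom, with no continuity correction.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

A small p-value from `chi_square_2x2` would indicate that the difference in accuracy between two region definition criteria is unlikely to be due to chance alone.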

CALCULATION RESULTS
This section presents the modeling results of the ANN earthquake prediction model under different research region definition criteria: the entire region (denoted NSRC), the smaller rectangular regions (denoted REC-1 to REC-4), and the smaller seismotectonics unit regions (denoted TEC-1 to TEC-4). The prediction time frame was also varied in this work, from one month (Table 2) to 15 days (Table 3) and 7 days (Table 4).
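How the two model outputs change with the prediction time frame can be illustrated with a small sketch. The catalog is represented here as hypothetical `(day, magnitude)` pairs, and the function name and the half-open window convention are assumptions.

```python
def horizon_targets(events, t0, horizon_days):
    """Return (M_max, days until M_max) within the prediction window.

    events: iterable of (day, magnitude) pairs (hypothetical layout).
    The window is t0 < day <= t0 + horizon_days (assumed convention).
    Returns None when no event falls inside the window.
    """
    window = [(day, mag) for day, mag in events
              if t0 < day <= t0 + horizon_days]
    if not window:
        return None
    day, mag = max(window, key=lambda event: event[1])
    return mag, day - t0


# Illustrative catalog, not real data.
catalog = [(3, 4.1), (10, 5.2), (20, 4.8), (40, 5.6)]
for horizon in (30, 15, 7):
    print(horizon, horizon_targets(catalog, 0, horizon))
```

Shortening the window from 30 to 7 days changes which event defines M_max, and in sparse regions it increases the number of windows that contain no event at all.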

DISCUSSION
This section summarizes and compares the data and tables from the previous sections. It also shows the prediction performance of the different research region definition criteria and prediction time frames.
Table 1 lists the best set of input indicators for each research region analyzed. It shows that the research regions have different best sets of input features. This indicates that earthquake occurrence has regional characteristics, because it depends on special and complex local circumstances.
Tables 2-4 show the prediction performance for the research regions with prediction time frames of one month (Table 2), 15 days (Table 3) and 7 days (Table 4). For NSRC, the 7-day prediction time frame yields better prediction accuracy than either 30 days or 15 days. This result is in line with previous research [12], which found that reducing the prediction time frame can improve the prediction accuracy to some extent. However, the smaller research regions do not follow this pattern. The rectangular sub-regions produced better results when the prediction duration was 30 days, while the seismotectonics unit sub-regions needed a prediction duration of 15 days. The reason is that smaller sub-regions have a longer mean time between typical events, which can exceed the prediction duration. In other words, the 15-day and 7-day prediction time frames are considerably shorter than the mean time between typical earthquakes in the rectangular sub-regions and the seismotectonics unit sub-regions, respectively. The result of this work is in accordance with the research of Panakkat & Adeli (2009) [12], which suggested that dividing the seismic region into smaller areas could yield better prediction accuracy. Both the regular sub-regions and the seismotectonics unit sub-regions yield clearly better results than the large whole region, according to P_C < by the chi-square statistic test. Tables 2 to 4 also show that, under different prediction time frames, the smaller research regions produced better results than the NSRC, and the seismotectonics unit sub-regions have higher accuracy than the regular sub-regions. Although the prediction accuracy differences between the rectangular sub-regions and the seismotectonics unit sub-regions are not significant, a small increase can still be observed. The seismotectonics unit sub-regions have higher accuracy than the regular sub-regions, even though the seismotectonics unit sub-regions have larger areas. This effect can be explained by the fact that an earthquake is a geological event that is controlled by local geo-structure and has an inner relationship with the tectonic unit [21]. It is well known that the distribution of earthquakes is uneven and controlled by local geo-structure, so the regional seismotectonics frame must be considered when processing earthquake data.
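The relation between the prediction duration and the mean time between events can be checked with a short sketch. The function names and the list of event days are hypothetical, and the rule that the prediction duration should not be shorter than the mean inter-event time is a simplified reading of the argument above.

```python
def mean_interevent_days(event_days):
    """Mean time in days between consecutive events of a region."""
    days = sorted(event_days)
    if len(days) < 2:
        raise ValueError("need at least two events")
    return (days[-1] - days[0]) / (len(days) - 1)


def horizon_is_adequate(event_days, horizon_days):
    """True when the prediction duration is at least the mean gap."""
    return horizon_days >= mean_interevent_days(event_days)
```

For a region whose typical events are, for example, about two weeks apart, a 15-day duration would pass this check while a 7-day duration would not, which matches the pattern that regions with sparser data need longer prediction durations.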
Within the same seismotectonics unit, the geological structures have analogous characteristics and a common activity rhythm. Different seismotectonics units have different seismicity rhythms, controlled by the characteristics and activity rhythm of their respective geological structures. A large research region is composed of several seismotectonics units; the different seismicity rhythms of these units interfere with and influence each other, which leads to a relatively inaccurate prediction result. Dividing the research region into smaller areas can reduce this interference by reducing the number of seismotectonics units involved. However, dividing the research region into rectangular areas cannot remove the interference, because seismotectonics units are not rectangular in shape. The rectangular sub-regions contain fewer seismotectonics units than the large region, but they normally still involve two or more. Dividing the large region into seismotectonics units reduces these impacts to a minimum by ensuring a uniform seismicity rhythm within each smaller area, and therefore generates higher accuracy.

CONCLUSION
Earthquake prediction, a widely recognized and difficult issue, deserves further exploration because there is still considerable room for improvement. The authors follow previous research [11,12,16,17,19,26] to improve the earthquake prediction accuracy of the ANN model, and to find out whether considering seismotectonics generates better results, because earthquakes are regional geological events that strongly depend on local geological circumstances. In general, the following conclusions can be drawn from this work:
1. Under the premise of adequate seismicity data, dividing the research region into smaller sub-regions can improve the prediction accuracies in NSRC.
2. Shortening the prediction duration can improve the prediction results, but the prediction duration must be longer than the mean time between typical events. That is to say, larger research regions need shorter prediction durations to obtain better performance, because larger regions have more earthquake data and a relatively shorter mean time between typical events.
3. Earthquakes are regional geological events, which gives rise to regional seismicity characteristics. Because of these characteristics, different areas have different improved sets of input parameters.
4. Dividing the large region into seismotectonics units can reduce the interference and influence from other seismotectonics units and improve prediction accuracy. The dividing scheme that considers the seismotectonics frame of the region therefore yields better results than the other options.