ML Earthquake Declustering | Md Ashraf

01 · The Challenge

Understanding Earthquake Declustering

Separating independent earthquakes from dependent aftershocks

What is Declustering?

Earthquake catalogues contain background events (independent, random tectonic stress release) and dependent events (aftershocks triggered by previous earthquakes). Declustering separates these populations to enable accurate hazard assessment and earthquake forecasting.

Background seismicity is commonly modelled as a Poisson process, with events independent in time, while clustered events exhibit strong spatiotemporal dependencies.
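A quick way to see the Poisson-versus-clustered contrast is the coefficient of variation (CV) of inter-event times. Exponential gaps from a Poisson process give a CV close to 1, while clustered sequences push it well above 1. This is an illustrative diagnostic on synthetic data only, not a step of the study's pipeline; event times are assumed to be in days.

```python
import numpy as np


def interevent_cv(times):
    """Coefficient of variation (std / mean) of inter-event times."""
    dt = np.diff(np.sort(np.asarray(times, dtype=float)))
    return dt.std() / dt.mean()


rng = np.random.default_rng(42)

# Poisson background: exponential gaps with a mean of 1 day.
poisson_times = np.cumsum(rng.exponential(1.0, size=5000))

# Crude clustered catalogue: each of 250 Poisson "mainshocks" is followed
# by a burst of 20 events packed into the next 0.01 days.
mainshocks = poisson_times[:250]
bursts = [t + rng.uniform(0.0, 0.01, size=20) for t in mainshocks]
clustered_times = np.concatenate([mainshocks, *bursts])

cv_poisson = interevent_cv(poisson_times)
cv_clustered = interevent_cv(clustered_times)
print(f"Poisson CV:   {cv_poisson:.2f}")    # close to 1
print(f"Clustered CV: {cv_clustered:.2f}")  # well above 1
```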

Fixed Parameters

Traditional methods use predetermined windows that don't adapt to regional variations.

Overlapping Clusters

Complex sequences overlap, making separation difficult with rule-based approaches.

Subjective Tuning

Results depend heavily on threshold choices, reducing reproducibility.

Statistical Bias

Model assumptions are often violated by real-world catalogue incompleteness.
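To make the fixed-parameter problem concrete: the Gardner-Knopoff method ties each mainshock magnitude M to a fixed distance window and time window. The sketch below encodes a commonly quoted analytic fit to the Gardner and Knopoff (1974) tables; it illustrates the traditional approach rather than anything in this study's pipeline, and the coefficients should be checked against the original reference before reuse.

```python
def gardner_knopoff_window(magnitude):
    """Return (distance_km, time_days) for the Gardner-Knopoff window.

    Commonly quoted fit to the original tables:
      distance = 10**(0.1238*M + 0.983) km
      time     = 10**(0.032*M + 2.7389) days  if M >= 6.5
                 10**(0.5409*M - 0.547) days  otherwise
    """
    distance_km = 10 ** (0.1238 * magnitude + 0.983)
    if magnitude >= 6.5:
        time_days = 10 ** (0.032 * magnitude + 2.7389)
    else:
        time_days = 10 ** (0.5409 * magnitude - 0.547)
    return distance_km, time_days


for m in (4.0, 5.5, 7.0):
    d, t = gardner_knopoff_window(m)
    print(f"M{m}: {d:6.1f} km, {t:7.1f} days")
```

The same coefficients apply everywhere, so the windows cannot adapt to regional differences in aftershock productivity or duration.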

Machine Learning Solution

Pattern Recognition

Automatically learns complex nonlinear relationships without fixed rules

Adaptive Learning

Generalizes across regions without manual parameter tuning

Multidimensional Analysis

Captures time-space-magnitude relationships simultaneously

Objective Classification

Reproducible results based on learned patterns from synthetic training data

02 · Methodology

Research Framework

Six-stage machine learning pipeline

Data Acquisition

GeoNet earthquake catalogue (1980-2024) · 396,267 events · Quality control and preprocessing
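The quality-control steps are not spelled out here. One common step, assumed for illustration, is estimating the magnitude of completeness Mc with the maximum-curvature method: Mc is the centre of the most populated magnitude bin plus a small empirical correction (0.2 is an often-quoted value). The catalogue in the usage lines is synthetic and hypothetical.

```python
import numpy as np


def mc_maximum_curvature(magnitudes, bin_width=0.1, correction=0.2):
    """Estimate the magnitude of completeness by maximum curvature."""
    mags = np.asarray(magnitudes, dtype=float)
    # Snap magnitudes to bin centres (rounded to kill float noise) and count.
    centres = np.round(np.round(mags / bin_width) * bin_width, 6)
    values, counts = np.unique(centres, return_counts=True)
    return values[np.argmax(counts)] + correction


rng = np.random.default_rng(0)
# Hypothetical catalogue: Gutenberg-Richter magnitudes (b = 1) above 0.5,
# thinned by a detection probability that falls off below about M1.8.
mags = 0.5 + rng.exponential(1 / np.log(10), size=200_000)
detected = rng.random(mags.size) < 1 / (1 + np.exp(-(mags - 1.8) / 0.1))
mc = mc_maximum_curvature(mags[detected])
print(f"Estimated Mc: {mc:.1f}")
```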

NND Analysis

Nearest-neighbour distance computation · Rescaled space-time metrics · Bimodal distribution analysis
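A brute-force sketch of the nearest-neighbour computation, assuming the Zaliapin and Ben-Zion formulation: for a later event j and earlier candidate parent i, η_ij = t_ij · r_ij^d_f · 10^(−b·m_i), split into rescaled time T_ij = t_ij · 10^(−q·b·m_i) and rescaled distance R_ij = r_ij^d_f · 10^(−(1−q)·b·m_i), so η_ij = T_ij · R_ij. The defaults b = 1, d_f = 1.6 and q = 0.5 are common literature choices, not values confirmed for this study, and the 0.1 km distance floor is an added assumption.

```python
import numpy as np


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def nearest_neighbour(t, lat, lon, mag, b=1.0, df=1.6, q=0.5, r_min=0.1):
    """Nearest-neighbour search (O(n^2)); t must be sorted ascending.

    Returns parent index, eta, T and R per event; the first event has
    no parent (index -1, values NaN).
    """
    t, lat, lon, mag = (np.asarray(v, dtype=float) for v in (t, lat, lon, mag))
    n = t.size
    parent = np.full(n, -1)
    eta, T, R = (np.full(n, np.nan) for _ in range(3))
    for j in range(1, n):
        i = np.arange(j)                         # all earlier events
        dt = t[j] - t[i]
        # Floor distances so co-located events do not give r = 0.
        r = np.maximum(haversine_km(lat[i], lon[i], lat[j], lon[j]), r_min)
        Tij = dt * 10 ** (-q * b * mag[i])
        Rij = r ** df * 10 ** (-(1 - q) * b * mag[i])
        k = np.argmin(Tij * Rij)                 # nearest neighbour in eta
        parent[j], T[j], R[j] = i[k], Tij[k], Rij[k]
        eta[j] = T[j] * R[j]
    return parent, eta, T, R


t = [0.0, 0.001, 0.5]          # years
lat = [-42.0, -42.01, -38.0]
lon = [173.0, 173.01, 176.0]
mag = [6.0, 3.0, 3.5]
parent, eta, T, R = nearest_neighbour(t, lat, lon, mag)
print(parent.tolist())         # [-1, 0, 0]
```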

ETAS Simulation

Synthetic catalogue generation · Labeled background/triggered events · Training data preparation
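The ETAS variant and parameters are not given here. As an illustration only, the sketch below simulates a purely temporal ETAS catalogue as a branching process, with rate λ(t) = μ + Σ K·10^(α(m_i − m0))·(t − t_i + c)^(−p). Because every event is either a Poisson background event or a descendant, labels come for free. The spatial kernel is omitted, and all parameter values are hypothetical (chosen so the branching ratio stays below 1).

```python
import numpy as np


def simulate_temporal_etas(mu=0.5, K=0.02, alpha=0.5, c=0.01, p=1.1,
                           b=1.0, m0=2.0, t_end=1000.0, seed=0):
    """Return times (days), magnitudes and labels (0 = background,
    1 = triggered), sorted by time. Defaults are illustrative only."""
    rng = np.random.default_rng(seed)

    def gr_magnitudes(n):
        # Gutenberg-Richter magnitudes above m0 with b-value b.
        return m0 + rng.exponential(1.0 / (b * np.log(10)), size=n)

    # Generation 0: homogeneous Poisson background on [0, t_end).
    n_bg = rng.poisson(mu * t_end)
    times = [rng.uniform(0.0, t_end, size=n_bg)]
    mags = [gr_magnitudes(n_bg)]
    labels = [np.zeros(n_bg, dtype=int)]

    parents_t, parents_m = times[0], mags[0]
    omori_integral = c ** (1 - p) / (p - 1)  # integral of (t+c)^-p on [0, inf)
    while parents_t.size > 0:
        # Poisson number of direct aftershocks for each parent.
        n_kids = rng.poisson(K * 10 ** (alpha * (parents_m - m0)) * omori_integral)
        kid_parent_t = np.repeat(parents_t, n_kids)
        # Inverse-CDF Omori-Utsu delays: F(d) = 1 - (c / (d + c))**(p - 1).
        u = rng.random(kid_parent_t.size)
        kid_t = kid_parent_t + c * ((1 - u) ** (-1 / (p - 1)) - 1)
        parents_t = kid_t[kid_t < t_end]     # drop events after the window
        parents_m = gr_magnitudes(parents_t.size)
        times.append(parents_t)
        mags.append(parents_m)
        labels.append(np.ones(parents_t.size, dtype=int))

    t, m, y = map(np.concatenate, (times, mags, labels))
    order = np.argsort(t)
    return t[order], m[order], y[order]


t, m, y = simulate_temporal_etas()
print(f"{t.size} events, {y.mean():.1%} triggered")
```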

Feature Engineering

Extract T⁺, R⁺, dm⁺, N⁺, n_parent, n_child · Temporal, spatial, magnitude features

Model Training

Random Forest, SVM, Gradient Boosting, XGBoost · 5-fold cross-validation · Hyperparameter optimization
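The cross-validation setup is described only as 5-fold. A minimal numpy sketch of stratified k-fold splitting, in which every fold keeps roughly the same background/triggered ratio, is shown below; scikit-learn's `StratifiedKFold` provides the same behaviour in practice. The 60/40 label split is hypothetical.

```python
import numpy as np


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with class ratios preserved per fold."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(labels.size, dtype=int)
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # Deal this class's shuffled indices round-robin into k folds.
        fold_of[idx] = np.arange(idx.size) % k
    for f in range(k):
        yield np.flatnonzero(fold_of != f), np.flatnonzero(fold_of == f)


y = np.array([0] * 60 + [1] * 40)   # hypothetical labels
for train_idx, test_idx in stratified_kfold(y, k=5):
    print(len(train_idx), len(test_idx), y[test_idx].mean())  # 80 20 0.4
```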

Deployment

Apply best model (XGBoost) to real catalogue · Validate against historical sequences

ML Models

RF · Random Forest

SVM · Support Vector Machine

GB · Gradient Boosting

XGB · XGBoost ⭐

Features

T⁺ R⁺ dm⁺ N⁺ n_parent n_child
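The feature names are listed but not defined here. One plausible reading, assumed purely for illustration: T⁺ and R⁺ are the rescaled time and distance to an event's nearest-neighbour parent, dm⁺ is the parent-minus-event magnitude difference, n_child counts events that choose this event as their parent, and N⁺ counts siblings (other events sharing the same parent). n_parent is left out because its definition cannot be inferred from the page. The sketch takes nearest-neighbour output (parent index, T and R per event) as input.

```python
import numpy as np


def nnd_features(parent, T, R, mag):
    """Per-event features from nearest-neighbour output (hypothetical definitions).

    `parent` holds the parent index of each event, or -1 if it has none.
    """
    parent = np.asarray(parent)
    mag = np.asarray(mag, dtype=float)
    has_parent = parent >= 0
    n = parent.size

    # Number of events that picked each event as their parent.
    n_child = np.bincount(parent[has_parent], minlength=n)

    # Parent magnitude minus event magnitude (NaN when there is no parent).
    dm_plus = np.full(n, np.nan)
    dm_plus[has_parent] = mag[parent[has_parent]] - mag[has_parent]

    # Siblings: other children of the same parent.
    N_plus = np.zeros(n, dtype=int)
    N_plus[has_parent] = n_child[parent[has_parent]] - 1

    return {"T_plus": np.asarray(T, dtype=float), "R_plus": np.asarray(R, dtype=float),
            "dm_plus": dm_plus, "n_child": n_child, "N_plus": N_plus}


feats = nnd_features(parent=[-1, 0, 0, 1],
                     T=[np.nan, 1e-4, 2e-4, 1e-3],
                     R=[np.nan, 0.5, 0.8, 1.2],
                     mag=[5.0, 3.0, 3.2, 2.5])
print(feats["n_child"].tolist(), feats["N_plus"].tolist())  # [2, 1, 0, 0] [0, 1, 1, 0]
```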

03 · Visualizations

Research Plots & Analysis

Key visualizations from the study

Model Performance Comparison

Figure 1

Accuracy, Precision, Recall, and F1-Score comparison across Random Forest, SVM, Gradient Boosting, and XGBoost models.

Feature Importance Ranking

Figure 2

XGBoost feature importance showing N⁺ (siblings count) as the most influential predictor, followed by R⁺ (rescaled distance) and T⁺ (rescaled time).

Classification Distribution

Figure 3

Distribution of background events (58.23%) vs triggered events (41.77%) from XGBoost classification of the New Zealand catalogue.

Temporal Evolution of Seismicity

Figure 4

Background and triggered events over 44 years (1980-2024) showing major earthquake sequences: Edgecumbe (1987), Canterbury (2010-2011), Kaikōura (2016).

Confusion Matrix

Figure 5

XGBoost confusion matrix on synthetic test data, showing a 98.7% true positive rate and a 94.4% true negative rate.

Spatial Distribution Map

Figure 6

Spatial distribution of background and triggered events across New Zealand, showing alignment with major tectonic structures.

NND Distribution

Figure 7

Bimodal distribution of nearest-neighbour distance (log η) showing clear separation between background and clustered events.
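One simple way to turn a bimodal log η distribution into a separating threshold is to fit a two-component 1-D Gaussian mixture and take the point between the two means where the weighted component densities cross. The study's own thresholding method is not stated here, so this is an illustrative choice; the sample data are synthetic.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


def gmm_threshold(log_eta, n_iter=200):
    """Fit a 2-component Gaussian mixture by EM; return the density crossing."""
    x = np.asarray(log_eta, dtype=float)
    # Initialise means at the quartiles with a shared spread.
    mean = np.percentile(x, [25, 75])
    sd = np.full(2, x.std())
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        dens = w * normal_pdf(x[:, None], mean, sd)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means and standard deviations.
        nk = resp.sum(axis=0)
        w = nk / x.size
        mean = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mean) ** 2).sum(axis=0) / nk)
    lo, hi = np.sort(mean)
    grid = np.linspace(lo, hi, 2001)
    diff = (w[0] * normal_pdf(grid, mean[0], sd[0])
            - w[1] * normal_pdf(grid, mean[1], sd[1]))
    # First sign change of the weighted-density difference.
    return grid[np.argmax(np.sign(diff) != np.sign(diff[0]))]


rng = np.random.default_rng(1)
# Hypothetical log10(eta) sample: clustered mode near -7, background near -4.
log_eta = np.concatenate([rng.normal(-7.0, 0.8, 4000), rng.normal(-4.0, 0.8, 6000)])
threshold = gmm_threshold(log_eta)
print(f"log10(eta) threshold: {threshold:.2f}")
```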

ROC Curve Analysis

Figure 8

Receiver Operating Characteristic curve demonstrating model discrimination capability with AUC = 0.98.
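AUC has a direct probabilistic reading: it is the chance that a randomly chosen positive event receives a higher score than a randomly chosen negative one (the Mann-Whitney U statistic scaled to [0, 1]). A small pairwise sketch, counting ties as one half; for very large catalogues a sort-based version would be preferable to the O(P·N) pairwise comparison used here.

```python
import numpy as np


def roc_auc(y_true, scores):
    """AUC as P(score_pos > score_neg), with ties counted as 1/2."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]        # every positive/negative pair
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```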

Detailed Performance Metrics

Model	Accuracy	Precision	Recall	F1-Score
XGBoost	97.44%	97.66%	98.74%	98.20%
Gradient Boosting	97.11%	97.06%	98.89%	97.97%
Random Forest	96.72%	96.22%	95.15%	97.91%
SVM	94.36%	94.48%	94.36%	94.40%
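The table columns follow the standard definitions: precision = TP/(TP+FP), recall (true positive rate) = TP/(TP+FN), specificity (true negative rate) = TN/(TN+FP), and F1 is the harmonic mean of precision and recall. The sketch computes them from confusion-matrix counts and rechecks the XGBoost F1 from its precision and recall. The counts in the last line are hypothetical, and which class the study treats as positive is not stated here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall (TPR), specificity (TNR) and F1 from counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# XGBoost row: precision 97.66 %, recall 98.74 %.
print(f"{f1_from(0.9766, 0.9874):.4f}")                   # 0.9820
print(classification_metrics(tp=90, fp=5, fn=3, tn=52))   # hypothetical counts
```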

04 · Results

Key Findings

Application to the New Zealand earthquake catalogue

97.44%

Classification Accuracy

XGBoost model on synthetic test data

230,758

Background Events

58.23% of total catalogue

165,509

Triggered Events

41.77% of total catalogue, classified as aftershocks
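The two class counts sum exactly to the 396,267 events in the GeoNet catalogue, and the percentages follow directly from them:

```python
background, triggered = 230_758, 165_509
total = background + triggered
print(total)                                 # 396267
print(f"{100 * background / total:.2f}%")    # 58.23%
print(f"{100 * triggered / total:.2f}%")     # 41.77%
```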

Feature Importance

N⁺ — Siblings Count

Most influential predictor of aftershock clustering

R⁺ — Rescaled Distance

Spatial proximity strongly indicates triggering

T⁺ — Rescaled Time

Temporal correlation with Omori-Utsu decay

Comparison with Traditional Methods

Gardner-Knopoff

75% background

NND Threshold

62% background

XGBoost (This Study)

58.23% background

Resources

View GitHub Repository · Download Thesis Report

05 · Scientific Impact

Research Contributions

Implications for seismology and hazard assessment

High Accuracy

97.44% classification accuracy on synthetic test data demonstrates reliable pattern recognition with minimal misclassification

Objective Method

Replaces subjective window and threshold choices with patterns learned from labelled synthetic data, providing reproducible results across regions

Physical Validation

Results align with known tectonic structures and historical earthquake sequences

Improved Sensitivity

Captures subtle clustering patterns that fixed-parameter approaches miss

Seismic Hazard Assessment

More accurate background rates for probabilistic seismic hazard analysis (PSHA) models
Improved building code development
Better insurance risk modeling
Enhanced emergency preparedness
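Because background seismicity is treated as a Poisson process, a declustered rate λ converts directly into the probability of at least one event in an exposure time t: P = 1 − exp(−λt). A minimal sketch; the event count, observation span and exposure time below are hypothetical, not results from this study.

```python
import math


def poisson_exceedance(n_events, years_observed, exposure_years):
    """P(at least one event in exposure_years) for a Poisson rate n / years."""
    rate = n_events / years_observed          # events per year
    return 1.0 - math.exp(-rate * exposure_years)


# Hypothetical: 2 background events above some magnitude in 44 years.
print(f"{poisson_exceedance(2, 44, 50):.3f}")  # 0.897
```

Using the full, clustered catalogue instead would inflate the rate, and with it the probability, which is why declustering matters for PSHA inputs.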

Earthquake Science

Understanding fault interaction mechanisms
Statistical forecasting model improvement
Aftershock sequence evolution insights
Physics-based simulator foundations

Machine Learning Approach For Earthquake Declustering