Disaster recovery

Disaster Recovery Plan for DH3.io (Data Hub for Web3) Regarding Solana Chain Reorganizations

Introduction

This disaster recovery plan details the steps to be taken in response to reorganizations (reorgs) and disruptions within the Solana blockchain network as they affect DH3.io operations. The aim is to ensure a systematic and reliable approach to managing potential inconsistencies and maintaining the integrity of both real-time streams and historical data within the DH3.io ecosystem.

Regular Monitoring Procedures

Validator Consistency Check: Every 5 seconds, a consistency verification is carried out among our validators to ensure they are in agreement on slot positions and block hashes.
External RPC Verification: Every 30 seconds, the same data is additionally checked against a randomly selected RPC provider from our list of trusted sources (e.g., QuickNode) to confirm that our validators agree with the wider network.
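
The validator consistency check can be illustrated with a minimal Python sketch. It assumes each node has already been queried for its blockhash at one common slot; the `SlotReport` type, the node names, and the majority-vote rule are hypothetical choices made for illustration, not a description of the actual DH3.io implementation.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotReport:
    """What one node reports for a given slot (hypothetical shape)."""
    node: str
    slot: int
    blockhash: str


def find_disagreeing_nodes(reports: list[SlotReport]) -> list[str]:
    """Return the nodes whose blockhash differs from the majority.

    Assumes every report refers to the same slot. On a tie, the hash
    that appears first in `reports` is treated as the majority.
    """
    if not reports:
        return []
    majority_hash, _ = Counter(r.blockhash for r in reports).most_common(1)[0]
    return sorted(r.node for r in reports if r.blockhash != majority_hash)


reports = [
    SlotReport("validator-a", 100, "hash-1"),
    SlotReport("validator-b", 100, "hash-1"),
    SlotReport("validator-c", 100, "hash-2"),
]
disagreeing = find_disagreeing_nodes(reports)  # ["validator-c"]
```

An empty result means the nodes agree. A non-empty result would be the trigger for the recovery steps described in the next section.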

Disaster Recovery Plan for Real Time Streams and Historical Database

Initial Response:
- In the event of detected inconsistencies, the Kubernetes operator will halt all related instances to prevent the spread of erroneous data.
Issue Assessment:
- Determine whether the inconsistency is due to a global blockchain event or localized to our validators.
Localized Event Handling:
- If the issue is found to be local, the affected validator is removed from the pool.
Identifying Last Correct Block ID:
- Establish the most recent valid block ID before the discrepancy was detected.
Data Correction:
- Delete incorrect records from HBase (RPC) and HDFS archives. Our LiteRPC makes it straightforward to roll back to the last correct block and delete the inaccurate data.
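
As a rough sketch of the "last correct block" and "data correction" steps, the Python below compares stored blockhashes with those from a trusted reference and lists the slots that would need to be deleted. The function names and the `slot -> blockhash` dictionaries are assumptions made for illustration. The actual deletion from HBase and HDFS is not shown.

```python
from __future__ import annotations


def last_matching_slot(stored: dict[int, str], reference: dict[int, str]) -> int | None:
    """Return the highest stored slot before the first mismatch.

    Walks the stored slots in ascending order and stops at the first one
    whose blockhash is missing from, or different in, the reference.
    Returns None if even the earliest stored slot does not match.
    """
    last_good = None
    for slot in sorted(stored):
        if reference.get(slot) != stored[slot]:
            break
        last_good = slot
    return last_good


def slots_to_delete(stored: dict[int, str], last_good: int | None) -> list[int]:
    """Return every stored slot after the last correct one, in ascending order."""
    return sorted(s for s in stored if last_good is None or s > last_good)


stored = {100: "a", 101: "b", 102: "x", 103: "y"}      # what we ingested
reference = {100: "a", 101: "b", 102: "c", 103: "d"}   # trusted source
last_good = last_matching_slot(stored, reference)       # 101
to_delete = slots_to_delete(stored, last_good)          # [102, 103]
```

Everything after the first mismatch is treated as suspect, including later slots that happen to match, because blocks built on top of a replaced block belong to the abandoned fork.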

Disaster Recovery Plan for User Generated Data Sets

Data Set Generation Delay:
- In most scenarios, we recommend delaying data set generation by 5 to 10 minutes. This delay, combined with our Kubernetes operator stopping all related downstream processes and jobs, is usually enough to resolve any inconsistencies before they reach a data set.
Real-Time Data Set Handling:
- For real-time data sets, the same corrective measures are applied as with the historical database, involving the deletion of erroneous blocks of data.
Data Integrity and Duplication Prevention:
- Each record within the data sets is assigned a unique ID, derived from the block timestamp, transaction position within the block, and several other characteristics. This unique identifier helps quickly eliminate unwanted data and acts as a measure against data duplication.
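
The unique-ID idea can be sketched in Python as follows. The exact fields and encoding used by DH3.io are not specified here, so the SHA-256 hash, the `|` separator, and the use of a transaction signature as the "other characteristic" are assumptions made purely for illustration.

```python
from __future__ import annotations

import hashlib


def record_id(block_time: int, tx_index: int, *extra: str) -> str:
    """Derive a deterministic ID from the block timestamp, the transaction
    position within the block, and any further distinguishing fields."""
    parts = [str(block_time), str(tx_index), *extra]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each derived ID, preserving order."""
    seen: set[str] = set()
    unique = []
    for rec in records:
        rid = record_id(rec["block_time"], rec["tx_index"], rec["signature"])
        if rid not in seen:
            seen.add(rid)
            unique.append(rec)
    return unique


records = [
    {"block_time": 1700000000, "tx_index": 0, "signature": "sig-a"},
    {"block_time": 1700000000, "tx_index": 1, "signature": "sig-b"},
    {"block_time": 1700000000, "tx_index": 0, "signature": "sig-a"},  # replayed
]
unique = deduplicate(records)  # the replayed record is dropped
```

Because the ID is computed from the record's own fields, re-ingesting a block after a rollback yields the same IDs, so repeated writes overwrite rather than duplicate, and records from a discarded block can be located by ID for deletion.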

Conclusion

This disaster recovery plan is crucial for maintaining the operational integrity and reliability of DH3.io in the face of Solana blockchain reorganizations. By following these procedures, DH3.io ensures the consistency, accuracy, and security of both the real-time and historical data within its Web3 data hub.

