Quick indexing
We aim for not only efficient data storage but also rapid data retrieval. Most data providers offer ready-to-use or pre-parsed base data sets, which constrains users to the provider's binary-parsing capabilities and imposes a learning curve for data mapping. This approach often stems from the prohibitive cost of processing raw binary data and the resources needed to construct these data sets. Fetching data directly from RPC, or using cloud storage that charges for every IO operation, can escalate costs significantly.
Unrestricted access to the foundational unstructured data is crucial: users should not have to worry about how many iterations it takes to refine that data. We see many teams building data offloaders on private infrastructure to emulate similar capabilities. This is particularly challenging for high-throughput chains like Solana, and teams spend weeks or even months on work that our solution aims to reduce to minutes.
At DH3, we have devoted additional effort to making our core methodologies scalable and cost-efficient. We store all unstructured data in Hadoop and run our ETL through a custom Kubernetes operator that oversees all consumer and data-processor activity. Our current infrastructure can rescan data at 50 million transactions per second (TPS) at a throughput of 30 GB/s. This capacity is not an upper limit: DH3.io's infrastructure is designed for near-linear scalability. The entire system runs on bare metal, which optimizes operational efficiency and eliminates additional IO and network expenses. As a result, our system can rescan all Solana transactions in 10 to 30 minutes.
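As a rough sanity check on these figures, rescan time is simply total transaction count divided by scan rate. The sketch below uses a hypothetical total of 30 billion transactions purely for illustration; only the 50 million TPS rate comes from the text above.

```python
def rescan_minutes(total_transactions: float, tps: float) -> float:
    """Estimate wall-clock time (in minutes) to rescan a
    transaction history at a given sustained scan rate."""
    return total_transactions / tps / 60

# At the stated 50 million TPS, a hypothetical history of
# 30 billion transactions would take:
print(rescan_minutes(30e9, 50e6))  # → 10.0 (minutes)
```

At that rate, histories between roughly 30 and 90 billion transactions fall inside the quoted 10-to-30-minute window.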
Access to our rapid indexing capabilities is provided through either the Backfilling API or the GUI Builder.
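For programmatic use, a backfill job could be described by a small request payload along these lines. This is a hypothetical sketch only: the field names, the `build_backfill_request` helper, and the example program ID are illustrative assumptions, not the documented Backfilling API schema.

```python
import json

def build_backfill_request(program_id: str, from_slot: int, to_slot: int) -> str:
    """Assemble a hypothetical backfill-job payload as JSON.
    All field names here are illustrative, not the real API schema."""
    if from_slot > to_slot:
        raise ValueError("from_slot must not exceed to_slot")
    payload = {
        "program_id": program_id,
        "from_slot": from_slot,
        "to_slot": to_slot,
    }
    return json.dumps(payload)

# Example: describe a rescan of a slot range for one program.
body = build_backfill_request("ExampleProgramId", 200_000_000, 200_100_000)
```

The GUI Builder covers the same ground interactively for users who prefer not to construct requests by hand.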