Streamlining Data Access through Categorized Channels

Building datasets from blockchain data is costly when every query has to read through unrelated records. One effective way to reduce that overhead is to categorize data into separate channels. This approach simplifies data access and significantly cuts the number of read operations required.

Understanding Block and Transaction Streams

Block streams are organized per network, such as mainnet and devnet. Each network has a dedicated channel for its block data. More granular data types, such as transactions and accounts, are divided into multiple streams. These categories are deliberately specific: for instance, oracle-related transactions are separated from the general transaction stream, and token accounts are kept apart from the general account stream. This organization supports targeted retrieval, so developers can access the precise information they need without wading through irrelevant data.
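The routing described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the channel names (`blocks.mainnet`, `accounts.token`, and so on) and both function names are hypothetical, since the text does not give the real identifiers. The two program IDs are the standard Solana Token Program and System Program addresses.

```python
# Standard Solana program IDs for the two owner-specific account channels.
TOKEN_PROGRAM_ID = "TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA"
SYSTEM_PROGRAM_ID = "1" * 32  # the System Program address is 32 '1' characters

# Hypothetical channel names, chosen only for this sketch.
ACCOUNT_CHANNELS = {
    TOKEN_PROGRAM_ID: "accounts.token",
    SYSTEM_PROGRAM_ID: "accounts.system",
}
FALLBACK_ACCOUNT_CHANNEL = "accounts.other"
KNOWN_NETWORKS = ("mainnet", "devnet")


def block_channel(network: str) -> str:
    """Blocks get exactly one channel per network."""
    if network not in KNOWN_NETWORKS:
        raise ValueError(f"unknown network: {network}")
    return f"blocks.{network}"


def account_channel(owner: str) -> str:
    """Pick an account channel from the account's owner program."""
    return ACCOUNT_CHANNELS.get(owner, FALLBACK_ACCOUNT_CHANNEL)
```

A consumer interested only in token accounts would then read `account_channel(TOKEN_PROGRAM_ID)` and never touch records owned by other programs.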

A byproduct of creating these separate channels is unavoidable data duplication. A single transaction may interact with multiple programs and therefore belong to more than one category, so it can appear in several streams. At first glance, duplicating data across channels may seem costly in terms of storage, but it is a trade-off that pays dividends.
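To make the duplication concrete, here is a small sketch of how one transaction could be fanned out to every channel it matches. Everything named here is an assumption made for illustration: the program IDs are placeholders, and the channel names (including the `tx.dex` category, which does not appear in this document) are hypothetical.

```python
from collections import defaultdict

# Placeholder program IDs mapped to hypothetical channel names.
PROGRAM_CHANNELS = {
    "OracleProgramPlaceholder": "tx.oracle",
    "DexProgramPlaceholder": "tx.dex",
}
FALLBACK_TX_CHANNEL = "tx.other"


def channels_for(tx: dict) -> list[str]:
    """Return every channel a transaction belongs to, in sorted order."""
    matched = {
        PROGRAM_CHANNELS[program]
        for program in tx["programs"]
        if program in PROGRAM_CHANNELS
    }
    # A transaction that matches no specific category goes to the catch-all.
    return sorted(matched) or [FALLBACK_TX_CHANNEL]


def fan_out(txs: list[dict]) -> dict[str, list[dict]]:
    """Write each transaction to all of its channels (duplicating it if needed)."""
    streams = defaultdict(list)
    for tx in txs:
        for channel in channels_for(tx):
            streams[channel].append(tx)
    return dict(streams)
```

A transaction that touches both placeholder programs is stored twice, once per channel, while a transaction that touches neither lands only in the catch-all stream.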

For the ecosystems we're contributing to and the data providers integrating with our solutions, handling duplicated data isn't a notable concern. In fact, it's a small price to pay compared to the alternative: processing vast quantities of unrelated records places a significant strain on network resources and CPU usage. By comparison, the cost of additional storage is minimal, especially given the efficiency gains in data retrieval and the resulting reduction in processing overhead.

A Bit of Storage Philosophy

In an environment where the specifics of decoding and parsing binary data evolve daily, it's crucial for us to adapt and align our strategies accordingly. All data within our channels is stored in its original, unparsed form. This reflects a foundational principle of our operation: parse data on the fly.

Parsing at read time lets us reconstruct datasets as swiftly as possible, using the most up-to-date techniques and knowledge from the latest updates and contributions within the DH3 community. Because the stored data is never tied to an older parser, an improved decoder benefits every dataset built afterward, with no need to rewrite what is already stored.
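The idea of storing raw bytes and parsing on read can be sketched as below. This is an assumption-laden illustration rather than the real pipeline: the `RawChannel` class, the two decoder functions, and the binary layout (a little-endian u64 amount followed by a u8 kind flag) are all hypothetical.

```python
import struct
from typing import Callable


def decode_v1(raw: bytes) -> dict:
    """Older hypothetical decoder: only understands the leading u64 amount."""
    (amount,) = struct.unpack_from("<Q", raw, 0)
    return {"amount": amount}


def decode_v2(raw: bytes) -> dict:
    """Newer hypothetical decoder: also understands the trailing u8 kind flag."""
    amount, kind = struct.unpack_from("<QB", raw, 0)
    return {"amount": amount, "kind": kind}


class RawChannel:
    """Keeps records unparsed; decoding happens only when data is read."""

    def __init__(self, decoder: Callable[[bytes], dict]) -> None:
        self._records: list[bytes] = []
        self.decoder = decoder

    def append(self, raw: bytes) -> None:
        # Store the bytes exactly as received; nothing is parsed at write time.
        self._records.append(bytes(raw))

    def read(self) -> list[dict]:
        # Parse on the fly with whichever decoder is current.
        return [self.decoder(raw) for raw in self._records]
```

Swapping `channel.decoder` from `decode_v1` to `decode_v2` changes what every later `read()` returns, while the stored bytes stay untouched.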

Current channels



Full mainnet blocks.

Full devnet blocks.

Oracle-related transactions: Pyth, ChainLink.

Transactions that failed to execute. These are still relevant data for some users, since a failed transaction can still reduce lamports or fail only partially.

Transactions that do not fall into any other category. For now, this is the most relevant stream, as it contains the majority of transactions.

TokenProgram-owned accounts.

SystemProgram-owned accounts.

All other accounts that do not fall into a specific channel.

Full mainnet blocks with enriched traces.

Transactions that do not fall into any other category.

Mempool transactions. These are available only with real-time streams.

