High Performance Data Indexing Tool (2022)

Proponent: 5D5TJ7cNZWuRgXYe4oxizBqWG31eJ1iFpY24JU2jzDccEiwY

Date: 10 April 2023

Requested DOT: 4201.0828

Short description: Ongoing costs for 2022, for the running of public-good SubQuery Dictionary Projects which serve pre-computed indexes of each block for most Polkadot and Kusama parachains. These public projects enable the parachain teams and teams within their ecosystem to increase the data indexing speed of their SubQuery projects by up to 10 times.

Full Proposal: https://docs.google.com/document/d/1SjoBgDzX-fJmVWLXQPhQlpy29oimRbHD42tQjGdqgdY/edit?usp=sharing

Raw Data: https://docs.google.com/spreadsheets/d/17829KgXp5aXKXkaPw7A7oCgjMBB8vEuCg2Pyk1MGQnY/edit#gid=0

Original Motivation

SubQuery is a universal data indexing toolkit for building the Web3 applications of the future. A SubQuery project is a complete API to organise and query data from Layer-1 chains. We currently serve Polkadot, Kusama, and other Substrate-based layer-1 chains, as well as Avalanche, Algorand, Terra, NEAR, and Cosmos projects, and we provide data that developers use in a wide array of projects (wallets, explorers, custom chains, or any other decentralised app). In the future, the SubQuery Network intends to replicate this scalable and reliable solution in a completely decentralised manner.

Throughout the whole of 2022 (01/01/2022 - 31/12/2022) we provided public SubQuery Projects, termed Dictionaries, for most parachains in the Polkadot and Kusama ecosystems to dramatically increase indexing performance (up to 10x faster). So far these dictionaries have served over 1 billion requests.

A SubQuery dictionary is a special SubQuery Project which scans over the network and records metadata of every event and extrinsic on each block. When other SubQuery projects index data on any of these chains, they can ask the dictionary which blocks a filtered list of events/extrinsics appears on. Rather than having the indexer search through each block, dictionaries reduce the amount of data that the indexer obtains from the chain, reduce load on free RPC endpoints across the ecosystem, and reduce the number of "unwanted" blocks stored in the local buffer. For example, instead of SubQuery inspecting each block of Polkadot's massive chain (600GB of unstructured data on over 12 million blocks), which can take many hours, dictionaries skip straight to the exact blocks that are needed.
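The lookup described above can be sketched with a toy in-memory model. This is only an illustration of the idea, not SubQuery's implementation: the `CHAIN` data, the function names, and the `(module, method)` event tuples are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy chain: block height -> list of (module, method) events.
CHAIN = {
    1: [("system", "ExtrinsicSuccess")],
    2: [("balances", "Transfer"), ("system", "ExtrinsicSuccess")],
    3: [("system", "ExtrinsicSuccess")],
    4: [("staking", "Rewarded")],
    5: [("balances", "Transfer")],
}


def build_dictionary(chain):
    """Scan every block once and record which heights contain each event."""
    index = defaultdict(set)
    for height, events in chain.items():
        for event in events:
            index[event].add(height)
    return index


def blocks_for(index, filters):
    """Return the sorted heights on which any of the filtered events appear."""
    heights = set()
    for event_filter in filters:
        heights |= index.get(event_filter, set())
    return sorted(heights)


def index_without_dictionary(chain, filters):
    """Baseline: inspect every block; return (matching heights, blocks fetched)."""
    wanted = set(filters)
    matches = []
    fetched = 0
    for height in sorted(chain):
        fetched += 1  # every block has to be pulled from the chain
        if wanted.intersection(chain[height]):
            matches.append(height)
    return matches, fetched


if __name__ == "__main__":
    dictionary = build_dictionary(CHAIN)
    transfer = [("balances", "Transfer")]
    print("with dictionary, fetch only:", blocks_for(dictionary, transfer))
    print("without dictionary:", index_without_dictionary(CHAIN, transfer))
```

In this toy model the indexer that uses the dictionary fetches only the blocks that contain a matching event, while the baseline has to fetch every block to find the same matches.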

Adding the dictionary to optimise indexing performance is as easy as adding the relevant network dictionary endpoint to your SubQuery Project. This endpoint is included in all of our starter projects, so new projects use a dictionary by default.

Here you can find detailed documentation on how dictionaries work: https://academy.subquery.network/academy/tutorials_examples/dictionary.html#

SubQuery Dictionaries have been created for most parachains, are publicly available at https://github.com/subquery/subql-dictionary, and have been deployed to our Managed Service. If any other team would like a dictionary created, we are more than happy to help out.

Future Targets for Our Dictionaries

  • We're proud of our record on stability and reliability. One of our core values is to be blockchain's most reliable indexer, and we don't intend to lose sight of this as we onboard more chains and expand the number of customers relying on our hosted service for their production applications. Our goal for 2023 is 99.99% uptime, and we're starting off strong with 100% so far, as seen here.

  • Through 2022 we ran 24 dictionaries for the in-demand parachains that we believed would add the most value to dApp developers in the Polkadot and Kusama ecosystems. Moving forward, we want to offer the increased indexing performance to all active block-producing parachains. Our goal is to add 20 more dictionaries to our suite of tools in 2023.

  • Being on the cusp of 1 billion requests to our dictionaries in 2022 has left us hungry to smash records in 2023. This year, with the launch of our decentralised SubQuery Network and ongoing Business Development efforts, we are looking to hit (and surpass) a target of 10 billion requests to our dictionaries.

  • GitHub Contributors. Our dictionaries are completely open source, and SubQuery is a community project with a goal of complete decentralisation. Last year we already welcomed contributions to our dictionaries from developers at Nodle and Subscan. We are working on making it easier for new parachains to assist us with creating dictionaries.
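To put the uptime target above in concrete terms, here is a small arithmetic sketch of the downtime budget that an uptime percentage implies. The function name and the 365-day year are assumptions made for illustration; this is not how our monitoring calculates uptime.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def allowed_downtime_minutes(uptime_percent: str) -> float:
    """Minutes of downtime per year permitted by an uptime target.

    The percentage is passed as a string (e.g. "99.99") so it can be
    converted to an exact fraction without floating-point rounding.
    """
    downtime_share = (100 - Fraction(uptime_percent)) / 100
    return float(downtime_share * MINUTES_PER_YEAR)


if __name__ == "__main__":
    for target in ("99", "99.9", "99.99"):
        print(f"{target}% uptime allows {allowed_downtime_minutes(target)} minutes of downtime per year")
```

Under these assumptions, a 99.99% target leaves a budget of roughly 53 minutes of downtime across the whole year, compared with about 88 hours at 99%.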

Service Details (Polkadot)

Between 2022-01-01 and 2022-12-31 our Polkadot Dictionaries have:

  • Served an extraordinary total of 733,069,683 (over 700 million) requests

  • The Polkadot Dictionary alone served 630,818,299 (over 630 million) requests

  • The highest daily total of the Polkadot Dictionary was 19,219,650 requests in a single 24-hour period, on 22 July 2022

  • Maintained availability at >99% uptime via our SubQuery Managed Service (see uptime monitoring here)
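For a sense of scale, the request figures above can be combined with some simple arithmetic. This sketch uses only the three totals quoted in this section; the variable names are illustrative, and the daily average assumes requests were spread over all 365 days of 2022.

```python
# Figures quoted in this section for 2022-01-01 to 2022-12-31.
TOTAL_REQUESTS = 733_069_683           # all Polkadot dictionaries
POLKADOT_DICT_REQUESTS = 630_818_299   # the Polkadot Dictionary alone
PEAK_DAILY_REQUESTS = 19_219_650       # busiest day, 22 July 2022
DAYS_IN_2022 = 365

# Share of all requests served by the Polkadot Dictionary itself.
polkadot_share = POLKADOT_DICT_REQUESTS / TOTAL_REQUESTS

# Average daily load on the Polkadot Dictionary, and how the peak compares.
average_daily = POLKADOT_DICT_REQUESTS / DAYS_IN_2022
peak_to_average = PEAK_DAILY_REQUESTS / average_daily

if __name__ == "__main__":
    print(f"Polkadot Dictionary share of requests: {polkadot_share:.1%}")
    print(f"Average daily requests: {average_daily:,.0f}")
    print(f"Peak day versus average day: {peak_to_average:.1f}x")
```

The peak-to-average ratio gives a rough idea of how much headroom the service needs above a typical day.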

Full proposal found here: https://docs.google.com/document/d/1SjoBgDzX-fJmVWLXQPhQlpy29oimRbHD42tQjGdqgdY/edit?usp=sharing
