Several months ago, Subcoin, a Bitcoin full node implementation written in Rust, received a grant from the Web3 Foundation as a showcase of the polkadot-sdk framework. The project aimed to implement a decentralized Bitcoin fast sync mechanism, enabling Bitcoin users to quickly sync to the network tip by downloading the Bitcoin state (UTXO set) directly from the Subcoin P2P network.
However, limitations in the Substrate framework prevented the realization of this vision. The primary issue was Substrate's inability to import large states (several gigabytes or more) downloaded during state sync without encountering Out-of-Memory (OOM) errors. As a result, we could only demonstrate the Bitcoin fast sync functionality by syncing to an intermediate block height, falling short of the original objective.
Since completing the Subcoin W3F grant, I have been actively working to address the challenges related to state sync. Extensive investigation identified the root cause of the OOM issue: Substrate attempts to construct the entire trie in memory during the state import process (Issue #5053).
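To make the failure mode concrete, here is a minimal sketch; it is not Substrate's actual import code. It models the state as flat key-value chunks rather than trie nodes, and `MockDb`, `import_buffered`, `import_incremental`, and `sample_chunks` are hypothetical names introduced only for illustration. It contrasts buffering everything before a single commit with committing each chunk as it arrives, and reports the peak number of entries held in memory in each case.

```rust
use std::collections::BTreeMap;

/// One downloaded batch of state entries (hypothetical shape).
pub type Chunk = Vec<(Vec<u8>, Vec<u8>)>;

/// Hypothetical stand-in for an on-disk database; not a Substrate type.
#[derive(Default)]
pub struct MockDb {
    pub entries: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl MockDb {
    /// Pretend to flush a batch to disk.
    pub fn commit(&mut self, batch: Chunk) {
        self.entries.extend(batch);
    }
}

/// Buffers every chunk and commits once at the end.
/// Returns the peak number of entries held in memory.
pub fn import_buffered(chunks: impl IntoIterator<Item = Chunk>, db: &mut MockDb) -> usize {
    let mut buffer: Chunk = Vec::new();
    for chunk in chunks {
        buffer.extend(chunk); // memory grows with the total state size
    }
    let peak = buffer.len();
    db.commit(buffer);
    peak
}

/// Commits each chunk as soon as it arrives.
/// Returns the peak number of entries held in memory.
pub fn import_incremental(chunks: impl IntoIterator<Item = Chunk>, db: &mut MockDb) -> usize {
    let mut peak = 0;
    for chunk in chunks {
        peak = peak.max(chunk.len()); // memory is bounded by the chunk size
        db.commit(chunk);
    }
    peak
}

/// Generates `n_chunks` chunks of `per_chunk` entries with unique keys.
pub fn sample_chunks(n_chunks: u32, per_chunk: u32) -> Vec<Chunk> {
    (0..n_chunks)
        .map(|c| {
            (0..per_chunk)
                .map(|i| ((c * per_chunk + i).to_be_bytes().to_vec(), vec![0u8; 8]))
                .collect()
        })
        .collect()
}
```

With 10 chunks of 100 entries each, the buffered path holds all 1,000 entries at once, while the incremental path never holds more than 100; both end with identical database contents. The sketch deliberately ignores trie node construction, so it shows only the difference in memory profile, not a fix.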
With guidance from the polkadot-sdk maintainers, a rough plan to resolve the OOM issue and introduce persistent state sync support has been formulated. Preliminary efforts have already yielded encouraging results, laying the groundwork for a comprehensive solution.
Based on the progress made so far, I submitted a follow-up grant proposal to the W3F Grants Program. After careful consideration, however, the reviewers recommended the treasury funding route as a better fit for this work (https://github.com/w3f/Grants-Program/pull/2436#issuecomment-2479329146).
The current fast sync feature in Substrate suffers from significant limitations, making it unusable or impractical in certain scenarios:
OOM when importing a large state:
When the chain state is large, Substrate runs out of memory while importing the state downloaded via state sync, and the import fails. This limitation affects not only Subcoin but any project using Substrate, including Polkadot itself. For example, the Astar network has already encountered this issue (Issue #5053).
Lack of persistent state sync support:
State sync in Substrate currently lacks the ability to persist partial progress, requiring the entire process to restart in case of interruptions.
These challenges must be addressed to unlock the full potential of Substrate's advanced sync strategies, including warp sync and fast sync, particularly for projects with large chain states.
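The idea behind persistent state sync can be sketched as follows. This is an illustrative model under stated assumptions, not the planned Substrate design: `Checkpoint`, `LocalStore`, `fetch_chunk`, and `sync` are hypothetical names, the remote peer is simulated by an in-memory map served in key order, and the checkpoint lives in memory where a real node would persist it to disk together with the imported data. The `max_chunks` parameter simulates an interruption.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

pub type Key = Vec<u8>;
pub type Value = Vec<u8>;

/// Hypothetical progress record; a real node would store this durably.
#[derive(Default, Debug, Clone, PartialEq)]
pub struct Checkpoint {
    pub last_key: Option<Key>,
    pub complete: bool,
}

/// Local node storage: imported state plus the sync checkpoint.
#[derive(Default)]
pub struct LocalStore {
    pub state: BTreeMap<Key, Value>,
    pub checkpoint: Checkpoint,
}

/// Simulated peer request: up to `limit` entries strictly after `after`.
pub fn fetch_chunk(
    remote: &BTreeMap<Key, Value>,
    after: Option<&Key>,
    limit: usize,
) -> Vec<(Key, Value)> {
    let lower = match after {
        Some(k) => Bound::Excluded(k.clone()),
        None => Bound::Unbounded,
    };
    remote
        .range::<Key, _>((lower, Bound::Unbounded))
        .take(limit)
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

/// Syncs until the state is complete or `max_chunks` requests were made.
/// Always resumes from the stored checkpoint. Returns the request count.
pub fn sync(
    remote: &BTreeMap<Key, Value>,
    local: &mut LocalStore,
    chunk_size: usize,
    max_chunks: usize,
) -> usize {
    let mut requests = 0;
    while !local.checkpoint.complete && requests < max_chunks {
        let chunk = fetch_chunk(remote, local.checkpoint.last_key.as_ref(), chunk_size);
        requests += 1;
        let last_key = chunk.last().map(|(k, _)| k.clone());
        match last_key {
            // An empty response means there is nothing left to download.
            None => local.checkpoint.complete = true,
            Some(k) => {
                // Assumption: data and checkpoint would be written atomically.
                local.state.extend(chunk);
                local.checkpoint.last_key = Some(k);
            }
        }
    }
    requests
}
```

Calling `sync` a second time after an interrupted run continues from `checkpoint.last_key` rather than from the first key, so only the remaining part of the state is requested again.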
Additional Notes
In addition to the above issues, I have identified and addressed several other fast sync-related bugs during my work, with some fixes already implemented and others still in progress. These improvements aim to enhance the overall reliability and functionality of Substrate's sync mechanisms.
The goal of this proposal is to enhance Substrate's capabilities by addressing the key state sync obstacles listed in Subcoin Issue #56. Specifically, this work will focus on resolving the OOM issue and implementing persistent state sync, ensuring that Substrate-based projects can reliably utilize advanced synchronization strategies, regardless of chain state size.