Issues with networking causing parachains' block time degradation - a possible approach


Motions 384 and 438 scheduled auctions from LP16 to LP26, running until January 2023. In recent weeks, teams have identified and reproduced a networking issue that degrades parachain block times, especially since v0.9.16 on Kusama. After analyzing the information from collators on different chains, the team concluded:

  1. Block production overall has dropped by around 35-40%;
  2. 0.9.16 on validators hurt, and on collators even more so;
  3. 0.9.17 on validators helped with collation metrics but not with actual block production;
  4. 0.9.17 on collators hurt collation metrics but seems to have improved block production slightly again;
  5. The amount of collations produced has dropped by about 33%;
  6. The proportion of collations advertised to validators has dropped about 15%, and the proportion of requested collations out of those advertised has dropped another 15%.
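To get a feel for how the last two points combine, here is a minimal sketch. It assumes the three reported drops are relative and compound multiplicatively, which the numbers above do not state explicitly; the function and variable names are made up for illustration.

```python
# Reported drops, taken from the list above.
PRODUCED_DROP = 0.33    # collations produced
ADVERTISED_DROP = 0.15  # share of produced collations advertised to validators
REQUESTED_DROP = 0.15   # share of advertised collations that get requested


def remaining_fraction(*drops: float) -> float:
    """Fraction of the baseline left after applying each drop in turn.

    Assumption: every drop is relative to the previous stage, so the
    stages multiply. If the figures are percentage points instead,
    this estimate does not apply.
    """
    remaining = 1.0
    for drop in drops:
        remaining *= 1.0 - drop
    return remaining


requested_vs_baseline = remaining_fraction(
    PRODUCED_DROP, ADVERTISED_DROP, REQUESTED_DROP
)
print(f"requested collations vs. baseline: {requested_vs_baseline:.0%}")
```

Under that assumption, slightly under half of the baseline number of collations would still be requested by validators. A requested collation still has to be backed and included, so this figure is not directly comparable to the block production drop in point 1.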

All in all, the team believes that collators have trouble either discovering or connecting to validators, and that validators have trouble propagating information in a timely manner. This points to a networking bottleneck on Kusama that will worsen if more chains connect to the network without proper optimizations. The team is fixing the identified issues, which will likely improve things. However, given that auctions are scheduled up to January 2023, some extra assessment is required on our side to give the optimizations as much time as possible.

The graphs below show the issues explained above, which many parachain teams have experienced with regard to block times on their chains - we show Statemine numbers below, but other teams can add their examples on this post:

Currently experienced block times; this should be flat at 12s.
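For teams who want to compare their own numbers, here is a rough sketch of how a drop in block production translates into average block time against the 12s target. The helper names and the sample timestamps are hypothetical, and it assumes production drops uniformly rather than in bursts.

```python
TARGET_BLOCK_TIME_S = 12.0  # target parachain block time stated in this post


def effective_block_time(target_s: float, production_drop: float) -> float:
    """Average block time if block production falls by `production_drop`."""
    if not 0.0 <= production_drop < 1.0:
        raise ValueError("production_drop must be in [0, 1)")
    return target_s / (1.0 - production_drop)


def observed_production_drop(timestamps_s: list[float], target_s: float) -> float:
    """Estimate the production drop from sorted block timestamps (seconds)."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two block timestamps")
    intervals = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    mean_interval = sum(intervals) / len(intervals)
    return 1.0 - target_s / mean_interval


for drop in (0.35, 0.40):
    avg = effective_block_time(TARGET_BLOCK_TIME_S, drop)
    print(f"{drop:.0%} fewer blocks -> about {avg:.1f}s per block")

# Hypothetical timestamps in which every other block is skipped.
sample = [0.0, 12.0, 36.0, 48.0, 72.0]
print(f"observed drop: {observed_production_drop(sample, TARGET_BLOCK_TIME_S):.0%}")
```

With the 35-40% drop reported above, this works out to an average of roughly 18.5s to 20s per block instead of 12s.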

Transition to the "Network Bridge Subsystem"; in normal operation this should be on the order of a few dozen µs.

This is a subsystem connected to the "Network Bridge Subsystem", which is congested. The above is one of a few reasons why times that should be on the order of hundreds of milliseconds are exploding into the range of seconds (the y-axis is in seconds).


As a consequence of these delays, more messages are considered invalid due to view updates, which essentially causes the block time degradation and the networking bottleneck that teams are experiencing.
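To illustrate why delay turns into rejected messages, here is a toy model, not the actual node logic. It assumes a message is tied to the relay chain head at the moment it is sent, and that the receiver only accepts it if that head is still in its view on arrival. The 6s relay block time is Kusama's; the one-block view depth and all names are simplifying assumptions.

```python
RELAY_BLOCK_TIME_MS = 6_000  # Kusama relay chain block time


def relay_head_at(t_ms: int) -> int:
    """Number of the newest relay block known at time t (idealised clock)."""
    return t_ms // RELAY_BLOCK_TIME_MS


def is_accepted(sent_at_ms: int, delay_ms: int, view_depth: int = 1) -> bool:
    """True if the message's relay parent is still in the receiver's view.

    Assumption: the view covers only the newest `view_depth` relay blocks.
    """
    relay_parent = relay_head_at(sent_at_ms)
    head_on_arrival = relay_head_at(sent_at_ms + delay_ms)
    return head_on_arrival - relay_parent < view_depth


def acceptance_rate(delay_ms: int, step_ms: int = 10) -> float:
    """Share of accepted messages for send times spread over one relay block."""
    send_times = range(0, RELAY_BLOCK_TIME_MS, step_ms)
    accepted = sum(is_accepted(t, delay_ms) for t in send_times)
    return accepted / len(send_times)


for delay_ms in (100, 1_000, 3_000, 6_000):
    rate = acceptance_rate(delay_ms)
    print(f"delay {delay_ms:>5} ms -> {rate:.1%} accepted")
```

In this simplified picture, a delay of 100 ms loses under 2% of messages, a delay of 3 seconds loses half, and anything at or above one relay block interval loses all of them, which matches the direction of what the graphs show.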

What To Expect

As the canary network of Polkadot, Kusama is alive to push the limits of what this technology can do: we expect chaos on-chain and it is the goal to engage the community for the canary network to serve the development of Polkadot. After analyzing the current auctions schedule and with current leases expiring for some chains, here is how the parachain count looks over the rest of the year:

We are currently in LP19, and two new parachains will onboard in LP20 (increasing the parachain count from 27 to 29). With the current plan for existing parachains to renew their slots, the next significant increase will happen at the end of June (increasing the parachain count from 29 to 35). The team aims to use this time to work on optimizations that will fix the issues seen on-chain, and the situation should be reassessed by the end of April.

If there are no performance improvements by the start of the LP22 auctions in mid-May, the team will propose a reduced auction schedule to limit the growth of the number of parachains on Kusama. The schedule enacted by motion 438 would be amended and replaced by a slower version, which should aim to:

  1. Replace the current schedule and pause the onboarding of new chains,
  2. Allow enough auctions in future LPs for existing ones to renew.

Kusama is fulfilling its purpose of showing current limits before optimization. To push those limits much further, we need to keep things stable so the team can focus on delivering a major scalability feature called asynchronous backing. With asynchronous backing, performance will no longer be about liveness of parachains: this paves the way for resuming pushing limits and more optimization work.

We want to hear your feedback to keep pushing Kusama's capabilities. Leave your comments below!
