Investigate time spent after restart in sui / narwhal
Steps to Reproduce Issue
When a sui-node has run for some time and is then restarted, narwhal in the node does not immediately start catching up on certificates.
This makes narwhal's catch-up after a restart much slower than it should be. We need to eliminate the inefficiencies here.
Expected Result
After a node restarts, narwhal soon starts catching up.
Actual Result
After a node restarts, it appears to be stuck for more than 20 minutes before it starts catching up.
CPU profile taken after a node restarted, before it started to catch up:
Only ~110 samples were taken in 10 s, so the profile's numbers may not be accurate. Still, we can look into some of the notable usages, check whether they are on the critical path of node restart, and see how they block narwhal catch-up.
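To put "may not be accurate numerically" in concrete terms, here is a minimal sketch that computes the effective sampling rate and the standard error of a function's observed share of samples. It assumes each sample is an independent draw (a binomial model), which is a simplification; the function names are hypothetical helpers, not part of sui or narwhal.

```rust
/// Effective sampling rate (Hz) for `samples` taken over `window_secs`.
fn sample_rate_hz(samples: u32, window_secs: f64) -> f64 {
    samples as f64 / window_secs
}

/// Standard error of an observed share `p` out of `n` samples,
/// assuming samples are independent Bernoulli trials (binomial model).
fn share_std_error(p: f64, n: u32) -> f64 {
    (p * (1.0 - p) / n as f64).sqrt()
}

fn main() {
    let n = 110; // samples reported in the issue
    let window = 10.0; // profiling window in seconds

    println!("effective sampling rate: {:.1} Hz", sample_rate_hz(n, window));

    // Illustrative shares only; these are not values read from the profile.
    for p in [0.05, 0.10, 0.30] {
        let se = share_std_error(p, n);
        println!(
            "observed share {:.0}% -> +/- {:.1} percentage points (1 SE), {:.0}% relative",
            p * 100.0,
            se * 100.0,
            se / p * 100.0
        );
    }
}
```

With 110 samples, a frame that appears in 10% of samples carries roughly a +/- 2.9 percentage point standard error, close to 30% of its own value, so only the largest frames are reliable. If the profiler counts on-CPU time (as `ITIMER_PROF`-based profilers do), a low sample count may also hint that the process spent much of the window off-CPU, for example blocked on I/O, which a CPU profile would not show.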
@mwtian I assume this is not relevant anymore, so closing it.