sui icon indicating copy to clipboard operation
sui copied to clipboard

events: investigate why the sqlite wal file grew ~170GB in 5 hours

Open bmwill opened this issue 3 years ago • 4 comments

The Fullnode on devent ran out of diskspace when the sqlite wal file for the events db grew ~170GB in roughly 5 hours. We mitigated the issue by deleting the wal file and restarting the fullnode but someone more familiar with the event system and sqlite should investigate and find the root cause this so that it doesn't happen again.

bmwill avatar Nov 03 '22 12:11 bmwill

We found a command to manually clean up the WAL

sqlite> pragma wal_checkpoint(TRUNCATE);
0|0|0

as well as PRAGMAs that can be run either from code or SQLite CLI (we should set these in the code, obviously) that can limit growth of WAL in the future:

sqlite> PRAGMA journal_size_limit = 0;
0
sqlite>
sqlite> PRAGMA wal_autocheckpoint;
1000
sqlite> PRAGMA wal_autocheckpoint = 100;
100

Other directions we could go include to not use a WAL, and to keep tuning the auto checkpoint settings, but it seems the above should be sufficient for now.

velvia avatar Nov 03 '22 19:11 velvia

I think if I run PRAGMA wal_checkpoint(TRUNCATE); at startup, the wal file will be shrunk once the node is dropped, what do you think? With the current code, I have to wait for about 30 minutes.

pragma wal_checkpoint(TRUNCATE);

mattsu2020 avatar Jan 22 '23 15:01 mattsu2020

@mattsu2020 Now there is a forceful wal truncation that runs periodically. Now the wal is staying at a reasonable size. Also note that sqlite will be gone in a couple of months from fullnode.

With the current code, I have to wait for about 30 minutes.

Are you seeing a big wal locally or what is the worry?

longbowlu avatar Jan 23 '23 18:01 longbowlu

In the devnet test, when synchronising at once on a high-speed server, the WAl file became bloated and exceeded 1 TB. This is a very rare case, but we believe the frequency will increase as the block numbers increase in the future.

It would be better if wal_checkpoint(TRUNCATE); could be executed during the initial synchronisation process.

mattsu2020 avatar Jan 23 '23 23:01 mattsu2020

sqlite is no longer in the codebase

longbowlu avatar May 18 '23 07:05 longbowlu