events: investigate why the sqlite wal file grew ~170GB in 5 hours
The Fullnode on devent ran out of diskspace when the sqlite wal file for the events db grew ~170GB in roughly 5 hours. We mitigated the issue by deleting the wal file and restarting the fullnode but someone more familiar with the event system and sqlite should investigate and find the root cause this so that it doesn't happen again.
We found a command to manually clean up the WAL
sqlite> pragma wal_checkpoint(TRUNCATE);
0|0|0
as well as PRAGMAs that can be run either from code or SQLite CLI (we should set these in the code, obviously) that can limit growth of WAL in the future:
sqlite> PRAGMA journal_size_limit = 0;
0
sqlite>
sqlite> PRAGMA wal_autocheckpoint;
1000
sqlite> PRAGMA wal_autocheckpoint = 100;
100
Other directions we could go include to not use a WAL, and to keep tuning the auto checkpoint settings, but it seems the above should be sufficient for now.
I think if I run PRAGMA wal_checkpoint(TRUNCATE); at startup, the wal file will be shrunk once the node is dropped, what do you think? With the current code, I have to wait for about 30 minutes.
pragma wal_checkpoint(TRUNCATE);
@mattsu2020 Now there is a forceful wal truncation that runs periodically. Now the wal is staying at a reasonable size. Also note that sqlite will be gone in a couple of months from fullnode.
With the current code, I have to wait for about 30 minutes.
Are you seeing a big wal locally or what is the worry?
In the devnet test, when synchronising at once on a high-speed server, the WAl file became bloated and exceeded 1 TB. This is a very rare case, but we believe the frequency will increase as the block numbers increase in the future.
It would be better if wal_checkpoint(TRUNCATE); could be executed during the initial synchronisation process.
sqlite is no longer in the codebase