Improve crash handling

Open Kleissner opened this issue 5 years ago • 1 comments

I read the code and documentation and wanted to ask whether there is a specific reason why you discard the old index files and always recreate them. It sounds like a dangerous and expensive default, especially in production environments.

In the event of a crash caused by a power loss or an operating system failure, Pogreb discards the index and replays the WAL building a new index from scratch. Segments are iterated from the oldest to the newest and items are inserted into the index.
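To make the quoted behaviour concrete, here is a minimal sketch of replaying segments oldest-first into a fresh in-memory index. The `record`, `segment`, and `location` types and the `rebuildIndex` function are hypothetical names used only for illustration; they are not Pogreb's actual types or on-disk format.

```go
package main

import (
	"fmt"
	"sort"
)

// record is a hypothetical WAL entry; Pogreb's real format differs.
type record struct {
	key     string
	offset  int64 // position of the value inside the segment
	deleted bool  // true if this entry removes the key
}

// segment is a hypothetical WAL segment; a lower id means an older segment.
type segment struct {
	id      int
	records []record
}

// location is what this toy index stores for each key.
type location struct {
	segmentID int
	offset    int64
}

// rebuildIndex replays segments from oldest to newest, so the most
// recent write (or delete) for a key is the one that survives.
func rebuildIndex(segments []segment) map[string]location {
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]segment(nil), segments...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].id < sorted[j].id })

	index := make(map[string]location)
	for _, seg := range sorted {
		for _, rec := range seg.records {
			if rec.deleted {
				delete(index, rec.key)
				continue
			}
			index[rec.key] = location{segmentID: seg.id, offset: rec.offset}
		}
	}
	return index
}

func main() {
	segs := []segment{
		{id: 2, records: []record{{key: "a", offset: 0}, {key: "b", deleted: true}}},
		{id: 1, records: []record{{key: "a", offset: 0}, {key: "b", offset: 16}}},
	}
	fmt.Println(rebuildIndex(segs))
}
```

Every record in every segment has to be visited, which is why the cost of this rebuild grows with the total size of the WAL.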

My use case is to store billions of key-value pairs. If I read the code correctly, whenever it crashes for any reason, the lock file will be detected on the next open and cause Pogreb to discard the index files (*.pix). The current estimated indexing time is 8 days, for likely hundreds of GB. Having any reboot or crash trigger a reindex of hundreds of GB and days of work doesn't make sense. Possible solutions:

  1. New Options.ReindexOnCrash to allow the user to specify whether (on false) it should try to re-open, or (on true) immediately reindex everything; or instead:
  2. Introduce Options.AutoReindexCorruptDatabase which triggers a reindex only in case openIndex returns an error. The lock file will be disregarded for crash detection and it will always try to open the existing database.

I believe the second option makes the most sense. After a crash, most if not all users assume the database will just pick up where it left off, especially in production environments.

Kleissner, Dec 12 '20 11:12

Same question here!

I've checked a few embedded DBs with 50M records. pogreb is the fastest and needs the lowest memory and disk size! But on a crash, rebuilding the index takes too much time, during which the app can't service requests. Because of this, I'm using pebble instead.

Here are my benchmark results (about 50,000 records are randomly searched in the benchmark):

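| Engine | QPS | P50 (µs) | P95 (µs) | P99 (µs) | Duration | Ops | Memory (MB) | Disk (MB) |
|--------|-----|----------|----------|----------|----------|-----|-------------|-----------|
| pebble | 1496880 | 2.0 | 5.0 | 13.0 | 668ms | 1000000 | 25.1 | 576.2 |
| badger | 703212 | 4.0 | 120.0 | 254.0 | 1.422s | 1000000 | 312.1 | 3139.8 |
| sqlite | 140673 | 83.0 | 300.0 | 556.0 | 7.109s | 1000000 | 40.4 | 4354.7 |
| pogreb | 2285152 | 0.0 | 1.0 | 5.0 | 438ms | 1000000 | 42.4 | 2887.3 |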

And for the curious, here is the loading performance for 50M records. pogreb is a bit slower, but that is not important to me (what I need is query performance).

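| Engine | Loading Time | Vacuum Time | Total Time | Records/Sec |
|--------|--------------|-------------|------------|-------------|
| Pebble | 13:20 | 1s | 13:21 | 56500 |
| Badger | 13:20 | 1s | 13:21 | 56500 |
| SQLite | 20:32 | 30s | 20:33 | 36500 |
| Pogreb | 17:00 | 2s | 17:02 | 44200 |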

*sqlite driver is cznic/sqlite

derkan, Oct 05 '25 18:10