sled
Question about the recovery process
Currently I am using one instance of Db for the whole lifetime of my service, but every time the process is interrupted, my db gets corrupted. I understand the motivation, but the recovery time on the next start is too long.
Is there an option to "safe save" the current data and, at recovery time, jump straight to that saved state?
I have already tried the flush_every_ms and snapshot_after_ops config options, but without success.
sled: 0.30.3
rust: rustc 1.40.0 (73528e339 2019-12-16)
os: Raspbian, armv7, 32-bit
My related code:
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::Duration;

let db = sled::Config::default()
    .path("/var/lib/tap/beerdb")
    //.flush_every_ms(Some(100))
    //.snapshot_after_ops(1)
    .open()
    .expect("Error opening the local database");
info!(
    "Local database: size {:?}, was recovered {}",
    db.size_on_disk(),
    db.was_recovered()
);
let db_rw = Arc::new(RwLock::new(db));
thread::spawn(move || loop {
    beerdb::sync(db_rw.clone());
    thread::sleep(Duration::from_secs(3));
});
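As an aside, sled's Db handle is itself Clone, Send, and Sync (cloning only copies a handle to shared internal state), so the Arc<RwLock<...>> wrapper above is likely unnecessary and serializes all access. A minimal sketch of lock-free sharing, assuming a hypothetical beerdb::sync variant that takes &sled::Db:

use std::thread;
use std::time::Duration;

fn main() {
    let db = sled::Config::default()
        .path("/var/lib/tap/beerdb")
        .open()
        .expect("Error opening the local database");
    // Cloning a Db is cheap: it clones a handle, not the data.
    let db_for_sync = db.clone();
    thread::spawn(move || loop {
        // Hypothetical signature: beerdb::sync(&sled::Db)
        beerdb::sync(&db_for_sync);
        thread::sleep(Duration::from_secs(3));
    });
}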
When I write my data:
let data = serde_json::to_string(&self)
    .expect("Error serializing the row!");
tree.insert(self.uuid.as_bytes(), data.into_bytes())
    .expect("Error storing the serialized row!");
tree.flush()
    .expect("Error flushing the stored row!");
Is the concern that recovery is taking a lot of time?
was_recovered() is always going to return true if an existing database was opened successfully (so this does not indicate that the database was corrupted), as opening a database always goes through the same recovery code path. There are plans afoot (see, for example, #587) to speed up recovery.
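Since recovery happens inside open(), one direct way to measure the cost being discussed is to time that call. A minimal sketch, reusing the path from the snippets above:

use std::time::Instant;

fn main() {
    let start = Instant::now();
    // All recovery work runs inside open(), so its elapsed
    // time is the startup cost being discussed here.
    let db = sled::Config::default()
        .path("/var/lib/tap/beerdb")
        .open()
        .expect("Error opening the local database");
    println!(
        "opened in {:?}, size on disk: {:?}, was_recovered: {}",
        start.elapsed(),
        db.size_on_disk(),
        db.was_recovered()
    );
}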
Is the concern that recovery is taking a lot of time?
Yes, around 2 minutes for a 2.8 MB database.
so this does not indicate that the database was corrupted
Interesting. And is there any way to determine whether the database was in fact corrupted?
I changed the config to:
let db = sled::Config::default()
    .path("/var/lib/tap/beerdb")
    .use_compression(true)
    .compression_factor(22)
    .open()
    .expect("Error opening the local database");
And I got a better result: roughly 40 seconds (compared with 2~3 minutes before).
@fernandobatels hey! If you are actually seeing corruption, that's a bug, but it's not clear that anything is being corrupted. It seems like your real issue is that sled is too slow to recover on the Raspberry Pi right now?
By the way, you can checksum the database before and after you kill it by calling the checksum method on Db. It should be the same before and after any interruption.
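A minimal sketch of that check: run it once, kill the process, restart, and run it again; the two printed values should match if nothing was corrupted:

fn main() {
    let db = sled::Config::default()
        .path("/var/lib/tap/beerdb")
        .open()
        .expect("Error opening the local database");
    // checksum() computes a CRC over the stored keys and values.
    let sum = db.checksum().expect("Error computing checksum");
    println!("checksum: {}", sum);
}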
Hi @spacejam,
It seems like your real issue is that sled is too slow for recovery on raspberry pi right now?
Yes, my real issue now is the slow recovery on the Raspberry Pi.
By the way, you can checksum the database before and after you kill it by calling the checksum method on Db. It should be the same before and after any interruption.
I tested, and the checksum is the same.
Just saw this and was wondering, as it reminded me of an issue I saw a while ago in a different system. With compression improving performance, could this be an I/O bottleneck? Are the I/O wait times elevated during restart?
Hi @fernandobatels, I just cut sled version 0.31 which I think might somewhat improve your recovery times. I would be quite curious to know if your issue is addressed, or if there is still more work that needs to be done before your workload runs nicely on the raspberry pi.
Just note that 0.31 is a binary incompatible change, although sled has a way to migrate from an old version to a new version (while temporarily using twice the storage space): https://docs.rs/sled/latest/sled/struct.Db.html#method.export
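For reference, a sketch of that migration, assuming the old crate is renamed in Cargo.toml so both versions can coexist, and using a temporary second path (beerdb-0.31 is just a placeholder; exact APIs depend on the versions involved):

// Cargo.toml (assumed):
//   old_sled = { package = "sled", version = "0.30.3" }
//   sled = "0.31"

fn main() {
    // Open the existing database with the old crate...
    let old = old_sled::Config::default()
        .path("/var/lib/tap/beerdb")
        .open()
        .expect("Error opening the old database");
    // ...and a fresh database with the new crate at a second path.
    let new = sled::Config::default()
        .path("/var/lib/tap/beerdb-0.31")
        .open()
        .expect("Error opening the new database");
    // export() dumps every collection; import() loads the dump into
    // the new on-disk format. This is why storage temporarily doubles.
    let dump = old.export();
    new.import(dump);
}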
Nice @spacejam, I will test the new version this weekend.
Hi @spacejam, I tested the new version 0.31 and the recovery time is the same:
- +/-40secs with compression
- 2~3min without compression
Size of my database, per size_on_disk(): 5928293 bytes (~5.9 MB)
Thanks for the info! There are two things I want to improve as patch updates to 0.31: garbage collection, and using multiple threads to create a snapshot. The better GC process will dramatically reduce the size on disk, which will have a big effect on recovery time, because there will be less data to read at startup. I'll keep you posted, as I think the GC process will actually have a big impact for your specific workload.