swarm icon indicating copy to clipboard operation
swarm copied to clipboard

Replace LevelDB with BadgerDB

Open jmozah opened this issue 4 years ago • 9 comments

This PR was created to see if badger is a viable alternative to leveldb and forky. Benchmarks show that it performs at par with current forky implementation if the "SyncWrites" flag is set to false (which is how forky is implemented).

This PR replaces the entire DB, which means both the chunk store and the state store are replaced with badgerDB. The flag "ValueThreshold" is set to 1K, that means any value which is less then 1K will be stored with the key (which is good for state store and indexes). Chunk store values which are 4K will be stored in a separate valueLog just like the forky design.

Right now badger 1.6 version is used as there is a regression in 2.0 version which make read expensive. Once this is fixed we ca start using version 2.0. Version 2.o also has a mem only option which can be used for testing.

jmozah avatar Mar 23 '20 08:03 jmozah

Thanks @jmozah. This PR has nothing to do with the Forky (fcds) it replaces leveldb completely. I thought that the goal would be to replace fcds.Store, at least to measure and compare performance and scalability on multiple cores. Could you provide some performance comparisons between fcds-teenage-mutants (or fcds) branch and this branch?

Have in mind that these changes completely removes everything done in forky, not just fcds.Store file storing approach. New migrations (manual or automatic) for both localstore and all state stores have to be implemented.

janos avatar Mar 23 '20 15:03 janos

@janos There are two commits in this PR. The first one has badger only in fcds. The second one replaces the entire leveldb with badger. I reasoning to go for the second was to avoid having two DB implementations. For that matter i am also not very sure which is best now .. given that we might want to move to bee.

What do you suggest, i just use badger for fcds and benchmark against the forky branch? is that enough?

jmozah avatar Mar 23 '20 20:03 jmozah

What do you suggest, i just use badger for fcds and benchmark against the forky branch? is that enough?

My impression was that fcds benchmarks will be done first to measure the difference between using plain files and badger for chunk data store.

It would be great to have badger for everything, but that is a greater area to cover for validation and justification for such large PR. Beside that, a reason to use an older 1.6 version and not the latest stable 2.0 is a bit worrying.

janos avatar Mar 23 '20 20:03 janos

My impression was that fcds benchmarks will be done first to measure the difference between using plain files and badger for chunk data store.

Ok. WIll create a new PR with only that and will benchmark that.

Beside that, a reason to use an older 1.6 version and not the latest stable 2.0 is a bit worrying.

Why?. They have a bug in the new version. it was reported only few days back and they are fixing it. There is even a patch for that. Unil a minor release comes out i decided to use 1.6 for benchmark.

jmozah avatar Mar 24 '20 06:03 jmozah

Why?. They have a bug in the new version. it was reported only few days back and they are fixing it. There is even a patch for that. Unil a minor release comes out i decided to use 1.6 for benchmark.

Because it would be very nice from you to provide more information, like an issue you just mentioned and what is the expected resolution. This would be very helpful for reviewers.

janos avatar Mar 24 '20 08:03 janos

like an issue you just mentioned and what is the expected resolution.

This is the issues which is affecting version 2.0 in badger. https://github.com/dgraph-io/badger/issues/1248 (fixed) https://github.com/dgraph-io/badger/issues/1254 (not fixed yet)

Thats why i used a older release. BTW 2.0 is supposed to be faster than 1.6. and have lot more features like compression etc.

jmozah avatar Mar 25 '20 07:03 jmozah

Thanks. We should watch for the remaining issue.

janos avatar Mar 25 '20 09:03 janos

Hey @jmozah @janos, I'm the maintainer of Badger DB. The read performance issue has been fixed in master and we will be releasing v2.0.3 soon (it was supposed to be released yesterday). As for the issue https://github.com/dgraph-io/badger/issues/1254 , it is open because the user is seeing some bug in the vlog file generation and we haven't been able to reproduce the failure.

Feel free to raise issues on badger if you find any other performance issues.

Thank you for using BadgerDB :)

jarifibrahim avatar Mar 26 '20 16:03 jarifibrahim

@jarifibrahim Thanks for the information. We are keeping a watch on the bug and will upgrade to 2.0.x as soon as the issue is fixed.

jmozah avatar Mar 27 '20 08:03 jmozah