ipfs-log Reduce memory footprint of a log instance & improve load time for orbit-db stores

Reduce memory footprint of a log instance & improve load time for orbit-db stores

Open mistakia opened this issue 5 years ago • 3 comments

Currently, when a log is loaded, all the data is kept in memory on log._entryIndex. Also, constructing a log requires passing in all the entries, which in the case of OrbitDB Stores that means reading all the data from disk (or just the N latest entries).

This is not very ideal as the load operation is O(n) and the memory footprint is an issue in mobile environments.

What are your thoughts on making the following changes:

allow constructor to require Entries and/or Heads with a nextsIndex (aka a entryHashIndex)

The following methods would also change:

log.values would read from disk when entryIndex does not exist
log.get(hash) would read from disk when entryIndex does not exit
log.has(hash) would use nextsIndex when entryIndex does not exist
log.toJSON / log.toSnapshot would include nextsIndex

The following methods would be added:

log.audit (or similar) to go through and rebuild/validate the nextsIndex

This would allow using just a cache of the head and a nextsIndex to create a functional instance of Log. Also, using the nextsIndex one could navigate and load any portion of the log.

Happy to take this on and explore better ways to accomplish this but would like some feedback to make sure I'm not missing anything obvious.

Jul 13 '18 15:07 mistakia

@mistakia thank you for opening this discussion! This is indeed something we should look at and see how we can improve the current implementation. Great thinking here @mistakia as to how to approach improving it! 👍More than happy to help if you want to work on this, and it is much appreciated! ❤️

I'll try to provide some insight, so couple of things worth to emphasize before starting work on this:

There are trade-offs here that will effect reads and writes. In order to pull in a PR, we should understand what the trade-offs are and how they effect the performance (either as raw throughput or as differences in O-complexity)
Given ^, it would serve the implementor well to write benchmarks for each operation that this touches (.values, get(hash) , has(hash) etc) before changing the code as to have a clear comparison between old vs. new implementation. Further, given the O-complexities involved, I would highly recommend to write the benchmarks based on the existing ones: observing and measuring the throughput over time is hugely important to make sure you don't accidentally add complexity (or can prove that the algorithm is certain complexity).
In the future, we should create a design for introducing the concepts of cold/hot storage in OrbitDB more generally, but that should not hold back on exploring and implementing this! (ie. go ahead and work on this, that's awesome! 👍😄)

I would also recommend to look (and base the implementation on) https://github.com/orbitdb/ipfs-log/blob/local/refs-tests/src/log.js (note the branch) as this branch and code contains upcoming changes to the way the log is loaded, improving performance quite a bit. And I think you'll find the traverse() function helpful for the reads. There's also a little nifty tool to visualize the log loading in that branch, which I've found really useful for comparing between in-memory and on-disk performance as well as for general performance profiling and debugging (through chrome devtools).

Hope this helps to get started!

Also, thank you for all the PRs and work you're doing! I'm slowly going through all of them, bear with me.

Jul 14 '18 06:07 haadcode

nice work on that branch - loading is indeed MUCH faster. 💪💨

Also, I'm now considering whether it might be better to implement these changes inside an OrbitDB store, where a store keeps a "light" oplog in memory containing just the entry hashes (head + nexts). Changes may still be needed to ipfs-log (potentially amending log.append to take nexts?)

What would be the current behavior with OrbitDB stores if you don't call load() when a local log exists and then append entries? Would the log entries be appended properly?

Jul 14 '18 17:07 mistakia

From what I gathered, it appears that the Store will join the logs but the side effect will be the creation of another head.

Jul 14 '18 22:07 mistakia

ipfs-log ipfs-log copied to clipboard

Reduce memory footprint of a log instance & improve load time for orbit-db stores

ipfs-log
ipfs-log copied to clipboard