leveldown icon indicating copy to clipboard operation
leveldown copied to clipboard

Snapshots

Open frankboehmer opened this issue 6 years ago • 22 comments

Leveldb exposes a feature, called "Snapshots" that comes in handy to get consistent reads on a database. LevelDOWN makes use of snapshots to provide consistent ReadStreams. However, sometimes one needs to combine this with additional read operations - e.g. multiple ReadStreams or multiple get calls. Those won't be consistent because they do not share a common snapshot. Here's a fork that exposes snapshots to node.js. db.snapshot() creates a snapshot which provides the normal read operations known from the LevelDOWN interface: https://github.com/frankboehmer/leveldown It's important to close the snapshot when no longer needed because it may suck up resources.

Since the implementation of a snpashot interface would also need to be discussed with LevelUP, I did not create pull request. Even worse: the fork reintegrates parts of LevelUP.

I think, snapshots would be a nice addition to LevelDOWN / LevelUP. And I feel it's quite easy to achieve. So please feel free to look at the fork or get in touch to join forces to make this happen.

Cheers, Frank

frankboehmer avatar Jun 19 '18 09:06 frankboehmer

Thanks for reaching out! Previous discussions:

  • https://github.com/Level/levelup/issues/15
  • https://github.com/Level/levelup/issues/138

TLDR: there hasn't been a compelling use case for exposing snapshots. Can you share yours? :)

Regarding "multiple get calls": we're working to expose iterators and iterator.seek, which allows multiple reads on one snapshot. Note that leveldown already has iterator.seek (and it works great), it just isn't exposed in levelup or implemented in other stores.

Even worse: the fork reintegrates parts of LevelUP.

In what way and why was it necessary?

vweevers avatar Jun 19 '18 09:06 vweevers

See also https://github.com/Level/levelup/issues/491#issuecomment-331094818

vweevers avatar Jun 19 '18 09:06 vweevers

Exposing iterators in LevelUP would do the trick in some use cases but then you can't have multiple ReadStreams (they cannot share the same iterator concurrently). I'm not sure why there's no strong demand, nobody sees a compelling use case... think e. g. about scanning an index and getting records.

db.createValueStream({start:'index c', end:'index d'})
.on('data', key => {
    db.get('record ' + key, (err, value) => {
        ...
    })
)

frankboehmer avatar Jun 19 '18 10:06 frankboehmer

In what way and why was it necessary? Just to make the project usable for our use case without putting too much work into changing all the dependent projects like levelup, abstract-leveldown etc. - definitly not the right way to do it in the long term ;-) I'd rather help to make this a core feature of the leveldb ecosystem

frankboehmer avatar Jun 19 '18 10:06 frankboehmer

Multiple get calls on one single snapshot can be achieved by creating multiple iterators at the exact same time.

@frankboehmer have you tried this approach? I think the performance should be identical.

peakji avatar Jun 19 '18 10:06 peakji

@peakji Actually, I wasn't aware of this behavior. This would do the trick :). It's a weird interface, though. Having a let snap = db.snapshot() and then snap.get(), snap.createReadStream() etc., snap.close() on LevelUP would seem much more intuitive to me. And the implementation is probably quite simple, at least for the LeveDOWN backend.

frankboehmer avatar Jun 19 '18 10:06 frankboehmer

Also you'd need to create all required iterators up-front which might be ugly in some use cases.

frankboehmer avatar Jun 19 '18 10:06 frankboehmer

And the implementation is probably quite simple, at least for the LeveDOWN backend.

at least for the LeveDOWN backend is exactly right 😉

vweevers avatar Jun 19 '18 10:06 vweevers

@frankboehmer This is just a workaround 😉. It works because each iterator has their own internal snapshot, and since Node.js is single-threaded in the JS land, all the snapshots we created synchronously should capture the exact same status.

But after reading the index-retrieve use case above, I'm starting to agree that it's necessary to expose raw snapshot, otherwise it's pretty hard to achieve the consistency requirement.

However, I don't think this feature should be added in levelup and making it REQUIRED in the level-ecosystem. snapshot is a leveldb concept and I'm not sure with the implementation of other backends. AFAIK, we didn't mention snapshot/consistency in levelup.

+1 for implementing in leveldown first.

peakji avatar Jun 19 '18 10:06 peakji

otherwise it's pretty hard to achieve the consistency requirement

I'd like to hear why the consistency is required. Often it's perfectly fine to do:

db.createValueStream().on('data', key => {
    db.get(key, (err, value) => {
        if (err.notFound) {
          // Just forget about this key and move on
        }
    })
)

vweevers avatar Jun 19 '18 11:06 vweevers

@vweevers think about the index as a reverse index of a data field that's also stored in the record. If the record gets updated in between, you get inconsistencies - the index doesn't reflect the current state of the record. You might find all sorts of workarounds to handle such a situation - however, it stays an inconsistent view. The most compelling argument for exposing snapshots might be that many users won't actually spend too much thinking on possible inconsistencies when seeing code like the one I posted above. They might be using batch() to get atomic writes - not beeing aware of the fact that they still can end up with inconsistent reads.

frankboehmer avatar Jun 19 '18 11:06 frankboehmer

@vweevers Suppose we are creating a simple inverted index, and we put the index keys and documents in one single database by using different prefixes:

key value
word:cat [1,2,3]
word:dog [2,3]
word:spider [1]
word:spiderman [3]
doc:1 lorem ipsum dolor cat spider
doc:2 ...
doc:3 ...

We can issue a prefix query for docs with words starting with spider* by scanning keys after word:spider, but the documents might change while we are iterating. For example, if we update doc:3 to a sentence without spiderman, the word: iterator still reads the stale snapshot, and will eventually result in irrelevant search hits.

peakji avatar Jun 19 '18 11:06 peakji

The most compelling argument for exposing snapshots might be that many users won't actually spend too much thinking on possible inconsistencies when seeing code like the one I posted above.

I see your point and want to add the nuance that many people dó think about this and decide they don't care about such inconsistencies. Taking @peakji's example, which is excellent, the end result might be that an application sometimes shows a search result for "spiderman" that isn't relevant anymore. That can be acceptable.

That said, I understand the use case. I'm not sure yet about:

However, I don't think this feature should be added in levelup and making it REQUIRED in the level-ecosystem.

It definitely shouldn't be a required backend feature, but that doesn't mean levelup can't wrap it. It could handle open state, read streams (like it wraps iterators on a regular db) and promises.

vweevers avatar Jun 19 '18 12:06 vweevers

@peakji just curious: why did you make this edit to your comment:

~~I think the performance should be identical.~~

vweevers avatar Jun 19 '18 12:06 vweevers

why did you make this edit to your comment

Just theoretically, I didn't run the benchmark yet:

In the inverted index example, we can have another iterator and use iterator.seek() + iterator.next() to replace db.get(). Theoretically this should not make much difference, but in leveldown we are crossing the JS/C++ boundary twice.

peakji avatar Jun 19 '18 12:06 peakji

@peakji: AFAIK, we didn't mention snapshot/consistency in levelup.

=> https://github.com/Level/levelup/issues/590

vweevers avatar Jun 20 '18 08:06 vweevers

Exposing snapshots can make it easier to build a transaction layer. There was some work on this back in 2015 that was never merged. PR #152 https://ongardie.net/blog/node-leveldb-transactions/

CMCDragonkai avatar Aug 05 '21 12:08 CMCDragonkai

Multiple get calls on one single snapshot can be achieved by creating multiple iterators at the exact same time.

@frankboehmer have you tried this approach? ~I think the performance should be identical.~

Is this still true if one of the iterators is done through the stream creation?

const i = db.iterator();
for await (const o of db.createReadStream()) {
  // use i here
  // can we expect consistency between o and i?
}

CMCDragonkai avatar Mar 14 '22 07:03 CMCDragonkai

@CMCDragonkai Yes, because a stream is backed by an iterator too, which is created when you call createReadStream().

vweevers avatar Mar 14 '22 08:03 vweevers

Ok that's useful, I was wondering that iterators have the db property.

Does this mean that iterator.db is operating from the same snapshot, or is just a reference to the DB that created the iterator?

CMCDragonkai avatar Mar 16 '22 02:03 CMCDragonkai

Just a reference.

vweevers avatar Mar 16 '22 07:03 vweevers

I've been attempting to implement snapshot isolation in leveldb, and although iterators provide a snapshot, one problem is the ability to derive further iterators from the same snapshot. Right now this can only be done by creating all the iterators up-front and at the same time. This limits the expressiveness of iterators.

What would it take to bring snapshots directly to the leveldown interface (if not the abstract-level` interface?).

CMCDragonkai avatar May 03 '22 14:05 CMCDragonkai

Tracked in https://github.com/Level/community/issues/118. It won't be added to leveldown, only its successor classic-level.

vweevers avatar Sep 26 '22 08:09 vweevers