leveldown
leveldown copied to clipboard
Snapshots
Leveldb exposes a feature, called "Snapshots" that comes in handy to get consistent reads on a database. LevelDOWN makes use of snapshots to provide consistent ReadStreams. However, sometimes one needs to combine this with additional read operations - e.g. multiple ReadStreams or multiple get calls. Those won't be consistent because they do not share a common snapshot. Here's a fork that exposes snapshots to node.js. db.snapshot() creates a snapshot which provides the normal read operations known from the LevelDOWN interface: https://github.com/frankboehmer/leveldown It's important to close the snapshot when no longer needed because it may suck up resources.
Since the implementation of a snpashot interface would also need to be discussed with LevelUP, I did not create pull request. Even worse: the fork reintegrates parts of LevelUP.
I think, snapshots would be a nice addition to LevelDOWN / LevelUP. And I feel it's quite easy to achieve. So please feel free to look at the fork or get in touch to join forces to make this happen.
Cheers, Frank
Thanks for reaching out! Previous discussions:
- https://github.com/Level/levelup/issues/15
- https://github.com/Level/levelup/issues/138
TLDR: there hasn't been a compelling use case for exposing snapshots. Can you share yours? :)
Regarding "multiple get calls": we're working to expose iterators and iterator.seek
, which allows multiple reads on one snapshot. Note that leveldown already has iterator.seek
(and it works great), it just isn't exposed in levelup
or implemented in other stores.
Even worse: the fork reintegrates parts of LevelUP.
In what way and why was it necessary?
See also https://github.com/Level/levelup/issues/491#issuecomment-331094818
Exposing iterators in LevelUP would do the trick in some use cases but then you can't have multiple ReadStreams (they cannot share the same iterator concurrently). I'm not sure why there's no strong demand, nobody sees a compelling use case... think e. g. about scanning an index and getting records.
db.createValueStream({start:'index c', end:'index d'}) .on('data', key => { db.get('record ' + key, (err, value) => { ... }) )
In what way and why was it necessary? Just to make the project usable for our use case without putting too much work into changing all the dependent projects like levelup, abstract-leveldown etc. - definitly not the right way to do it in the long term ;-) I'd rather help to make this a core feature of the leveldb ecosystem
Multiple get calls on one single snapshot can be achieved by creating multiple iterators at the exact same time.
@frankboehmer have you tried this approach? I think the performance should be identical.
@peakji Actually, I wasn't aware of this behavior. This would do the trick :). It's a weird interface, though. Having a let snap = db.snapshot() and then snap.get(), snap.createReadStream() etc., snap.close() on LevelUP would seem much more intuitive to me. And the implementation is probably quite simple, at least for the LeveDOWN backend.
Also you'd need to create all required iterators up-front which might be ugly in some use cases.
And the implementation is probably quite simple, at least for the LeveDOWN backend.
at least for the LeveDOWN backend is exactly right 😉
@frankboehmer This is just a workaround 😉. It works because each iterator
has their own internal snapshot
, and since Node.js is single-threaded in the JS land, all the snapshot
s we created synchronously should capture the exact same status.
But after reading the index-retrieve use case above, I'm starting to agree that it's necessary to expose raw snapshot
, otherwise it's pretty hard to achieve the consistency requirement.
However, I don't think this feature should be added in levelup
and making it REQUIRED in the level-ecosystem. snapshot
is a leveldb concept and I'm not sure with the implementation of other backends. AFAIK, we didn't mention snapshot/consistency in levelup.
+1 for implementing in leveldown
first.
otherwise it's pretty hard to achieve the consistency requirement
I'd like to hear why the consistency is required. Often it's perfectly fine to do:
db.createValueStream().on('data', key => {
db.get(key, (err, value) => {
if (err.notFound) {
// Just forget about this key and move on
}
})
)
@vweevers think about the index as a reverse index of a data field that's also stored in the record. If the record gets updated in between, you get inconsistencies - the index doesn't reflect the current state of the record. You might find all sorts of workarounds to handle such a situation - however, it stays an inconsistent view. The most compelling argument for exposing snapshots might be that many users won't actually spend too much thinking on possible inconsistencies when seeing code like the one I posted above. They might be using batch() to get atomic writes - not beeing aware of the fact that they still can end up with inconsistent reads.
@vweevers Suppose we are creating a simple inverted index, and we put the index keys and documents in one single database by using different prefixes:
key | value |
---|---|
word:cat |
[1,2,3] |
word:dog |
[2,3] |
word:spider |
[1] |
word:spiderman |
[3] |
doc:1 |
lorem ipsum dolor cat spider |
doc:2 |
... |
doc:3 |
... |
We can issue a prefix query for docs with words starting with spider*
by scanning keys after word:spider
, but the documents might change while we are iterating. For example, if we update doc:3
to a sentence without spiderman, the word:
iterator still reads the stale snapshot, and will eventually result in irrelevant search hits.
The most compelling argument for exposing snapshots might be that many users won't actually spend too much thinking on possible inconsistencies when seeing code like the one I posted above.
I see your point and want to add the nuance that many people dó think about this and decide they don't care about such inconsistencies. Taking @peakji's example, which is excellent, the end result might be that an application sometimes shows a search result for "spiderman" that isn't relevant anymore. That can be acceptable.
That said, I understand the use case. I'm not sure yet about:
However, I don't think this feature should be added in levelup and making it REQUIRED in the level-ecosystem.
It definitely shouldn't be a required backend feature, but that doesn't mean levelup can't wrap it. It could handle open state, read streams (like it wraps iterators on a regular db) and promises.
@peakji just curious: why did you make this edit to your comment:
~~I think the performance should be identical.~~
why did you make this edit to your comment
Just theoretically, I didn't run the benchmark yet:
In the inverted index example, we can have another iterator and use iterator.seek()
+ iterator.next()
to replace db.get()
. Theoretically this should not make much difference, but in leveldown we are crossing the JS/C++ boundary twice.
@peakji: AFAIK, we didn't mention snapshot/consistency in levelup.
=> https://github.com/Level/levelup/issues/590
Exposing snapshots can make it easier to build a transaction layer. There was some work on this back in 2015 that was never merged. PR #152 https://ongardie.net/blog/node-leveldb-transactions/
Multiple get calls on one single snapshot can be achieved by creating multiple iterators at the exact same time.
@frankboehmer have you tried this approach? ~I think the performance should be identical.~
Is this still true if one of the iterators is done through the stream creation?
const i = db.iterator();
for await (const o of db.createReadStream()) {
// use i here
// can we expect consistency between o and i?
}
@CMCDragonkai Yes, because a stream is backed by an iterator too, which is created when you call createReadStream()
.
Ok that's useful, I was wondering that iterators have the db
property.
Does this mean that iterator.db
is operating from the same snapshot, or is just a reference to the DB that created the iterator?
Just a reference.
I've been attempting to implement snapshot isolation in leveldb, and although iterators provide a snapshot, one problem is the ability to derive further iterators from the same snapshot. Right now this can only be done by creating all the iterators up-front and at the same time. This limits the expressiveness of iterators.
What would it take to bring snapshots directly to the leveldown interface (if not the abstract-level` interface?).
Tracked in https://github.com/Level/community/issues/118. It won't be added to leveldown
, only its successor classic-level
.