lmdb-js
Stream support
Something that'd be nice is to have built-in stream support, e.g. if there's a request over HTTP for 100 MB of data.
This seems like a potentially helpful idea, and could work well with lmdb-js.
A few caveats though: first, of course lmdb-js doesn't have any knowledge of HTTP or whether a get originated from HTTP. And I don't think we would want the normal get operations to change their return type based on entry size (getBinary should always return a Buffer). And at a basic level, creating a stream from data retrieved from lmdb-js is already pretty easy, I think:
import { Readable } from 'stream';
let stream = Readable.from(db.getBinary('huge-data'));
However, that being said, lmdb-js is all about optimal performance, and getBinary is not quite optimal for this, since it does a full copy of the entire entry data to a buffer. lmdb-js also has db.getBinaryFast, which supports zero-copy buffers (it uses zero-copy for entries over 32KB). However, I don't believe Readable.from(db.getBinaryFast('huge-data')) would be safe, because the buffer is not safe to use after the read txn is reset (which happens frequently), and a stream consumer could still be reading from that buffer over a longer period of time. I believe it may be desirable to have a mechanism for getting zero-copy buffers and maintaining their integrity until they are garbage collected. We could have a specific function for streams and use their ending as a signal for ending a read txn, but I think Readable.from(buffer) is still appropriate here (might need to research that a little more).
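To make that concrete, here is a minimal sketch of what safe streaming looks like today, chunking the (copied) buffer from getBinary so consumers get backpressure; the streamEntry helper and the chunk size are just illustrative, not an existing API:
import { Readable } from 'stream';
// Illustrative helper (not part of lmdb-js): stream a large entry in chunks.
// getBinary returns a full copy, so the buffer remains valid even after the
// read txn is reset; a zero-copy getBinaryFast buffer would not be safe here.
function streamEntry(db, key, chunkSize = 64 * 1024) {
  let data = db.getBinary(key);
  function* chunks() {
    for (let offset = 0; offset < data.length; offset += chunkSize) {
      yield data.subarray(offset, offset + chunkSize);
    }
  }
  return Readable.from(chunks());
}
// usage: streamEntry(db, 'huge-data').pipe(httpResponse);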
Is that what you are thinking, or is there more to streams that you had in mind? Or were you thinking about object streams (from msgpackr)?
It was a passing thought triggered by a long query blocking my event loop with some test code, so I haven't thought about it that deeply.
Object streams would probably be the most commonly used, as that's directly consumable by the end user as JSON (that's what I would have used earlier). But both binary and object modes seem useful. getBinaryFast with a read txn would be very sweet for optimal perf.
Readable.from would totally have worked as well.
For object streams, do you mean streaming multiple sequential objects through a PackrStream, accumulating the output in a buffer to store, and then, when retrieving the data, streaming from that buffer? I think that would be roughly like this:
// write
import { PackrStream } from 'msgpackr';
import { asBinary } from 'lmdb';
let buffers = [];
let encoded = new PackrStream(); // encodes each incoming object to msgpack binary
sourceStream.pipe(encoded);
encoded.on('data', function(d) { buffers.push(d); });
encoded.on('end', function() {
  // store the concatenated msgpack data as-is, without re-encoding
  db.put('key', asBinary(Buffer.concat(buffers)));
});
// read
import { UnpackrStream } from 'msgpackr';
let objectStream = new UnpackrStream();
Readable.from(db.getBinary('key')).pipe(objectStream);
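The decoded objects can then be consumed from objectStream; a quick usage sketch:
objectStream.on('data', function(obj) {
  // each 'data' event is one of the objects originally piped through the PackrStream
  console.log(obj);
});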
Anyway, maybe I am missing something, but I think most of this should already be doable; the main thing, again, is that it would be nice to be able to use zero-copy buffers.
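And to tie this back to the original HTTP scenario, serving a stored entry could look roughly like this (db here is assumed to be an already-open lmdb-js database, and 'huge-data' is just an example key):
import { createServer } from 'http';
import { Readable } from 'stream';
createServer(function(req, res) {
  res.writeHead(200, { 'Content-Type': 'application/octet-stream' });
  // Readable.from(buffer) emits the whole buffer as a single chunk;
  // this is still a full copy via getBinary rather than zero-copy
  Readable.from(db.getBinary('huge-data')).pipe(res);
}).listen(8080);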