
Stream support

Open KyleAMathews opened this issue 3 years ago • 4 comments

Something that'd be nice is built-in stream support, e.g. for when there's an HTTP request for 100 MB of data.

KyleAMathews avatar May 24 '22 21:05 KyleAMathews

This seems like a potentially helpful idea, and could work well with lmdb-js.

A few caveats though: first, of course, lmdb-js has no knowledge of HTTP or whether a get originated from an HTTP request. And I don't think we would want the normal get operations to change their return type based on entry size (getBinary should always return a Buffer). And at a basic level, creating a stream from data retrieved from lmdb-js is already pretty easy, I think:

import { Readable } from 'stream';

let stream = Readable.from(db.getBinary('huge-data'));
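For instance, wired up to the HTTP case you mentioned (just a sketch; the server setup and the 'huge-data' key are placeholders):

import http from 'http';
import { Readable } from 'stream';

http.createServer(function(req, res) {
  res.writeHead(200, { 'Content-Type': 'application/octet-stream' });
  // pipe the copied entry buffer straight into the response
  Readable.from(db.getBinary('huge-data')).pipe(res);
}).listen(8080);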

That being said, however, lmdb-js is all about optimal performance, and getBinary is not quite optimal for this, since it makes a full copy of the entire entry into a new buffer. Lmdb-js also has db.getBinaryFast, which supports zero-copy buffers (it uses zero-copy for entries over 32KB). However, I don't believe Readable.from(db.getBinaryFast('huge-data')) would be safe, because the buffer is not safe to use after the read txn is reset (which happens frequently), whereas a consumer could potentially be reading from the stream over a much longer period of time. I believe it may be desirable to have a mechanism for getting zero-copy buffers and maintaining their integrity until they are garbage collected. We could have a specific function for streams and use the stream's end as the signal for ending the read txn, but I think Readable.from(buffer) is still appropriate here (might need to research that a little more).
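For illustration, a copy-based version of such a stream function might look roughly like this (streamEntry, the chunk size, and the chunking scheme are all hypothetical, not existing lmdb-js API); a zero-copy version would additionally need to keep the read txn alive until the stream ends:

import { Readable } from 'stream';

// Hypothetical helper: stream an entry out in fixed-size chunks so a
// slow consumer doesn't hold the whole entry in a write queue at once.
// Uses getBinary (a full copy), so the data stays valid even after
// the read txn is reset while the consumer is still reading.
function streamEntry(db, key, chunkSize = 64 * 1024) {
  const data = db.getBinary(key);
  let offset = 0;
  return new Readable({
    read() {
      if (offset >= data.length) return this.push(null);
      const end = Math.min(offset + chunkSize, data.length);
      this.push(data.subarray(offset, end));
      offset = end;
    }
  });
}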

Is that what you are thinking, or is there more to streams that you had in mind? Or were you thinking about object streams (from msgpackr)?

kriszyp avatar May 24 '22 22:05 kriszyp

It was a passing thought, triggered by a long query blocking my event loop in some test code, so I haven't thought about it that deeply.

Object streams would probably be the most commonly used, as that's directly consumable by the end user as JSON (that's what I would have used earlier). But both binary and object modes seem useful. getBinaryFast with a read txn would be very sweet for optimal perf.
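Something like this is roughly what I'd reach for (just a sketch; rangeToNDJSON is illustrative, not a real API):

import { Readable } from 'stream';

// sketch: stream query results incrementally as newline-delimited JSON
function rangeToNDJSON(db, options) {
  return Readable.from((function* () {
    for (const { key, value } of db.getRange(options)) {
      yield JSON.stringify({ key, value }) + '\n';
    }
  })());
}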

KyleAMathews avatar May 24 '22 22:05 KyleAMathews

Readable.from would totally have worked as well.

KyleAMathews avatar May 24 '22 22:05 KyleAMathews

For object streams, do you mean streaming multiple sequential objects through a PackrStream, accumulating the result in a buffer to store, and then, when retrieving the data, streaming back from that buffer? I think that would be roughly like this:

import { Readable } from 'stream';
import { PackrStream, UnpackrStream } from 'msgpackr';
import { asBinary } from 'lmdb';

// write: encode the incoming objects with msgpackr, accumulate the
// encoded chunks, and store them as a single binary entry
let buffers = [];
let encoded = new PackrStream();
sourceStream.pipe(encoded);
encoded.on('data', function(d) { buffers.push(d); });
encoded.on('end', function() {
  db.put('key', asBinary(Buffer.concat(buffers)));
});

// read: stream the stored binary back and decode it into objects
let objectStream = new UnpackrStream();
Readable.from(db.getBinary('key')).pipe(objectStream);
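And then consuming the decoded objects on the read side is just the usual stream event handling:

objectStream.on('data', function(obj) {
  // each obj is one of the original sequential objects
  console.log(obj);
});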

Anyway, maybe I am missing something, but I think most of this should already be doable; the main thing, again, is that it would be nice to be able to use zero-copy buffers.

kriszyp avatar May 25 '22 03:05 kriszyp