streams ReadableStreamBYOBReader.prototype.readFully(view)

trafficstars

As suggested in #1019 (comment), developers may want to read into a BYOB buffer until it is completely filled (instead of only partially filling it with however many bytes the underlying source provides). We already have an example for this with a readInto() helper function, but it may be worth integrating this into the standard itself for better ergonomics and/or optimization opportunities.

Concretely, I suggest we extend ReadableStreamBYOBReader:

interface ReadableStreamBYOBReader {
  Promise<ReadableStreamBYOBReadResult> readFully(ArrayBufferView view);
};

This method performs a BYOB read into view:

If the underlying source responds with n bytes where n < view.byteLength, then the stream will immediately start pulling again with the "remainder" of view (i.e. view.subarray(n), assuming view is a Uint8Array).
If the underlying source responds with n bytes where n === view.byteLength, then the method fulfills with { done: false, value } where value is the completely filled view.
If the underlying source closes before the view is completely filled, then the method fulfills with { done: true, value } where value is the (partially) filled view.
If the underlying source cancels, then the method returns with { done: true, value: undefined }, like with regular BYOB reads.

Some remarks:

We can't return a Promise<ArrayBufferView>, because then you wouldn't know if the stream is closed. Also, this wouldn't work if the stream is cancelled... unless we're okay with returning Promise<undefined> in that case (which I personally don't find very pretty).
We have to be very careful with how we create a pull-into descriptor for the remainder. There may already be other pull-into descriptors into the queue, for example a user could do this:
```
let pullCount = 0;
const rs = new ReadableStream({
  type: 'bytes',
  pull(c) {
    ++pullCount;
    c.byobRequest.view[0] = pullCount;
    c.byobRequest.respond(1);
  }
});
const reader = rs.getReader({ mode: 'byob' });
const read1 = reader.readFully(new Uint8Array(3));
const read2 = reader.read(new Uint8Array(1));
await read1; // should be [1, 2, 3], NOT [1, 3, 4]
await read2; // should be [4], NOT [2]
```
One solution would be to have a "special type" for readFully() pull-into descriptors. Or another (simpler?) solution might be to always put the pull-into descriptor for the remainder at the front of the [[pendingPullIntos]] queue (rather than at the back).
Should we also have a shorthand ReadableStream.readFully(view)? Would it have the same signature, i.e. returning Promise<ReadableStreamBYOBReadResult>?

Jul 12 '21 22:07 MattiasBuelens

Makes sense to me; thanks for thinking through the design.

Should we also have a shorthand ReadableStream.readFully(view)?

IMO we should only do that if we do #1072.

Even then, I'm not 100% sure, because it only makes sense on byte streams, and so quarantining it into the reader seems a bit nicer from a theoretical purity perspective. (This sort of type mismatch affects a lot of the suggestions in #1019...)

Would it have the same signature, i.e. returning Promise<ReadableStreamBYOBReadResult>?

Seems like it should; any reason to deviate?

Jul 12 '21 22:07 domenic

Should we also have a shorthand ReadableStream.readFully(view)?

IMO we should only do that if we do #1072.

Sure! 🙂

Would it have the same signature, i.e. returning Promise<ReadableStreamBYOBReadResult>?

Seems like it should; any reason to deviate?

It might be a bit weird to have a ReadableStreamBYOBReadResult outside of a ReadableStreamBYOBReader. But yeah, it was more of a stylistic question. 😛

Anyway, I'm gonna try some experiments on the reference implementation, see if I can get this thing to work. 👨‍🔬

Jul 12 '21 22:07 MattiasBuelens

https://github.com/whatwg/streams/issues/1175 might be an API that's more easily optimized by implementations while offering the same semantics if needed. The reason is that you may want to allocate 1 large buffer & read as much from the kernel in one shot as you can with subsequent reads to fill up to the minimum amount. In other words, allocate 256kb since that's the typical socket buffer size, read at least 64kb & gracefully handle any extra data between 0 & 192kb without issuing another read. Any "unaligned" overflow can then be handled by moving the overflow back to the beginning of the large buffer & resuming reading into the subview (& accounting for that in your next minBytes value).

If you do the readFully, then unless the underlying kernel reads align perfectly, you'll end up issuing slightly more syscalls than needed. Probably not a material difference of course, but readFully would simply be a convenience wrapper for readAtLeast(buffer.length, buffer) so maybe it makes sense for that to be standardized (if the wrapper is valuable for ergonomic reasons, might still make sense to standardize it).

Oct 13 '21 20:10 vlovich

@MattiasBuelens How hard would it be to just add an optional second argument to fill()?

Oct 13 '21 21:10 ricea

It shouldn't be that difficult. Currently #1145 works by adding a fill boolean flag to each pull-into descriptor to decide if it should fill up to at least element size (for read(view)) or up to byte length (for fill(view)). We can generalize this: replace the flag with a minimum fill length property, and let the user configure it directly.

I do like that we can express everything in terms of fill(view, { minBytes }):

fill(view) is equivalent to fill(view, { minBytes: view.byteLength })
read(view) is equivalent to fill(view, { minBytes: view.constructor.BYTES_PER_ELEMENT })

Not 100% sure about the API though, but we can do some bikeshedding. 😁

read(view, { minBytes }), where minBytes defaults to view's element size to match today's behavior. Then we don't really need a dedicated fill(view), but it could still act as a shorthand for read(view, { minBytes: view.byteLength }) if desired.
fill(view, { minBytes }), where minBytes defaults to view's byte length to match the original fill(view) proposal. But maybe this makes it too easy to forget about minBytes, and you might accidentally end up waiting for the view to be entirely filled? 🤷
readAtLeast(view, minBytes), with no defaults. Less likely to make a mistake, but might be a bit verbose.

Oct 13 '21 22:10 MattiasBuelens

minBytes might not be the best name if you're reading into a multi-byte typed array, since you want to reason about view.length instead of view.byteLength. Maybe atLeast works better?

const { done, value } = reader.read(new Uint16Array(8), { atLeast: 2 });
// value.length will be between 2 and 8
// value.byteLength will be between 4 and 16 bytes

Anyway, it turns out that we can fairly easily generalize the fill(view) implementation from #1145 to support "read at least" semantics. I've already got it working on a separate branch. 😁

So, if we're happy with the approach and decide on an API, I can merge that branch into #1145 and we continue from there (update spec text, add tests,...). 🙂

Oct 14 '21 22:10 MattiasBuelens

Yeah, just realized this today. Needs to be defined in terms of number of elements, not number of bytes (otherwise passing in an odd number would be nonsensical for a multi-byte array).

Nov 01 '21 23:11 vlovich

I'm happy to add support for this in the Node.js implementation also. So +1 on it.

Nov 01 '21 23:11 jasnell

Is this going to be accompanied with a timeout parameter or abort signal? Otherwise, I'm concerned about the possibility of a malicious actor spamming an endpoint with large but extremely low data rate requests (think: a single minimum-size TCP packet once per second) and forcing it to run out of memory. Of course, intermediate proxies can largely mitigate this, but not every client or server can trust the source of their data, and this would be valuable for knocking out that vulnerability a bit easier.

Nov 23 '21 21:11 dead-claudia

Is this going to be accompanied with a timeout parameter or abort signal?

There's a separate discussion ongoing to make reads abortable in #1103. Combined with whatwg/dom#951, you could implement timeouts on reads like this:

const reader = readable.getReader({ mode: 'byob', signal: AbortSignal.timeout(10_000) });
const { done, value } = await reader.read(new Uint8Array(1024), { atLeast: 32 });

If at least 32 bytes arrive within 10 seconds, then value is a Uint8Array containing at least 32 bytes (and up to 1024 bytes).
If fewer than 32 bytes arrive in that time, the read() call rejects and the reader is released. Any received bytes remain in the readable's queue, and can be read later by acquiring a new reader.

(Note that these APIs are not yet finalized, I'm basing this example on the current proposals.)

Nov 23 '21 21:11 MattiasBuelens

We ended up with a slightly different approach for #1103. Instead of adding a new API that accepts an AbortSignal, we changed the behavior of reader.releaseLock(). So the above example would look like this:

const reader = readable.getReader({ mode: 'byob' });

const signal = AbortSignal.timeout(10_000);
signal.addEventListener('abort', () => reader.releaseLock());
// Or alternatively:
setTimeout(() => reader.releaseLock(), 10_000);

// Use reader as normal
const { done, value } = await reader.read(new Uint8Array(1024), { min: 32 });

Jan 14 '22 07:01 MattiasBuelens

streams streams copied to clipboard

ReadableStreamBYOBReader.prototype.readFully(view)

streams
streams copied to clipboard