streams
streams copied to clipboard
ReadableStreamBYOBReader.prototype.readFully(view)
As suggested in #1019 (comment), developers may want to read into a BYOB buffer until it is completely filled (instead of only partially filling it with however many bytes the underlying source provides). We already have an example for this with a readInto() helper function, but it may be worth integrating this into the standard itself for better ergonomics and/or optimization opportunities.
Concretely, I suggest we extend ReadableStreamBYOBReader:
interface ReadableStreamBYOBReader {
Promise<ReadableStreamBYOBReadResult> readFully(ArrayBufferView view);
};
This method performs a BYOB read into view:
- If the underlying source responds with
nbytes wheren < view.byteLength, then the stream will immediately start pulling again with the "remainder" ofview(i.e.view.subarray(n), assumingviewis aUint8Array). - If the underlying source responds with
nbytes wheren === view.byteLength, then the method fulfills with{ done: false, value }wherevalueis the completely filled view. - If the underlying source closes before the view is completely filled, then the method fulfills with
{ done: true, value }wherevalueis the (partially) filled view. - If the underlying source cancels, then the method returns with
{ done: true, value: undefined }, like with regular BYOB reads.
Some remarks:
- We can't return a
Promise<ArrayBufferView>, because then you wouldn't know if the stream is closed. Also, this wouldn't work if the stream is cancelled... unless we're okay with returningPromise<undefined>in that case (which I personally don't find very pretty). - We have to be very careful with how we create a pull-into descriptor for the remainder. There may already be other pull-into descriptors into the queue, for example a user could do this:
One solution would be to have a "special type" forlet pullCount = 0; const rs = new ReadableStream({ type: 'bytes', pull(c) { ++pullCount; c.byobRequest.view[0] = pullCount; c.byobRequest.respond(1); } }); const reader = rs.getReader({ mode: 'byob' }); const read1 = reader.readFully(new Uint8Array(3)); const read2 = reader.read(new Uint8Array(1)); await read1; // should be [1, 2, 3], NOT [1, 3, 4] await read2; // should be [4], NOT [2]readFully()pull-into descriptors. Or another (simpler?) solution might be to always put the pull-into descriptor for the remainder at the front of the[[pendingPullIntos]]queue (rather than at the back). - Should we also have a shorthand
ReadableStream.readFully(view)? Would it have the same signature, i.e. returningPromise<ReadableStreamBYOBReadResult>?
Makes sense to me; thanks for thinking through the design.
Should we also have a shorthand
ReadableStream.readFully(view)?
IMO we should only do that if we do #1072.
Even then, I'm not 100% sure, because it only makes sense on byte streams, and so quarantining it into the reader seems a bit nicer from a theoretical purity perspective. (This sort of type mismatch affects a lot of the suggestions in #1019...)
Would it have the same signature, i.e. returning
Promise<ReadableStreamBYOBReadResult>?
Seems like it should; any reason to deviate?
Should we also have a shorthand
ReadableStream.readFully(view)?IMO we should only do that if we do #1072.
Sure! 🙂
Would it have the same signature, i.e. returning
Promise<ReadableStreamBYOBReadResult>?Seems like it should; any reason to deviate?
It might be a bit weird to have a ReadableStreamBYOBReadResult outside of a ReadableStreamBYOBReader. But yeah, it was more of a stylistic question. 😛
Anyway, I'm gonna try some experiments on the reference implementation, see if I can get this thing to work. 👨🔬
https://github.com/whatwg/streams/issues/1175 might be an API that's more easily optimized by implementations while offering the same semantics if needed. The reason is that you may want to allocate 1 large buffer & read as much from the kernel in one shot as you can with subsequent reads to fill up to the minimum amount. In other words, allocate 256kb since that's the typical socket buffer size, read at least 64kb & gracefully handle any extra data between 0 & 192kb without issuing another read. Any "unaligned" overflow can then be handled by moving the overflow back to the beginning of the large buffer & resuming reading into the subview (& accounting for that in your next minBytes value).
If you do the readFully, then unless the underlying kernel reads align perfectly, you'll end up issuing slightly more syscalls than needed. Probably not a material difference of course, but readFully would simply be a convenience wrapper for readAtLeast(buffer.length, buffer) so maybe it makes sense for that to be standardized (if the wrapper is valuable for ergonomic reasons, might still make sense to standardize it).
@MattiasBuelens How hard would it be to just add an optional second argument to fill()?
It shouldn't be that difficult. Currently #1145 works by adding a fill boolean flag to each pull-into descriptor to decide if it should fill up to at least element size (for read(view)) or up to byte length (for fill(view)). We can generalize this: replace the flag with a minimum fill length property, and let the user configure it directly.
I do like that we can express everything in terms of fill(view, { minBytes }):
fill(view)is equivalent tofill(view, { minBytes: view.byteLength })read(view)is equivalent tofill(view, { minBytes: view.constructor.BYTES_PER_ELEMENT })
Not 100% sure about the API though, but we can do some bikeshedding. 😁
read(view, { minBytes }), whereminBytesdefaults to view's element size to match today's behavior. Then we don't really need a dedicatedfill(view), but it could still act as a shorthand forread(view, { minBytes: view.byteLength })if desired.fill(view, { minBytes }), whereminBytesdefaults to view's byte length to match the originalfill(view)proposal. But maybe this makes it too easy to forget aboutminBytes, and you might accidentally end up waiting for the view to be entirely filled? 🤷readAtLeast(view, minBytes), with no defaults. Less likely to make a mistake, but might be a bit verbose.
minBytes might not be the best name if you're reading into a multi-byte typed array, since you want to reason about view.length instead of view.byteLength. Maybe atLeast works better?
const { done, value } = reader.read(new Uint16Array(8), { atLeast: 2 });
// value.length will be between 2 and 8
// value.byteLength will be between 4 and 16 bytes
Anyway, it turns out that we can fairly easily generalize the fill(view) implementation from #1145 to support "read at least" semantics. I've already got it working on a separate branch. 😁
So, if we're happy with the approach and decide on an API, I can merge that branch into #1145 and we continue from there (update spec text, add tests,...). 🙂
Yeah, just realized this today. Needs to be defined in terms of number of elements, not number of bytes (otherwise passing in an odd number would be nonsensical for a multi-byte array).
I'm happy to add support for this in the Node.js implementation also. So +1 on it.
Is this going to be accompanied with a timeout parameter or abort signal? Otherwise, I'm concerned about the possibility of a malicious actor spamming an endpoint with large but extremely low data rate requests (think: a single minimum-size TCP packet once per second) and forcing it to run out of memory. Of course, intermediate proxies can largely mitigate this, but not every client or server can trust the source of their data, and this would be valuable for knocking out that vulnerability a bit easier.
Is this going to be accompanied with a timeout parameter or abort signal?
There's a separate discussion ongoing to make reads abortable in #1103. Combined with whatwg/dom#951, you could implement timeouts on reads like this:
const reader = readable.getReader({ mode: 'byob', signal: AbortSignal.timeout(10_000) });
const { done, value } = await reader.read(new Uint8Array(1024), { atLeast: 32 });
- If at least 32 bytes arrive within 10 seconds, then
valueis aUint8Arraycontaining at least 32 bytes (and up to 1024 bytes). - If fewer than 32 bytes arrive in that time, the
read()call rejects and the reader is released. Any received bytes remain in the readable's queue, and can be read later by acquiring a new reader.
(Note that these APIs are not yet finalized, I'm basing this example on the current proposals.)
We ended up with a slightly different approach for #1103. Instead of adding a new API that accepts an AbortSignal, we changed the behavior of reader.releaseLock(). So the above example would look like this:
const reader = readable.getReader({ mode: 'byob' });
const signal = AbortSignal.timeout(10_000);
signal.addEventListener('abort', () => reader.releaseLock());
// Or alternatively:
setTimeout(() => reader.releaseLock(), 10_000);
// Use reader as normal
const { done, value } = await reader.read(new Uint8Array(1024), { min: 32 });