Returning buffers to the underlying source
An alternative or perhaps a companion to BYOB readers is the concept of a "two-phase" read where the UnderlyingSource enqueues a chunk but can get a later signal that the buffer holding that chunk is no longer in use and so can be reused for another chunk.
This could be useful, for example, when the data source being wrapped by the UnderlyingSource provides a BYOB-style API and the buffers passed as chunks to enqueue() could eventually be reused once they have been returned to the source.
It could also be useful for sources implemented by the platform which read data from shared memory, as an ArrayBuffer could be constructed over this shared memory and enqueued instead of requiring a copy of the data to be made (BYOB only saves the buffer allocation, not the copy). In this case, since shared memory is a limited resource, failure to return the chunks to the stream could result in backpressure. In the most extreme case, the source may only allow a single buffer to be outstanding before later reads can complete.
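A purely illustrative sketch of the kind of signal being described might look like the following; the onRelease option and the source object are invented for illustration and are not an existing or proposed API:

const rs = new ReadableStream({
  type: 'bytes',
  pull(controller) {
    const buffer = source.nextBuffer(); // hypothetical BYOB-style source
    // first phase: hand the chunk to the stream as usual
    controller.enqueue(new Uint8Array(buffer), {
      // second phase (invented): signal that the buffer may be reused
      onRelease() { source.returnBuffer(buffer); }
    });
  }
});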
> An alternative or perhaps a companion to BYOB readers is the concept of a "two-phase" read where the UnderlyingSource enqueues a chunk but can get a later signal that the buffer holding that chunk is no longer in use and so can be reused for another chunk.
It sounds like you want an API where the underlying source is in control of how its buffer is re-used for repeated enqueue() calls, rather than relying on the reader to re-use the same buffer for repeated read(view) calls?
Indeed, you'd need a different reader API for that. Right now, read(view) returns a view, and the consumer can do whatever it wants with that view for however long it wants. Instead, we'd need an API to restrict how long the consumer can use the view. Some ideas:
- Add a reader.reclaim(buffer) method. This might be quite error-prone though, or even confusing if you're doing multiple parallel reads across multiple buffers.
- Add an async callback argument to reader.read(), which is called with the { done, value } tuple. That value will only remain valid for the duration of the async callback, so the stream can automatically reclaim its buffer when the callback completes. (This is inspired by Web Locks' request(); a sketch follows below.)
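A rough sketch of the callback-based idea, where process() is a made-up consumer function and this overload of read() is hypothetical:

await reader.read(async ({ done, value }) => {
  if (done) return;
  await process(value); // value is only guaranteed valid until this callback settles
});
// once the callback's promise settles, the stream could reclaim value's buffer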
Another pain point is that the consumer might want to re-transfer the buffer a couple of times before returning ownership of the buffer to the stream. So it's not good enough to look at the original value of value.buffer; we'd need a way to reclaim ownership of the final (possibly re-transferred) buffer.
> This could be useful, for example, when the data source being wrapped by the UnderlyingSource provides a BYOB-style API and the buffers passed as chunks to enqueue() could eventually be reused once they have been returned to the source.
Not entirely sure why you'd need this, since this already works today:
let stream = new ReadableStream({ type: 'bytes', /* ... */ });
let reader = stream.getReader({ mode: 'byob' });

let wrappedStream = new ReadableStream({
  type: 'bytes',
  autoAllocateChunkSize: 1024, // ensure byobRequest exists, even when using a default reader
  async pull(controller) {
    // forward the outer stream's BYOB request to the inner reader,
    // so both reads share the same buffer
    const { done, value } = await reader.read(controller.byobRequest.view);
    if (done) {
      controller.close();
    }
    controller.byobRequest.respondWithNewView(value);
  },
  async cancel(reason) {
    await reader.cancel(reason);
  }
});
But yes, that requires the consumer to re-use the buffer. The underlying source has no say in this.
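For comparison, the consumer-driven reuse with today's BYOB reader looks roughly like this standard loop (buffer size and variable names are arbitrary):

const byobReader = wrappedStream.getReader({ mode: 'byob' });
let buffer = new ArrayBuffer(1024);
while (true) {
  const { done, value } = await byobReader.read(new Uint8Array(buffer));
  if (done) break;
  // ... use value ...
  buffer = value.buffer; // hand the same (transferred-back) buffer to the next read
}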
> It could also be useful for sources implemented by the platform which read data from shared memory
Correct me if I'm wrong, but I think this requires SharedArrayBuffers? I don't think it's safe to construct a regular ArrayBuffer on top of a shared memory region.
That makes things quite difficult. We rely on transferring ArrayBuffers to pass ownership around, but SharedArrayBuffers are not transferable. We'd have to revisit a lot of our assumptions if we cannot transfer buffers within the stream. Also, I'm not sure how a reader API would safely use chunks backed by shared array buffers. 😕
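For example (assuming the chunk were backed by a SharedArrayBuffer), the transfer step that the byte stream machinery relies on fails outright:

// constructing a SharedArrayBuffer requires cross-origin isolation
const sab = new SharedArrayBuffer(1024);
// SharedArrayBuffer is not a transferable object, so this throws a DataCloneError
structuredClone(sab, { transfer: [sab] });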
The concrete motivation for this is based on some internal details about how a couple of APIs in Chromium that provide an UnderlyingSource are implemented. For example, when an HTTP response body is received, the browser process writes it into the producer side of a circular buffer and the renderer process reads it out of the consumer side to provide chunks to the ReadableStream. This requires both an allocation to create a new ArrayBuffer and a copy to populate that buffer with data. BYOB avoids the buffer allocation but doesn't avoid the copy. My proposal here is that if an ArrayBuffer could be constructed directly over this shared memory, then we could avoid both. The actual performance benefit of doing this is up for debate, but it's an interesting API design question, so let's just assume for the moment that it might be worthwhile. 😄
The problem you run into if you try to do this is that the size of that circular buffer is fixed, and so unless there's a way for the browser to know when script is done with the buffers it has been given, eventually the whole thing will be marked as "in use" and no more progress can be made. The two options I see here are either:
- Solve the problem silently by noticing that we're running out of space, checking to see if script is still holding references to older buffers and forcing copies to be made so that the shared memory is "unlocked".
- Give script a way to explicitly release a buffer back to the UnderlyingSource when it is done with it.
An ArrayBuffer built on top of browser-owned shared memory like this isn't really a SharedArrayBuffer because it isn't being modified by two different threads. Safety comes from the fact that the shared memory can't be reused until the ArrayBuffer is released, but that means there needs to be a way for that release to be signaled.
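A minimal sketch of the failure mode, assuming a hypothetical fixed pool of slots over the browser-owned shared memory; createSlotPool(), acquireSlot(), and readFromSharedMemoryInto() are invented names:

const pool = createSlotPool(4); // hypothetical: four fixed views over the shared region

const rs = new ReadableStream({
  async pull(controller) {
    // resolves only once some previously enqueued slot has been released
    const slot = await pool.acquireSlot();
    const bytesRead = await readFromSharedMemoryInto(slot.view);
    controller.enqueue(slot.view.subarray(0, bytesRead));
    // without an explicit release signal from script, the slot can never be reused
  }
});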
Yeah, I encouraged @reillyeon to file this so we could have an interesting discussion; to be clear, not urgent :).
My first thought when he described this was
const reader = rs.getReader("managed-view");
const { done: done1, value: value1, release: release1 } = await reader.read();
// ... later ...
release1();
const { done: done2, value: value2, release: release2 } = await reader.read();
with the semantics that value1 and value2 are distinct Uint8Arrays, over distinct ArrayBuffer objects, but with the same backing "shared" memory.
If we are talking about API design (which means the consumer needs to explicitly release the buffer), I think it's already possible like this:
declare function pullFromSource<T>(): Promise<T>;
declare function releaseToSource<T>(value: T): void;

interface ReusableData<T> {
  value: T;
  release: () => void;
}

class ReusableReadableStream<T> extends ReadableStream<ReusableData<T>> {
  constructor() {
    super({
      async pull(controller) {
        const value = await pullFromSource<T>();
        controller.enqueue({
          value,
          release: () => {
            releaseToSource(value);
          }
        });
      }
    });
  }
}

const stream = new ReusableReadableStream<Uint8Array>();
stream.pipeTo(new WritableStream({
  write(chunk) {
    // do things to chunk
    console.log(chunk.value);
    chunk.release();
  }
}));
The consumer needs to use chunk.value, so it knows it's special, and remembers to call release().
If we don't want the consumer to learn new things, is it possible to let GC release the buffer?
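One way to approximate the GC idea is a FinalizationRegistry, though finalization timing is unspecified, so a buffer could stay "in use" arbitrarily long; releaseSlot() and slotId are hypothetical:

const registry = new FinalizationRegistry((slotId) => releaseSlot(slotId));

function enqueueSlot(controller, slotId, view) {
  // once `view` becomes unreachable, the callback eventually runs with slotId
  registry.register(view, slotId);
  controller.enqueue(view);
}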
@yume-chan makes a great point. This pattern is already possible without making any changes to the streams API by passing a type that explicitly supports this behavior through the stream.
for await (const chunk of readable) {
  // use chunk
  chunk.release();
}
I can see a number of complexities in the other proposed methods that make them undesirable. For instance, given @domenic's example:
const { done: done1, value: value1, release: release1 } = await reader.read();
// ... later ...
release1();
If value is a transferable value (like an ArrayBuffer or TypedArray), then the consuming code can simply transfer that away to some other place. What would the semantics of release() be in that case? There's also no guarantee that the consuming code will process the results of each read in any predictable order, which further complicates things.
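Concretely, the conflict looks something like this (the worker and its postMessage() call are assumed purely for illustration):

const { value, release } = await reader.read();
worker.postMessage(value, [value.buffer]); // the buffer is transferred; value is now detached
release(); // unclear: the stream can no longer reclaim a usable buffer here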
All of the edge cases here are complex enough that, for this behavior, it would be better for users to devise their own type with acquire-use-then-release semantics.
tl;dr is I don't think this idea is something we should bake into the spec.