Why does the `read` method of the `ReadableStreamBYOBReader` class need to _transfer_ the buffer it is provided?
What is the issue with the Streams Standard?
The issue seems to be that the `read` method of the `ReadableStreamBYOBReader` class effectively mandates a transferable buffer^1, for reasons that strike me as somewhat "arbitrary". Transferable buffers are not a given (anymore?) -- buffers backing `WebAssembly.Memory` objects, for example, are not transferable, so they cannot be used with `read`.
My user story
I only learned of this particular property of `read` very recently, when I tried to amend an existing stream-processing pipeline that relies on a WebAssembly (WASM) routine to process a stream chunk by chunk. The goal was to eliminate unnecessary buffer copying, which was a performance bottleneck. I had been consuming the stream implicitly through its iterator with a `for await` loop, which vended `ArrayBuffer` objects (chunks) that had to be copied into the WASM instance memory (`memory`) buffer using e.g. `new Uint8Array(memory.buffer).set(new Uint8Array(chunk))` -- that copy was exactly what I now wanted to eliminate. So I thought that if I used a bring-your-own-buffer (BYOB) stream reader (`reader`) instead, I could "load" the WASM instance memory buffer directly from the stream. But it wasn't meant to be.
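The copy-based pipeline looked roughly like this (a minimal sketch; `processChunk` and `consume` are hypothetical stand-ins for my actual WASM routine and consumer):

```javascript
// Sketch of the original, copy-based pipeline. `memory` stands in for the
// imported WASM instance memory, `processChunk` for the call into WASM.
const memory = new WebAssembly.Memory({ initial: 1 }); // one 64 KiB page

function processChunk(chunk) {
  // The unnecessary copy: every chunk's bytes are duplicated into the
  // WASM memory before the routine can see them.
  new Uint8Array(memory.buffer).set(new Uint8Array(chunk));
  // ... call the exported WASM routine here ...
}

async function consume(stream) {
  for await (const chunk of stream) {
    processChunk(chunk);
  }
}
```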
WASM and BYOB don't mesh well
Turns out, because buffers backing `WebAssembly.Memory` objects are inherently not transferable (probably rooted in the WASM designers wanting to allow for additional implementation-specific constraints, like guard pages to contain stack-overflow errors and what not), a construct like `reader.read(new DataView(memory.buffer))` invariably throws an error, as it unsuccessfully attempts to detach the buffer backing the view it is supplied, in due accordance with step 10 in the Streams Standard specification.
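A minimal reproduction of the failure, assuming a trivial byte stream (`makeByteStream` below is a stand-in for the real source):

```javascript
// Reading into a view over WASM memory rejects, because `read` attempts to
// transfer (detach) the view's buffer, and WASM memory is non-detachable.
function makeByteStream() {
  return new ReadableStream({
    type: "bytes",
    start(controller) {
      controller.enqueue(Uint8Array.of(42));
    },
  });
}

async function tryByobReadIntoWasmMemory() {
  const memory = new WebAssembly.Memory({ initial: 1 });
  const reader = makeByteStream().getReader({ mode: "byob" });
  try {
    await reader.read(new DataView(memory.buffer));
    return "read succeeded";
  } catch (err) {
    return `read failed: ${err.name}`;
  }
}
```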
This makes BYOB unusable with WASM in this fashion, which is IMO a great detriment: pairing two features that should, above all, help with performance bottlenecks is currently a "no-go".
Misplaced requirement?
To that end, I'd like to ask -- why mandate in the specification that the buffer be transferred (and thus be transferable)? What also seems a bit "arbitrary" to me, everything considered, is that there's no clear reason why step 10 is even necessary per the spec. If it were omitted and _bufferResult_.[[Value]] were instead the actual buffer backing the supplied view, _view_.[[ViewedArrayBuffer]], the rest of the steps would still appear perfectly valid. And since this is a specification, after all, that would have allowed implementations to avoid transferring -- and thus detaching (effectively invalidating) -- the buffer provided to `read`. Buffers backing `WebAssembly.Memory` would then be usable with `read`. There must be something I am missing in my interpretation here?
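For contrast, this is what the mandated transfer looks like from the caller's side with an ordinary `ArrayBuffer` (a sketch; `makeByteStream` is a hypothetical stand-in source):

```javascript
function makeByteStream() {
  return new ReadableStream({
    type: "bytes",
    start(controller) {
      controller.enqueue(Uint8Array.of(1, 2, 3, 4));
      controller.close();
    },
  });
}

async function demo() {
  const mine = new ArrayBuffer(4);
  const reader = makeByteStream().getReader({ mode: "byob" });
  const { value } = await reader.read(new Uint8Array(mine));
  // `mine` is now detached (byteLength 0); the filled bytes live in a
  // different buffer that `read` handed back via `value.buffer`.
  return { detachedLength: mine.byteLength, bytesRead: value.byteLength };
}
```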
If `TypedArray.set` can "do" it, why can't `read`?
To add to my argument that the requirement to "transfer" the supplied buffer is a peculiarity that IMO could have been left out, consider the specification of `TypedArray.set`, for instance, as an example of an API that also effectively loads a caller-provided buffer, but whose specification does not warrant a "transfer" (step 23.a does use the term "transfer", but there's no indication the word refers to the same concept) and instead simply specifies a "loop over bytes and copy each". In light of this, I've got to ask: why couldn't the specification of `read` for `ReadableStreamBYOBReader` have been written the same way, so that we could "bring our own" WASM memory buffers without issues?
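To illustrate the contrast with a trivial sketch (not from the spec text itself):

```javascript
// `TypedArray.prototype.set` copies synchronously and detaches nothing,
// so it works fine with a buffer backing WebAssembly.Memory.
const memory = new WebAssembly.Memory({ initial: 1 });
const chunk = Uint8Array.of(1, 2, 3);

new Uint8Array(memory.buffer).set(chunk); // plain byte-for-byte copy

// Both buffers remain intact and usable afterwards.
console.log(chunk.buffer.byteLength); // 3 (source not detached)
console.log(memory.buffer.byteLength); // 65536 (destination not detached)
```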
Related

- https://github.com/wasm-bindgen/wasm-bindgen/discussions/3579, where the accepted answer appears to explain why BYOB works the way it does (transferring the buffer it is given), but does so specifically in the context of Rust and wasm-bindgen, and so doesn't necessarily explain the BYOB design unless you either have a pair of Rust glasses on, which I don't, or understand why Rust should have a say in how the Streams Standard is defined
- https://github.com/WebAssembly/design/issues/1397, which, again, attempts to tackle the issue from "the WASM side", asking for the rationale behind some of the decisions in WASM, and which may have stemmed in part from the relative inability to use WASM with e.g. BYOB; I, for one, don't think there's necessarily anything wrong with the way the "memory" concept was defined in WASM; instead, I ask why the transferring of [BYOB] buffers needs to be mandated at the specification level, which effectively makes using BYOB with specifically WASM currently impossible (at least using `ReadableStreamBYOBReader`)
- https://github.com/WebAssembly/design/issues/1162, which is mostly relevant due to the comment that sheds light on why specifically the buffers backing WASM memory have to be the way they are
We transfer for safety reasons. Because read(view) is asynchronous and the underlying byte source may be filling the buffer incrementally in some implementation-specific fashion, we don't want to expose details about this process to the calling JavaScript code by having it "peek" at the backing buffer while it's being filled.
We may want to loosen this restriction when dealing with a SharedArrayBuffer. That would allow a WebAssembly.Memory with { shared: true } to work. See #757 for more discussion on this. We haven't made much progress on this front yet, so any extra insights are welcome! 🙂
If `TypedArray.set` can "do" it, why can't `read`?
Because TypedArray.set is fully synchronous. There is no way for JavaScript to observe a partially completed copy, because the JavaScript execution only resumes after the set() call. Granted though, this is no longer the case when the TypedArray is backed by a SharedArrayBuffer, in which case a second thread could observe the bytes being written incrementally.
Thank you for the elaboration. But I have a follow-up question: how is being able to peek at the backing buffer (one that the caller brings themselves, i.e. in the "BYOB" fashion), much less one the caller explicitly asks to be loaded, "unsafe" before the promise settles? Even if I peek while, say, half a byte has made it into the buffer, because the implementation for some inexplicable reason does bitwise OR-ing, let's say -- how does that make for a lack of safety? Can you define safety more concretely for this particular issue? I have to acknowledge that the word "safety" gets thrown around a lot, with IMO vague hints at being "security conscious", but it's such a catch-all when used generally, and it's hard to take it seriously without at least a concrete example demonstrating the safety argument.
To be constructive: what if the specification instead mandated that before the promise settles, the data in the buffer is not defined (but implied to be safe to peek at)? Or that attempting to use the buffer passed to `read` before the returned promise settles must throw an error? If safety means integrity of sorts, well, it certainly should not "crash" the interpreter, nor expose sensitive data that wasn't in the buffer earlier and isn't going to end up in the buffer later. It's not like an implementation is allowed to temporarily "loan" the buffer for unrelated data, is it? The data written to the buffer ends up there eventually anyway. Even if it fills up progressively bitwise (4 bits being flipped into place at a time, let's say), I don't see the lack of safety. Did you have any concrete examples in mind that demonstrate how this safety would be compromised?
@amn The Wasm/JS specifications explicitly forbid a non-shared array buffer's contents from being mutated concurrently to the execution of the Wasm/JS code. If we didn't detach the `ArrayBuffer` when passing it to the host, and the host started writing to the `ArrayBuffer` from a different thread while Wasm/JS was running, it would see results that violate the languages' memory models for non-shared memory.
The reason this is important for 'safety' is that JS JITs and Web APIs can make assumptions that rely on this memory model. For example, a JS JIT can reorder accesses to non-shared memory more aggressively than it could if it knew another thread might be modifying it. And a Web API (like, for example, wasm compilation) that takes a non-shared array buffer can read its contents directly without a copy and not worry that it will be mutated from underneath it.
None of that applies to shared array buffers, which opt-in to the non-determinism. I definitely support loosening this restriction for shared array buffers, and think that would be a great way to reduce copies using this API.
@eqrion, can you link the specification which forbids it? I understand why it would; I just want to chart whether it's WASM or another API specification (or both) that has it specified.
Intuitively, I expect the transferring is mandated because, even if we ignore the WASM use case, the underlying implementation of `read` -- had it not taken ownership of the buffer by transferring it for its own use -- would have to contend with the caller of `read` for access to the buffer before the promise settles, which I guess was seen as problematic. And it may well be so: the caller may try to resize the buffer, for example, before the returned promise settles, during the period when the underlying task is busy reading into the buffer. Is this the kind of thing you generally mean? I'd like to be able to illustrate a problematic scenario that has nothing to do with WASM; that would make it clearer which specification is best to address the issue with, and which API to amend, for example.
@amn Which specification that forbids what?
There's not a great single place to link in the wasm spec (which I'm most familiar with) to describe the 'memory model' of unshared memory; it's implied by all the permitted mutations scattered across the spec. But basically, a wasm unshared memory can only be mutated by wasm instructions while a wasm function is running (i.e. single-threaded). When a wasm function calls into a host function (like JS or a browser API), an exported memory could be mutated arbitrarily, but only while the external function is running.
I linked to the part of the wasm spec that mandates a wasm memory cannot be detached here.
I think the right place to fix this issue would be to relax it so that SharedArrayBuffers are allowed (basically #757). We did some initial prototyping of this in Firefox here, but haven't landed the code behind a pref yet.
If you have a use case for this that could be used as a demo to demonstrate performance wins, that would be really helpful. We could get the change landed in Firefox (behind an experimental pref) and you could test it in a live browser.
@amn Which specification that forbids what?
I was quoting your own earlier reply which stated:
The Wasm/JS specifications explicitly forbid a non-shared array buffer's contents from being mutated concurrently to the execution of the Wasm/JS code.
Never mind, however -- you clarified it all in your latest comment. I was able to understand how the `WebAssembly.Memory` detach key de facto forbids detaching/transferring of the buffers held by WASM memory objects.
Seems like supporting JS-WASM "zero-copy" by allowing `read` to work on a `SharedArrayBuffer` held by a WASM memory (which currently does not work) would indeed be one way to go about it.
I do have a case for it, but I can't demonstrate any performance wins unless I can actually execute the case, which I cannot :) Conceptually: I wrote an MD5 computation routine in WASM (obligatory: yes, I am well aware MD5 is old and broken -- there's plenty of code that still needs it, for non-cryptographically-sensitive applications), and it needs to copy every chunk of data fed to the MD5 machine, which is a case of unnecessary copying. Anyway, since I can't run the code I wrote using `read` -- it throws trying to detach the WASM memory buffer -- I have no means to test the expected performance benefits. Using a `SharedArrayBuffer` by initializing the [imported] instance memory with `{ shared: true }` also doesn't work (runtime error), so I am unsure how you want me to present my use case -- as a README/proposal postulating/hypothesising something? I can do that.
I do have a case for it, but I can't demonstrate any performance wins unless I can actually execute the case, which I cannot :)
There are several pure JavaScript implementations of the Streams standard, which you could use to build an experimental implementation:
- The reference implementation in this repository
- A polyfill, such as web-streams-polyfill or @stardazed/streams
- A server runtime, such as Node.js or Deno
That said, this is not a requirement to propose a change in the standard. Have a look at our Participation Guide and Working Mode. 😉
I am unsure how you want me to present my use case -- as a README/proposal postulating/hypothesising something? I can do that.
Yes, that would be a good start. We call this an "explainer", and you can find several explainers for previous features in this repository, such as for readable byte streams or transferable streams.
If you want to champion this feature yourself, then feel free to open a new issue and put your explainer in the issue description. Describe the problem and the proposed solution, and also list any alternative solutions you've considered. From there, we can iterate on the proposal and (if other contributors agree with your proposal) start working on updating the standard.
If you'd rather not champion it yourself, then someone else will have to pick this up.
At the moment, developers creating a readable BYOB stream can expect the underlying buffer to be transferable, meaning they may choose to call `.transfer()` on it for whatever reason. A change allowing `SharedArrayBuffer`s to be passed up would break this assumption and break existing code that relied upon the mechanic.
How would one address this issue?
Fair point. We will probably need an opt-in on the underlying sink, similar to how you need to specify type: "bytes" to allow BYOB reads in the first place.
```js
const readable = new ReadableStream({
  type: "bytes",
  shared: true, // name to be decided
  pull(controller) {
    const { byobRequest } = controller;
    // This must always throw
    byobRequest.view.buffer.transfer();
  }
})
```
I think this is doable?
- A `SharedArrayBuffer` is non-transferable, so it will already throw on `transfer()`.
- A regular `ArrayBuffer` is transferable, so we have to create a non-transferable version to give to the BYOB request. We could use `DetachArrayBuffer` and pass a custom detach key, so it can only be transferred by the `ReadableStream` implementation. This is similar to how a non-shared `WebAssembly.Memory` works.
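The non-shared `WebAssembly.Memory` behavior referenced here can be observed directly (a small sketch; `ArrayBuffer.prototype.transfer` requires a recent engine):

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });
const buf = memory.buffer;

// User code cannot detach the buffer: the memory holds a reserved detach key.
let threw = false;
try {
  buf.transfer();
} catch (err) {
  threw = true; // TypeError
}

// Only the memory itself may detach it, e.g. when growing:
memory.grow(1);
console.log(threw, buf.byteLength); // true 0 (old buffer detached by grow)
```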
Maybe I should just start writing a spec for this... 😅
It would be very useful to see what the spec ramifications would be. My prototype in Firefox is a bit of a hack, but does work well enough in common single-threaded cases for testing.
I don't follow, though, why you would need an opt-in on the `ReadableStream` itself. Yes, a non-transferable buffer could now end up in the `byobRequest`. But if the `ReadableStream` implementation insists on calling `.transfer()`, then it can just throw/reject, no? No existing code would break, because today you can't pass a `SharedArrayBuffer` to `ReadableStreamBYOBReader.read` at all. And besides, it seems unreasonable to expect that every `ReadableStream` implementation in the wild is future-compatible with all possible options that could ever be added to `byobRequest`.
But if the `ReadableStream` implementation insists on calling `.transfer()`, then it can just throw/reject, no? No existing code would break, because today you can't pass a `SharedArrayBuffer` to `ReadableStreamBYOBReader.read` at all.
You have it backwards. Because you can't currently pass a `SharedArrayBuffer` to a BYOB stream, code can assume the buffer that's passed in is transferable and call `.transfer()` on it. This change would break existing code that relies on the fact that the buffer being passed in is, in fact, transferable.
You'll have to explain to me a scenario where this is breaking, I guess. Shipping this would not break existing apps, because existing apps would only pass ArrayBuffer to a BYOB read.
It would mean that some (hypothetical) existing ReadableStream implementations are not compatible with SharedArrayBuffer. But that's not breaking, that's just a lack of forward compatibility with new options.
I guess that's true. It would only break new ways people tried using existing stuff
@bvisness It's not the reader's side that's a problem, it's the underlying byte source's side.
Yes, if the underlying byte source tried to call byobRequest.view.buffer.transfer(), then that call will now suddenly throw an error. But is that thrown error always going to be handled correctly by the source and cause the stream to become errored? For example, what if this transfer() call is actually inside the browser's fetch() implementation, and the browser is not checking for errors from that call (because it assumes that it can never throw)? In the best case the browser tab's process crashes, in the worst case it creates a security vulnerability and gets its own CVE. 😛
I do agree that it's a very rare case. Perhaps we should implement it first without an opt-in, and then run an experiment to see if this change is web-compatible?
That would certainly be my preference, since it seems somewhat impractical to actually opt in in many places.
For example, Request.body. That's a getter that returns a ReadableStream. Is that stream shared or not? If it's not, then how do you get a shared stream for use with SAB? If it is, then we do have a backwards compatibility problem since existing code that (hypothetically) does transfer will break. This seems like a nasty and very viral API problem.
Yeah, I think once we allow SABs for readable byte streams, we'll have to make sure all browser APIs that create byte streams (like fetch) must also support SABs. We already have the following note for the current BYOB request view when used from other specifications:
Specifications must not transfer or detach the underlying buffer of the current BYOB request view.
So as long as all other specifications only use our defined operations for creating streams (which they should) and don't violate that rule, I think we should be fine?
Then the only remaining concern is user-land readable byte streams (i.e. new ReadableStream({ type: 'bytes' }) from JS) that don't support SAB. But then you can hopefully fix the underlying byte source implementation in your own JS code.
Would it make sense to create a PR for this SAB update? I'd appreciate being able to follow the actual spec modifications so that I can ensure my Firefox prototype will play nice with it.