
Add WritableStreamDefaultController.releaseBackpressure()

Open MattiasBuelens opened this issue 3 years ago • 9 comments

This adds a new releaseBackpressure() method (name to be decided) to WritableStreamDefaultController, allowing a WritableStream with HWM = 0 to resolve the pending writer.ready promise manually.
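
A minimal sketch of what this could look like from the sink's point of view, assuming the proposed name and that the method simply resolves the currently pending writer.ready promise of a stream with HWM = 0 (keeping the controller in an outer variable is just one way to reach it):

let controller;
const ws = new WritableStream({
  start(c) {
    // Keep a reference so the sink can release backpressure later.
    controller = c;
  },
  write(chunk) {
    console.log("sink received", chunk);
  }
}, { highWaterMark: 0 });

const writer = ws.getWriter();

// With HWM = 0, writer.ready starts out pending.
controller.releaseBackpressure(); // the sink signals it can accept a chunk
await writer.ready;               // now resolves
await writer.write("a");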

This also integrates with TransformStreams: when pull() is called on the readable side, the transform stream calls releaseBackpressure() on its writable side. This allows piping through TransformStreams with HWM = 0 on both their readable and writable sides, enabling "unbuffered" transformations:

const rs = new ReadableStream({
  start(c) {
    c.enqueue("a");
    c.enqueue("b");
    c.enqueue("c");
    c.close();
  }
});

const upperCaseTransform = new TransformStream({
  transform(chunk, controller) {
    controller.enqueue(chunk.toUpperCase());
  }
}, { highWaterMark: 0 }, { highWaterMark: 0 });

const ws = new WritableStream({
  write(chunk) {
    console.log(chunk);
  }
});

await rs.pipeThrough(upperCaseTransform).pipeTo(ws);

Previously, the above snippet would stall.

Fixes #1158.

To do:

  • [x] Write spec text
  • [ ] Write more tests
  • [ ] Bikeshed on the method name

  • [ ] At least two implementers are interested (and none opposed):
  • [x] Tests are written and can be reviewed and commented upon at:
    • https://github.com/web-platform-tests/wpt/pull/31742
  • [ ] Implementation bugs are filed:
    • Chrome: …
    • Firefox: …
    • Safari: …

(See WHATWG Working Mode: Changes for more details.)


Preview | Diff



MattiasBuelens avatar Nov 25 '21 00:11 MattiasBuelens

Looks reasonable so far. Should we make unbuffered transforms the default? I suspect it's more desirable, and hopefully wouldn't be too breaking...

domenic avatar Nov 29 '21 20:11 domenic

Should we make unbuffered transforms the default? I suspect it's more desirable, and hopefully wouldn't be too breaking...

I'd like that a lot.

  • Identity transform streams would no longer unnecessarily increase the total buffer size of a pipe chain.
  • In a pipe chain, we would only start pulling from the source once we've connected our chain to either a buffered transform stream or to the final sink. (Right now, we start pulling from the original source stream as soon as we pipe it to the first transform stream, which might not yet be connected to a sink; see the sketch below.)
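
A rough sketch of that second point, assuming a transform with the current default writable HWM of 1; the source here uses HWM = 0 so that pull() only runs on demand:

const source = new ReadableStream({
  pull(controller) {
    console.log("source pulled");
    controller.enqueue("chunk");
  }
}, { highWaterMark: 0 });

const identity = new TransformStream(); // writable HWM currently defaults to 1

// Connecting only the first link of the chain already pulls a chunk from the
// source, even though nothing downstream is consuming yet.
source.pipeThrough(identity);

// With writable HWM = 0 on the transform, that first pull would only happen
// once the chain is connected to a sink (or a buffered stage) that pulls.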

I tried out the change locally, and it looks like we only need to change a couple of tests around writer.desiredSize. So I think we can get away with it? 🤞

MattiasBuelens avatar Nov 29 '21 22:11 MattiasBuelens

Looks reasonable so far. Should we make unbuffered transforms the default? I suspect it's more desirable, and hopefully wouldn't be too breaking...

I'm concerned that people may be relying on the existing behaviour. With buffering, all the transforms in a pipe can run in parallel, but without it they will run in series. This could cause mysterious performance regressions.
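
A sketch of the kind of pipeline where that difference shows up, with two asynchronous stages of roughly equal cost (the stage names, timings, and data are made up):

const makeStage = (name) => new TransformStream({
  async transform(chunk, controller) {
    console.log(name, "working on", chunk);
    await new Promise(resolve => setTimeout(resolve, 100)); // simulated work
    controller.enqueue(chunk);
  }
}); // writable HWM defaults to 1 today

const readable = new ReadableStream({
  start(controller) {
    for (const chunk of [1, 2, 3, 4]) controller.enqueue(chunk);
    controller.close();
  }
});

// With writable HWM = 1, stage A can already be working on chunk 2 while
// stage B is still busy with chunk 1, so the stages overlap in time.
// With writable HWM = 0 on both stages, A would only receive chunk 2 once B
// pulls again, so the stages would tend to take turns instead.
await readable
  .pipeThrough(makeStage("A"))
  .pipeThrough(makeStage("B"))
  .pipeTo(new WritableStream({ write(chunk) { console.log("sink", chunk); } }));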

ricea avatar Nov 30 '21 05:11 ricea

Hmm, good point. We don't know if or when transform() will call enqueue(). It might do it synchronously (e.g. for an identity transform or for TextDecoderStream), in which case buffering doesn't help and writable HWM = 0 would make more sense. But we cannot assume that. 🤔

If we don't change the default, we should document how authors can construct an unbuffered transform stream, and when they should use this. We should also go through other specifications that use TransformStream and check if they should have writable HWM 0 or 1.

MattiasBuelens avatar Nov 30 '21 10:11 MattiasBuelens

I've updated "set up a TransformStream" to take optional arguments for the writable/readable HWMs and size algorithms, so that we can use it in "create an identity TransformStream".

As I said before, we'll want to update other specifications to also set the writable HWM to 0 for synchronous unbuffered transform streams, such as CompressionStream and TextDecoderStream.

MattiasBuelens avatar Nov 30 '21 22:11 MattiasBuelens

I haven't had a chance to look at the implementation in detail. I think Chrome is interested in this functionality in general. I will leave it to @domenic to make the call on when we are happy with the spec change.

In the spirit of bikeshedding, I generally like the name releaseBackpressure() although I'd make it one word if I could. One problem I see with the name is that people might interpret it to mean that it will release backpressure even if desiredSize is negative.

Alternative names I've thought of

  • acceptChunk() - has the same problem
  • ready() - ambiguous
  • markReady() - also might be misinterpreted
  • pull() - confusing?

ricea avatar Dec 01 '21 15:12 ricea

So basically if you're doing a sync transform, you should use HWM = 0, but an async transform should use HWM = 1 (or higher)? That's rough for web developers. I guess the conservative default is to stick with 1 and tell people to customize for sync transforms...

Maybe we could make it better in the future with something like TransformStream.fromSyncMap(x => y), but I dunno, then you might want to have flush logic...
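
For the synchronous case, a hypothetical user-land helper in the spirit of that fromSyncMap() idea could be fairly small once unbuffered transforms pipe properly (the name, the free-function form, and the lack of flush support are all assumptions here):

function fromSyncMap(map) {
  // An unbuffered transform around a synchronous mapping function.
  return new TransformStream({
    transform(chunk, controller) {
      controller.enqueue(map(chunk));
    }
  }, { highWaterMark: 0 }, { highWaterMark: 0 });
}

// Usage: readable.pipeThrough(fromSyncMap(x => x.toUpperCase())).pipeTo(writable);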

Alternative names I've thought of

I like markReady() since the ready promise is the existing developer-facing manifestation of backpressure. I agree it's hard to communicate the conditional nature of the API without a long name.

domenic avatar Dec 01 '21 21:12 domenic

I'm concerned that people may be relying on the existing behaviour. With buffering, all the transforms in a pipe can run in parallel,

@ricea I assume you mean "all the transforms in a pipe can run in parallel with each other" here. I.e. they all have data available to work on at the same time.

but without it they will run in series.

And here your concern is that all but one will sit idle waiting for data from upstream, as observed at any one point in time?

I don't think that'd be the case. Assuming sufficient input data and transforms taking approximately the same time, all async transforms should still run in parallel with each other, even with HWM = 0. The only difference is that, from the perspective of each transform, the transform upstream of it is working on a newer chunk at the same time.

  • Snapshot at time X: camera → transformA(frame4) → transformB(frame3) → transformC(frame2) → sink(frame1)

These seem like reasonable assumptions for a healthy real-time stream.

But what seems true (and maybe this is what you worry about) is that this ideal never happens, and even tiny transform delays will accumulate (delayA + delayB + delayC), which could blow a real-time buffer. Sure.

For example, transformB spikes, taking 50% longer than usual on frame3, but then immediately makes up for it by taking half the normal time on frame4. Here HWM = 1 might improve resilience to variance. But smoothing out any longer delays than that will take a deeper buffer anyway, so I don't think the difference between 0 and 1 is foundational, unless I'm missing something.

This could cause mysterious performance regressions.

jan-ivar avatar Mar 25 '22 16:03 jan-ivar

The above is my picture (which could be off) of how the pipe is filled in response to requests for new data that propagate up the pipe chain based on transforms resolving their pending writer.ready promise, which

  • for HWM = 0: would be from markReady() in response to a downstream pull, and
  • for HWM = 1: would be from desiredSize going back above zero, effectively also in response to a downstream pull

I suppose one difference is that during this propagation (is it async?), no work is happening in parallel in the former case for a brief moment, while in the latter case we know each transform has at least one item to chew on? In either case downstream is driving this.

jan-ivar avatar Mar 25 '22 17:03 jan-ivar