
Cloning (not teeing) a readable stream, via controller tricks

Open domenic opened this issue 9 years ago • 6 comments

This is a continuation of the thread at https://github.com/yutakahirano/fetch-with-streams/issues/67#issuecomment-253530274, where @youennf proposes a novel set of tricks that would allow true cloning (not teeing) of a readable stream. First I will explain the trick and talk about why it works. Then we can talk about whether we should support this.


From the outside, it makes no sense to clone a stream. Reading from a stream is destructive: once you read a chunk, it is no longer in the stream. So the best you can do is tee it: create two new streams, read from the original, and then enqueue each chunk into the two new ones. There's no way to read from the stream, enqueue in the clone, but still somehow leave the chunk in the stream. @youennf's trick gets around this.
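To make the tee-only model concrete, here is a minimal sketch, assuming a runtime with a global ReadableStream (modern browsers, Node 18+). The helper names `makeSource` and `readAll` are made up for illustration. It shows that teeing locks the original stream and hands every chunk to both branches.

```javascript
// Minimal sketch: teeing consumes (locks) the original stream.
// Assumes a global ReadableStream; makeSource/readAll are illustrative helpers.
function makeSource() {
  return new ReadableStream({
    start(controller) {
      controller.enqueue('a');
      controller.enqueue('b');
      controller.close();
    },
  });
}

// Drain a stream into an array of chunks (each read is destructive).
async function readAll(stream) {
  const reader = stream.getReader();
  const chunks = [];
  for (;;) {
    const { value, done } = await reader.read();
    if (done) return chunks;
    chunks.push(value);
  }
}

const original = makeSource();
const [branch1, branch2] = original.tee();

// tee() locks the original, so it can no longer be read directly.
console.log(original.locked); // true

// Each branch receives every chunk.
const teeDemo = Promise.all([readAll(branch1), readAll(branch2)]);
teeDemo.then(([c1, c2]) => console.log(c1, c2));
```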

The trick depends crucially on the way we have structured streams to be facades around controllers, so that all the interesting behavior, including the data, is stored in the controller. This was originally a design innovation in order to allow both byte and default streams to be served by the same public ReadableStream API: all the interesting behavior takes place in either the ReadableStreamDefaultController or the ReadableByteStreamController.
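The facade/controller split is visible even from author code. The following is a small sketch, assuming a global ReadableStream; `creatorController` and `facade` are illustrative names. The creator feeds data through the controller, while consumers only see the stream.

```javascript
// Sketch: the stream is a facade; the creator talks to the controller.
// Assumes a global ReadableStream (modern browsers, Node 18+).
let creatorController; // illustrative name for the creator's handle

const facade = new ReadableStream({
  start(controller) {
    // start() runs synchronously inside the constructor.
    creatorController = controller;
  },
});

// The creator enqueues through the controller, not through the stream.
creatorController.enqueue('chunk-1');
creatorController.close();

// Consumers only ever touch the public ReadableStream API.
const firstRead = facade.getReader().read();
firstRead.then(({ value, done }) => console.log(value, done)); // chunk-1 false
```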

The innovation is to consider re-targeting a ReadableStream at a different controller than the one that was created along with it. This allows the ReadableStream to start exposing a different set of chunks than those that are put into it by its creator, since its creator still manipulates the original controller.

Given a stream toClone, the steps are:

  1. Create a new stream, teeStream.
  2. Move toClone's controller to teeStream. At this point the original controller for teeStream has been thrown away and toClone has no controller.
  3. Let tee1 and tee2 be the result of teeing teeStream. They each have their own controller.
  4. Move the controller of tee1 to toClone. At this point tee1 has no controller but everyone else does. Throw away tee1.
  5. Return tee2: it is a clone.

At this point, code that uses the original controller for toClone will enqueue into the controller for the hidden stream teeStream, and thus (via the teeing mechanism) into the controllers for tee1 and tee2. In terms of the streams you can actually read from, using the original controller for toClone enqueues into both toClone and tee2.
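Real stream internals cannot be re-targeted from script, so the five steps can only be sketched against a toy model. Everything below (`ToyController`, `ToyStream`, `toyTee`, `toyClone`) is hypothetical and synchronous; it ignores backpressure, closing, and errors, and is meant only to show where the chunks end up.

```javascript
// Toy model only: these classes are hypothetical stand-ins, not real stream internals.
class ToyController {
  constructor() {
    this.queue = [];     // chunks waiting to be read
    this.forwardTo = []; // branch controllers fed by a toy tee
  }
  enqueue(chunk) {
    if (this.forwardTo.length > 0) {
      for (const branch of this.forwardTo) branch.enqueue(chunk);
    } else {
      this.queue.push(chunk);
    }
  }
}

class ToyStream {
  constructor() { this.controller = new ToyController(); }
  read() { return this.controller.queue.shift(); } // destructive read
}

// Toy tee: drain what is queued, then forward every later chunk to both branches.
function toyTee(stream) {
  const a = new ToyStream();
  const b = new ToyStream();
  const source = stream.controller;
  for (const chunk of source.queue.splice(0)) {
    a.controller.enqueue(chunk);
    b.controller.enqueue(chunk);
  }
  source.forwardTo = [a.controller, b.controller];
  return [a, b];
}

function toyClone(toClone) {
  const teeStream = new ToyStream();         // step 1
  teeStream.controller = toClone.controller; // step 2: move the controller
  toClone.controller = null;
  const [tee1, tee2] = toyTee(teeStream);    // step 3
  toClone.controller = tee1.controller;      // step 4: re-target toClone
  tee1.controller = null;                    // tee1 is thrown away
  return tee2;                               // step 5
}

const toClone = new ToyStream();
const originalController = toClone.controller; // the creator's handle
originalController.enqueue('x');               // enqueued before cloning
const clone = toyClone(toClone);
originalController.enqueue('y');               // enqueued after cloning

const fromOriginal = [toClone.read(), toClone.read()];
const fromClone = [clone.read(), clone.read()];
console.log(fromOriginal, fromClone); // [ 'x', 'y' ] [ 'x', 'y' ]
```

In this model the creator keeps calling `originalController.enqueue()` as before, and both `toClone` and `clone` see every chunk.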

This requires careful tracking of the original controller. For example, the operations in https://fetch.spec.whatwg.org/#readablestream would not be correct, since they take as an argument stream and then use stream.[[readableStreamController]]. This is not generally a problem for author code, but it does require careful bookkeeping for specs/UA code.
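The bookkeeping hazard can be illustrated with an even smaller toy. The objects below are hypothetical plain-object stand-ins for the state after the trick has run; they are not spec or browser internals. Looking the controller up through the stream reaches only one side, while the originally captured controller reaches both.

```javascript
// Hypothetical toy state after the clone trick has run (not real internals).
const tee1Controller = { queue: [] };
const tee2Controller = { queue: [] };

// The creator's original controller now feeds both branches via the hidden tee.
const creatorController = {
  enqueue(chunk) {
    tee1Controller.queue.push(chunk);
    tee2Controller.queue.push(chunk);
  },
};

const toCloneSlot = { controller: tee1Controller }; // toClone now points at tee1's controller
const cloneSlot = { controller: tee2Controller };   // the clone is tee2

// Correct: code that kept the original controller reaches both streams.
creatorController.enqueue('ok');

// Incorrect: code that looks the controller up through the stream
// only reaches toClone; the clone never sees this chunk.
toCloneSlot.controller.queue.push('lost');

console.log(toCloneSlot.controller.queue); // [ 'ok', 'lost' ]
console.log(cloneSlot.controller.queue);   // [ 'ok' ]
```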


I am torn on whether we should pursue this. On the one hand, it is pretty cool. If you think cloning is a natural thing to do to streams, this accomplishes it neatly.

On the other hand, it is using tricks not accessible to authors, and is hard to explain. Teeing and piping and such all are explicable as operations you could write. They fit with the destructive model of streams and don't use any magic. You could easily write a version of tee() that uses different backpressure semantics or similar. But if someone wanted to create their own version of clone with some customizations, they could not.
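As an illustration of teeing being writable in author code, here is a rough sketch of a custom tee with different backpressure semantics: it pumps eagerly and ignores backpressure entirely. The name `eagerTee` is made up, the code assumes a global ReadableStream, and cancellation handling is deliberately left out.

```javascript
// Sketch of a userland tee that ignores backpressure (illustrative only).
// Assumes a global ReadableStream; cancellation of a branch is not handled.
function eagerTee(source) {
  const reader = source.getReader();
  let c1;
  let c2;
  // start() runs synchronously, so c1 and c2 are set before the pump begins.
  const branch1 = new ReadableStream({ start(c) { c1 = c; } });
  const branch2 = new ReadableStream({ start(c) { c2 = c; } });

  (async () => {
    try {
      for (;;) {
        const { value, done } = await reader.read();
        if (done) break;
        c1.enqueue(value); // enqueue regardless of desiredSize
        c2.enqueue(value);
      }
      c1.close();
      c2.close();
    } catch (err) {
      c1.error(err);
      c2.error(err);
    }
  })();

  return [branch1, branch2];
}

// Usage: both branches receive every chunk of the source.
async function collect(stream) {
  const reader = stream.getReader();
  const chunks = [];
  for (;;) {
    const { value, done } = await reader.read();
    if (done) return chunks;
    chunks.push(value);
  }
}

const source = new ReadableStream({
  start(controller) {
    controller.enqueue(1);
    controller.enqueue(2);
    controller.close();
  },
});
const [left, right] = eagerTee(source);
const eagerDemo = Promise.all([collect(left), collect(right)]);
eagerDemo.then(([l, r]) => console.log(l, r));
```

Nothing comparable can be written for the clone trick, because author code has no way to swap the controller behind an existing stream.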

What would be helpful to me is figuring out whether developers find cloning a stream to be a natural thing. Are they surprised that it isn't possible? How much are they missing it? How weird do they find request.clone()'s behavior, which resets request.body because of the tee semantics?
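For anyone who wants to check the request.clone() behavior in their own runtime, a small probe might look like the following. It assumes the Fetch API's Request is available as a global (browsers, Node 18+); the URL is a placeholder and no network request is made. With tee-based cloning, the comparison is expected to report that request.body was replaced, but the result is printed rather than assumed.

```javascript
// Sketch: observing what request.clone() does to request.body.
// Assumes a global Request (browsers, Node 18+). The URL is a placeholder.
const request = new Request('https://example.com/upload', {
  method: 'POST',
  body: 'hello',
});

const bodyBefore = request.body;
const cloned = request.clone();
const sameBody = request.body === bodyBefore;

// With tee-based cloning, request.body is expected to be a different object now.
console.log('request.body unchanged by clone():', sameBody);

// Both the original and the clone can still be read in full.
const texts = Promise.all([request.text(), cloned.text()]);
texts.then(([a, b]) => console.log(a, b));
```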

We could also consider exposing this operation only to specs, and not as a public .clone() method, just to make request.clone() not reset request.body. In that line of thinking, we'd say that exposing cloning of requests was a mistake since it doesn't fit with the streams model, so we don't want to perpetuate it further in the streams-using ecosystem, but we also want to make sure request.clone() as it exists is maximally reasonable.

I'd love to hear some thoughts. Maybe someone developer-facing like @jakearchibald would be especially helpful.

domenic, Nov 04 '16 21:11

Great write-up! Note that MediaStream/MediaStreamTrack have a clone method. Sure, the underlying model is different, but 'cloning a stream' is already in the air.

Unless there is a chance that the stream/controller relationship will change again, I would bite the bullet and allow fetch Request/Response to have a 'clean' cloning.

youennf, Nov 13 '16 16:11

So it sounds like you are probably in favor of exposing this on a spec level only, with no stream.clone(), but just some behind the scenes magic to make request.clone() work? That sounds OK-ish to me.

I'd love to hear some more opinions. @jakearchibald, @wanderview, @annevk, @tyoshino, @ricea, any thoughts?

domenic, Nov 16 '16 23:11

With my implementer hat on, I am naturally opposed as it is more work.

Without my implementer hat on, I am on the fence. I think it definitely shouldn't be exposed in the public API right now. It's easier to add a public API for an abstract operation than it is to remove a public API once you've added it.

A couple of questions I don't know the answers to:

  1. How large is the expected performance benefit?
  2. Can it be implemented by browsers without being explicitly specced?

ricea, Nov 17 '16 05:11

> Can it be implemented by browsers without being explicitly specced?

I don't think so. You can observe the difference from script. In particular whether toCloneRequest.body stays the same or not. If we want it to remain the same, we have to define how that works (and add this new primitive).

If this is already a thing for media streams on the platform, it seems weird not to have it elsewhere.

I'd be very interested in hearing about the benefits as well. Memory, CPU, speed, battery drain?

annevk, Nov 17 '16 08:11

> How large is the expected performance benefit?

I don't think there's any performance benefit. The benefit is purely semantic. There's still one copy being performed. The procedure here seems slightly more expensive in terms of one-time costs of shuffling things around, but probably not in a noticeable way.

> If this is already a thing for media streams on the platform, it seems weird not to have it elsewhere.

The thing to remember about media streams is that they are misnamed. They are really more like media pointers, i.e., opaque handles to media data which cannot be introspected or read from, only plugged in to various APIs.

domenic, Nov 17 '16 14:11

I agree with domenic, the benefit is to have fetch clone work as one would expect it to work. In terms of implementation, this should be pretty straightforward. See https://github.com/youennf/streams/commit/f56003da2bf564de22aeba0a1e61108958bd7aa5 as a possible implementation (not tested though).

youennf, Nov 24 '16 08:11