Adding generic seek or read/write-at-offset abilities to readable/writable streams
A lot of the streams ecosystem so far has focused on network and device streams, where data is definitely sequential and read in order. However, File System Access and the Storage Foundation API (which will hopefully soon become an extension of File System Access) operate on file system streams, which are slightly different. In particular, a common operation on file system streams is random access, i.e. reading/writing at a specified offset.
One could imagine using a separate API for random access, and leaving streams only for the kind of streaming sequential reads/writes that they're already good at. But this feels like a bad outcome. It would result in two similar APIs side-by-side, e.g. read/cancel on stream readers, and read/cancel/seek on random access readers.
Instead, we could imagine augmenting stream readers/writers to support this use case. If the underlying source/sink advertises seeking or random access support, then the corresponding reader could expose that capability. Most streams on the web platform today would not support random access. (E.g., seeking an HTTP response doesn't make much sense. Except maybe seeking forward?) But file system streams, and maybe `blob.stream()`, could support it.
There are a few API details that come to mind:
- Is `reader.seek(offset)` the right API, or should it be something like `reader.read(view, { offset })` or `reader.readAt(offset, view)`? (That's the BYOB case; omit the `view` for the default reader case.) This seems like a big fork in the road that affects other parts of the API. E.g., if it's a seek-type API, then we need to consider how to queue up the seeks vs. the reads/writes/read requests, or how reads/writes advance the "current position". (See the sketch after this list.)
- Should this be done by adding a `seek()` or `readAt()` method to `ReadableStreamDefaultReader`/`ReadableStreamBYOBReader`/`WritableStreamDefaultWriter`, which throws if the underlying source/sink doesn't support it? Or should we create dedicated "seekable reader/writer" classes or subclasses? The former is a good deal simpler on the spec and implementation side, and is perhaps a better precedent for any future such expansions. But then feature detection would need some kind of `canSeek` or `supportsOffset` getter, which is a bit annoying.
- What are the "units" for the seeking `offset`? They could be totally opaque: just a value you pass through to the underlying source/sink. (This starts feeling like some of the generic message-passing mechanisms discussed in #960 and #1026.) Or there could be some minimal validation, e.g. it has to be a number (integer?), has to be nonnegative, has to be finite.
- Relatedly, should there be a convention for whether seeking past the end throws an exception vs. clamps to the end? I don't know if we can enforce this in the streams infrastructure, but if we could, that'd be cool.
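To make that first fork concrete, here are the two call shapes side by side. None of these methods or options exist today; `seek()`, `readAt()`, and the `{ offset }` option are all hypothetical:

```js
// Hypothetical seek-based shape: seeking is a separate, stateful
// operation that moves a "current position" shared by subsequent reads.
const reader = stream.getReader({ mode: "byob" });
await reader.seek(1024);
const { value } = await reader.read(new Uint8Array(512)); // bytes 1024-1535

// Hypothetical offset-based shapes: each read names its own offset.
const { value: a } = await reader.readAt(2048, new Uint8Array(512));
const { value: b } = await reader.read(new Uint8Array(512), { offset: 2560 });
```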
Here is the start of a proposal for a seek-based API:
- Underlying sources/sinks can supply a promise-returning `seek()` method. If supplied, then the stream's readers/writers support seeking; otherwise they don't.
- We add `seek()` methods and `canSeek` getters to `ReadableStreamDefaultReader`/`ReadableStreamBYOBReader`/`WritableStreamDefaultWriter`. The `seek()` method forwards to the underlying source/sink after doing some basic argument validation (nonnegative, finite).
- Seeks are queued up (i.e. not yet forwarded to the underlying source/sink) if there are any outstanding read requests or write requests. So, e.g., even without `await`s, `writer.write(c1); writer.seek(10); writer.write(c2)` writes `c2` at position 10.
- If you seek while a readable stream's queue is non-empty, the queue gets emptied; all the buffered-up chunks are lost since they're no longer relevant.
In this model, the underlying source/sink is responsible for knowing what seek means and how it interacts with reads/writes. The expectation is that they implement things so that reads/writes advance the current position, e.g. `writer.seek(10); writer.write(size5Chunk); writer.write(chunk)` writes `chunk` at position 15. But this is not enforced by the streams mechanisms.
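A minimal sketch of how this proposal could look from both sides. The `seek()` hook, `writer.canSeek`, and `writer.seek()` are the proposed additions, not shipping APIs, and `file` stands in for a hypothetical handle that supports positional writes:

```js
const stream = new WritableStream({
  // Proposed: presence of this promise-returning method makes the
  // stream's writers seekable.
  async seek(position) {
    await file.setPosition(position); // hypothetical file-handle method
  },
  async write(chunk) {
    await file.writeAndAdvance(chunk); // hypothetical: writes at the current position
  },
});

const c1 = new Uint8Array(5);
const c2 = new Uint8Array(5);

const writer = stream.getWriter();
if (writer.canSeek) {  // proposed feature-detection getter
  writer.write(c1);    // lands at position 0
  writer.seek(10);     // queued behind the outstanding write
  writer.write(c2);    // lands at position 10
}
```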
Here is the start of a proposal for an offset-based API:
- Underlying sources/sinks can set `supportsRandomAccess: true`.
- For such streams, `defaultReader.read({ at })`, `byobReader.read(view, { at })`, and `defaultWriter.write(chunk, { at })` work (sketched after this list). (For streams without that boolean set, supplying any value for `at` rejects the promise.) They perform basic validation on `at`.
- We add `reader.supportsAt` and `writer.supportsAt` booleans.
- For writable streams, the underlying sink's `write()` method gets forwarded the `at` value from the `writer.write()` call, which it can use as it sees fit. The existing queuing mechanisms for writes ensure that the stream is never asked to write to two different locations concurrently.
  - If no `at` is supplied, we can either omit it from the call to the underlying sink, or we can auto-compute it based on the size of the chunks. Not sure which is best.
- For readable streams, the situation is similar, except with the underlying source's `pull()` instead of the underlying sink's `write()`. The automatic calls to `pull()` which occur based on `highWaterMark` would take place at an auto-computed or omitted `at`, and would not be able to fulfill read requests with mismatching `at`s. The simplest thing to do here might be to empty the queue if a read request comes in with an `at` mismatching what was expected; otherwise the "queue" starts becoming a non-queue.
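A corresponding sketch for the offset-based flavor. The `supportsRandomAccess` flag, the `{ at }` option, and `writer.supportsAt` are proposed additions; exactly how `at` reaches the sink's `write()` is not pinned down above, so an extra options argument is assumed here, and `file` is again a hypothetical positional-I/O handle:

```js
const stream = new WritableStream({
  supportsRandomAccess: true, // proposed opt-in flag
  async write(chunk, controller, { at } = {}) {
    // `at` may be omitted (or auto-computed) for plain sequential writes.
    await file.write(chunk, at); // hypothetical positional write
  },
});

const header = new Uint8Array(512);
const body = new Uint8Array(1024);

const writer = stream.getWriter();
if (writer.supportsAt) { // proposed feature-detection boolean
  await writer.write(header, { at: 0 });
  await writer.write(body, { at: 512 });
}
```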
On balance the offset-based API seems a bit cleaner.
In my zip file reader, I used the `Blob.prototype.slice` API to get `readAt` functionality on files.
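For reference, that technique needs only shipping APIs: `Blob.prototype.slice(start, end)` returns a lightweight sub-`Blob`, and calling `.stream()` on the slice yields a `ReadableStream` over just that byte range. Something along these lines, assuming `file` is a `File`/`Blob` obtained elsewhere:

```js
// Delimited random-access read over a Blob/File using only existing
// APIs: slice() is cheap bookkeeping, and stream() reads lazily.
function readAt(blob, offset, length) {
  return blob.slice(offset, offset + length).stream();
}

const firstKilobyte = readAt(file, 0, 1024); // ReadableStream of bytes 0-1023
```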
What if an HTTP request supports byte range requests? It would be kind of nice to be able to "resume" a broken request:
- make a normal request, see that it accepts range requests
- now it's able to advertise that it supports ranges, so you can abort a request and make a new one whenever you make a new `seek()` call
Or how about HTTP/2? Would you be able to do some seeking with that?
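The resume-by-range part of this already works with plain `fetch()`; what's missing is a way to surface it through a seekable reader. A rough sketch of the manual version, assuming the server answers the probe with `Accept-Ranges: bytes`:

```js
// Probe for byte-range support, then restart a broken download from a
// given offset. Everything here is existing fetch() API.
const probe = await fetch(url, { method: "HEAD" });
const supportsRanges = probe.headers.get("Accept-Ranges") === "bytes";

if (supportsRanges) {
  // The moral equivalent of seek(1000000): abort the old request and
  // issue a fresh one starting at that byte offset.
  const resumed = await fetch(url, { headers: { Range: "bytes=1000000-" } });
  const reader = resumed.body.getReader(); // bytes from offset 1000000 onward
}
```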
FWIW, there is a difference between streaming read, where you probably want aggressive readahead etc., and delimited random-access read, where you probably don't and might even want discard-type behavior. So I feel like the solution here might need to be at a higher level, where a `File` can have a `read(offset, length)` that returns a `ReadableStream`.
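A sketch of what that higher-level shape might look like, implementable today over `Blob.prototype.slice()`; the class and its `read()` method are invented for illustration, not a real `File` API:

```js
// Hypothetical higher-level wrapper: each delimited random-access read
// gets its own ReadableStream, leaving plain streams for sequential use.
class RandomAccessFile {
  constructor(file) {
    this.file = file; // a File or Blob
  }
  read(offset, length) {
    return this.file.slice(offset, offset + length).stream();
  }
}

const zip = new RandomAccessFile(someFile); // `someFile` obtained elsewhere
const centralDirectory = zip.read(12345, 678); // a ReadableStream
```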