Adding generic seek or read/write-at-offset abilities to readable/writable streams
A lot of the streams ecosystem so far has focused on network and device streams, where data is definitely sequential and read in order. However, File System Access and the Storage Foundation API (which will hopefully soon become an extension of File System Access) operate on file system streams, which are slightly different. In particular, a common operation on file system streams is random access, i.e. reading/writing at a specified offset.
One could imagine using a separate API for random access, and leaving streams only for the kind of streaming sequential reads/writes that they're already good at. But this feels like a bad outcome. It would result in two similar APIs side-by-side, e.g. read/cancel on stream readers, and read/cancel/seek on random access readers.
Instead, we could imagine augmenting stream readers/writers to support this use case. If the underlying source/sink advertises seeking or random access support, then the corresponding reader could expose that capability. Most streams on the web platform today would not support random access. (E.g., seeking an HTTP response doesn't make much sense. Except maybe seeking forward?) But file system streams, and maybe `blob.stream()`, could support it.
There are a few API details that come to mind:
- Is `reader.seek(offset)` the right API, or should it be something like `reader.read(view, { offset })` or `reader.readAt(offset, view)`? (That's the BYOB case; omit the `view` for the default reader case.) This seems like a big fork in the road that affects other parts of the API. E.g., if it's a seek-type API, then we need to consider how to queue up the seeks vs. the reads/writes/read requests, or how reads/writes advance the "current position". (See the sketch after this list.)
- Should this be done by adding a `seek()` or `readAt()` method to `ReadableStreamDefaultReader`/`ReadableStreamBYOBReader`/`WritableStreamDefaultWriter`, which throws if the underlying source/sink doesn't support it? Or should we create dedicated "seekable reader/writer" classes or subclasses? The former is a good deal simpler on the spec and implementation side, and is perhaps a better precedent for any future such expansions. But then feature detection would need some kind of `canSeek` or `supportsOffset` getter, which is a bit annoying.
- What are the "units" for the seeking `offset`? They could be totally opaque: just a value you pass through to the underlying source/sink. (This starts feeling like some of the generic message-passing mechanisms discussed in #960 and #1026.) Or there could be some minimal validation, e.g. it has to be a number (integer?), has to be nonnegative, has to be finite.
- Relatedly, should there be a convention for whether seeking past the end throws an exception vs. clamps to the end? I don't know if we can enforce this in the streams infrastructure, but if we could, that'd be cool.
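To make that first fork concrete, here are the two call shapes side by side. None of these methods or options exist today; `seek()`, `readAt()`, and the `{ offset }` option are all hypothetical:

```js
// Hypothetical seek-based shape: seeking is a separate, stateful
// operation that moves a "current position" shared by subsequent reads.
const reader = stream.getReader({ mode: "byob" });
await reader.seek(1024);
const { value } = await reader.read(new Uint8Array(512)); // bytes 1024-1535

// Hypothetical offset-based shapes: each read names its own offset.
const { value: a } = await reader.readAt(2048, new Uint8Array(512));
const { value: b } = await reader.read(new Uint8Array(512), { offset: 2560 });
```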
Here is the start of a proposal for a seek-based API:
- Underlying sources/sinks can supply a promise-returning `seek()` method. If supplied, then the stream's readers/writers support seeking; otherwise they don't.
- We add `seek()` methods and `canSeek` getters to `ReadableStreamDefaultReader`/`ReadableStreamBYOBReader`/`WritableStreamDefaultWriter`. The `seek()` method forwards to the underlying source/sink after doing some basic argument validation (nonnegative, finite).
- Seeks are queued up (i.e. not yet forwarded to the underlying source/sink) if there are any outstanding read requests or write requests. So, e.g., even without `await`s, `writer.write(c1); writer.seek(10); writer.write(c2)` writes `c2` at position 10.
- If you seek while a readable stream's queue is non-empty, the queue gets emptied; all the buffered-up chunks are lost since they're no longer relevant.
In this model, the underlying source/sink is responsible for knowing what seek means and how it interacts with reads/writes. The expectation is that they implement things so that reads/writes advance the current position, e.g. `writer.seek(10); writer.write(size5Chunk); writer.write(chunk)` writes `chunk` at position 15. But this is not enforced by the streams mechanisms.
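A minimal sketch of how this proposal could look from both sides. The `seek()` hook, `writer.canSeek`, and `writer.seek()` are the proposed additions, not shipping APIs, and `file` stands in for a hypothetical handle that supports positional writes:

```js
const stream = new WritableStream({
  // Proposed: presence of this promise-returning method makes the
  // stream's writers seekable.
  async seek(position) {
    await file.setPosition(position); // hypothetical file-handle method
  },
  async write(chunk) {
    await file.writeAndAdvance(chunk); // hypothetical: writes at the current position
  },
});

const c1 = new Uint8Array(5);
const c2 = new Uint8Array(5);

const writer = stream.getWriter();
if (writer.canSeek) {  // proposed feature-detection getter
  writer.write(c1);    // lands at position 0
  writer.seek(10);     // queued behind the outstanding write
  writer.write(c2);    // lands at position 10
}
```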
Here is the start of a proposal for an offset-based API:
- Underlying sources/sinks can set `supportsRandomAccess: true`.
- For such streams, `defaultReader.read({ at })`, `byobReader.read(view, { at })`, and `defaultWriter.write(chunk, { at })` work (sketched after this list). (For streams without that boolean set, supplying any value for `at` rejects the promise.) They perform basic validation on `at`.
- We add `reader.supportsAt` and `writer.supportsAt` booleans.
- For writable streams, the underlying sink's `write()` method gets forwarded the `at` value from the `writer.write()` call, which it can use as it sees fit. The existing queuing mechanisms for writes ensure that the stream is never asked to write to two different locations concurrently.
  - If no `at` is supplied, we can either omit it from the call to the underlying sink, or we can auto-compute it based on the size of the chunks. Not sure which is best.
- For readable streams, the situation is similar, except with the underlying source's `pull()` instead of the underlying sink's `write()`. The automatic calls to `pull()` which occur based on `highWaterMark` would take place at an auto-computed or omitted `at`, and would not be able to fulfill read requests with mismatching `at`s. The simplest thing to do here might be to empty the queue if a read request comes in with an `at` mismatching what was expected; otherwise the "queue" starts becoming a non-queue.
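A corresponding sketch for the offset-based flavor. The `supportsRandomAccess` flag, the `{ at }` option, and `writer.supportsAt` are proposed additions; exactly how `at` reaches the sink's `write()` is not pinned down above, so an extra options argument is assumed here, and `file` is again a hypothetical positional-I/O handle:

```js
const stream = new WritableStream({
  supportsRandomAccess: true, // proposed opt-in flag
  async write(chunk, controller, { at } = {}) {
    // `at` may be omitted (or auto-computed) for plain sequential writes.
    await file.write(chunk, at); // hypothetical positional write
  },
});

const header = new Uint8Array(512);
const body = new Uint8Array(1024);

const writer = stream.getWriter();
if (writer.supportsAt) { // proposed feature-detection boolean
  await writer.write(header, { at: 0 });
  await writer.write(body, { at: 512 });
}
```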
On balance the offset-based API seems a bit cleaner.
In my zip file reader, I used the `Blob.prototype.slice` API to get `readAt` functionality on files.
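For reference, that technique needs only shipping APIs: `Blob.prototype.slice(start, end)` returns a lightweight sub-`Blob`, and calling `.stream()` on the slice yields a `ReadableStream` over just that byte range. Something along these lines, assuming `file` is a `File`/`Blob` obtained elsewhere:

```js
// Delimited random-access read over a Blob/File using only existing
// APIs: slice() is cheap bookkeeping, and stream() reads lazily.
function readAt(blob, offset, length) {
  return blob.slice(offset, offset + length).stream();
}

const firstKilobyte = readAt(file, 0, 1024); // ReadableStream of bytes 0-1023
```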
What if an HTTP request supports byte range requests? It would be kind of nice to be able to "resume" a broken request:
- make a normal request, see that it accepts range requests
- now it's able to advertise that it supports ranges, so you can abort a request and make a new one whenever you make a new `seek()` call
Or how about HTTP/2? Would you be able to do some seeking with that?
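The resume-by-range part of this already works with plain `fetch()`; what's missing is a way to surface it through a seekable reader. A rough sketch of the manual version, assuming the server answers the probe with `Accept-Ranges: bytes`:

```js
// Probe for byte-range support, then restart a broken download from a
// given offset. Everything here is existing fetch() API.
const probe = await fetch(url, { method: "HEAD" });
const supportsRanges = probe.headers.get("Accept-Ranges") === "bytes";

if (supportsRanges) {
  // The moral equivalent of seek(1000000): abort the old request and
  // issue a fresh one starting at that byte offset.
  const resumed = await fetch(url, { headers: { Range: "bytes=1000000-" } });
  const reader = resumed.body.getReader(); // bytes from offset 1000000 onward
}
```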
FWIW, there is a difference between streaming read, where you probably want aggressive readahead etc., and delimited random-access read, where you probably don't and might even want discard-type behavior. So I feel like the solution here might need to be at a higher level, where a `File` can have a `read(offset, length)` that returns a `ReadableStream`.
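A sketch of what that higher-level shape might look like, implementable today over `Blob.prototype.slice()`; the class and its `read()` method are invented for illustration, not a real `File` API:

```js
// Hypothetical higher-level wrapper: each delimited random-access read
// gets its own ReadableStream, leaving plain streams for sequential use.
class RandomAccessFile {
  constructor(file) {
    this.file = file; // a File or Blob
  }
  read(offset, length) {
    return this.file.slice(offset, offset + length).stream();
  }
}

const zip = new RandomAccessFile(someFile); // `someFile` obtained elsewhere
const centralDirectory = zip.read(12345, 678); // a ReadableStream
```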