fetch icon indicating copy to clipboard operation
fetch copied to clipboard

blob: URLs and range requests

Open mkruisselbrink opened this issue 5 years ago • 10 comments

The current fetch spec says that all blob: URL requests should result in a 200/OK response (a separate issue, it should probably also mention how to reply if the URL's blob url entry is null). That doesn't entirely match at least chrome's implementation though. In chrome we also support range requests on blob URLs, as long as the range consists of a single range. In that case we reply with a 206 and just the requested range. If multiple ranges are passed in we fail the request with a network error.

I haven't tried what other browsers do for range requests to blob URLs.

mkruisselbrink avatar Aug 06 '20 21:08 mkruisselbrink

fwiw, I'd like to make this work for the cache API too at some point.

jakearchibald avatar Aug 07 '20 09:08 jakearchibald

I haven't tested it but I would hope this is working in Safari as well. WPT tests would be nice.

youennf avatar Aug 07 '20 11:08 youennf

@mkruisselbrink by design the URL's blob URL entry cannot be null. The URL parser does the lookup. I guess perhaps something could still go amiss if it has to transfer process boundaries (OOM) and we should account for that somehow. That should be a network error.

Is this specific to blob or should this also work for data: URLs and about:blank?

annevk avatar Aug 17 '20 08:08 annevk

I'm not sure why URL's blob URL entry wouldn't be able to be null? Step 4 in https://url.spec.whatwg.org/#url-parsing "Set url’s blob URL entry to [...], and null otherwise." There needs to be some way to deal with resolving/fetching blob URLs that don't actually correspond to blobs after all...

I haven't looked into what chrome does for range requests data: URLs or about:blank. Those are at least very different code paths in chrome.

mkruisselbrink avatar Aug 17 '20 16:08 mkruisselbrink

Thanks, I split out null handling into #1077.

annevk avatar Aug 18 '20 14:08 annevk

This is a good issue for someone with some experience writing web-platform-tests. There's a couple steps toward solving this:

  1. Write tests for range requests to blob: URLs (and ideally also data: URLs and about:blank) and use those tests to tease out what browsers are doing.
  2. If browsers agree, great. We probably just want to specify what they do. If they don't agree, let's discuss the results in this issue.
  3. Adjust the requirements in the specification accordingly. (This might be a little involved and as such I can help out here once it gets to this stage.)

annevk avatar May 18 '22 08:05 annevk

This looks like an interesting issue, so I started writing some wpt tests for this. I haven't tried data: or about: urls yet, but I see the following for blob::

Engine Basic Range Multiple Ranges
Gecko 200/ignore range 200/ignore range
Webkit 206/correct range 416
Chrome 206/correct range fetch failed

dlrobertson avatar May 25 '22 04:05 dlrobertson

blob: turned out to be the most interesting of the bunch. Firefox, Safari, and Chrome all returned the same results for the tests I wrote for data: and about:blank. I'll post a PR to wpt for the two schemes that have some agreement, then I'll post a follow-up PR with blob: tests.

dlrobertson avatar May 26 '22 01:05 dlrobertson

Does anyone have any strong opinions on the answer for multiple ranges? I'm somewhat inclined to side with Chromium given that's how we normally guard things that we might support in the future, but WebKit's approach is reasonable too.

For the specification side of things we'll need to:

  1. Split out the guts of https://fetch.spec.whatwg.org/#simple-range-header-value into a simple Range header parser (perhaps "extract a simple range"?). (If we could find a better word than "simple" that would be great.)
    1. Note that this parser needs to take a header list for consistency with "extract a MIME type" and friends. That also makes it clear that if you have multiple Range headers what value ends up being operated on and why that ends up failing (the comma).
    2. We should also test the various edge cases of this parser, including multiple Range headers and such.
  2. Invoke that parser in scheme fetch and use the return values to slice the blob.
    1. Unfortunately https://w3c.github.io/FileAPI/#slice-method-algo doesn't call into an internal algorithm. This is a general problem with the File API specification. We can either use some language ~"act as if you're using the original unmodified version of this method" or fix it upstream.
  3. We'll then read the sliced blob the same way we read the blob now. And we also need to setup the 206 response, presumably including a Content-Range header. (Tests also need to assert all aspects of this.)

annevk avatar May 30 '22 16:05 annevk

According to HTTP-RANGE Section 2.1:

The last-byte-pos value gives the byte-offset of the last byte in the range; that is, the byte positions specified are inclusive.

But according to File API 3.3.1:

slice() method returns a new Blob object with bytes ranging from the optional start parameter up to but not including the optional end parameter

This will not impact parsing, but could impact how we read the sliced blob etc. Currently both chrome and safari use the range like an HTTP Range. I think an inclusive range makes sense, as this would be more intuitive to the end user and we can use rangeStart and rangeEnd + 1 to slice the blob.

dlrobertson avatar Jun 04 '22 00:06 dlrobertson

Engine Basic Range Multiple Ranges
Gecko 200/ignore range 200/ignore range
Webkit 206/correct range 416
Chrome 206/correct range fetch failed

Chrome updated to ignore the range request when multiple ranges are provided.

dlrobertson avatar Oct 06 '22 22:10 dlrobertson

@dlrobertson pointed out in https://github.com/whatwg/fetch/pull/1501#pullrequestreview-1139271294 that whitespace handling is inconsistent between CORS and this. Does that mean implementations use separate parsers for CORS and blob ranges? @rayankans do you have any insights here?

annevk avatar Oct 12 '22 15:10 annevk

Is the question about the chromium implementation? If so it uses the regular CORS parser with extra restrictions, based on the spec changes here. My understanding from the reading of the spec is that white spaces (among other things like suffix ranges) are disallowed. @jakearchibald might be able to provide more insight here.

rayankans avatar Oct 14 '22 10:10 rayankans

@rayankans yeah, in particular whether Chromium uses the same range request parser for CORS and blob URLs.

The discrepancy is that the HTTP specification does allow whitespace in a number of places (although this is not obvious). But it seems okayish if we don't as we already subset the value space. @dlrobertson what do you think? We should add a note about the difference with HTTP though.

annevk avatar Oct 14 '22 11:10 annevk

Both Safari and Chromium pass A simple blob range request with whitespace. in https://github.com/web-platform-tests/wpt/pull/34384. This seems to indicate that they both handle whitespace as described in HTTP-RANGE.

The discrepancy is that the HTTP specification does allow whitespace in a number of places (although this is not obvious). But it seems okayish if we don't as we already subset the value space.

Yeah, as a start I think it is reasonable to not handle whitespace.

In the long run I'd guess we'll need a "single range header" and "http-range header" parser. Since both implementations of blob range requests do currently handle whitespace (and it is reasonable to do so for a blob range request). Would it be a significant effort to add support for whitespace in the blob range request case but not in the CORS-safelisted request-header check? Could we add a "no whitespace for me please!" input value to the singe range parser? Alternatively are we able to split up the parser a bit? e.g.

Parse single range header:

  • parse bytes=
  • parse a range

Parse HTTP range header:

  • parse bytes=
  • consume whitespace
  • parse a range

This is where my lack of experience in working on these specs becomes very evident 😄 Feedback on what sorts of things are encouraged and improve readability in cases like this are very much welcomed!

dlrobertson avatar Oct 14 '22 14:10 dlrobertson

Both of those approaches are reasonable, but we should also consider being more lenient in the CORS case with regards to whitespace as it seems rather harmless and would allow for more reuse and less branching. Allowing some whitespace with CORS is probably what I'd prefer.

annevk avatar Oct 14 '22 14:10 annevk

@annevk what whitespace are you talking about here? There's no OWS in the spec.

mnot avatar Dec 21 '22 01:12 mnot

@annevk what whitespace are you talking about here? There's no OWS in the spec.

@mnot I thought the same thing originally, but section 2.1 states:

It also uses a list extension, defined in Section 5.6.1, that allows for compact definition of comma-separated lists using a "#" operator...

The relevant text from Section 5.6.1 states:

A construct "#" is defined, similar to "*", for defining comma-delimited lists of elements. The full form is "#element" indicating at least and at most elements, each separated by a single comma (",") and optional whitespace (OWS, defined in Section 5.6.3).

So range-set could have optional whitespace due to the use of the list extension.

Please let me know if I've missed something!

dlrobertson avatar Dec 21 '22 06:12 dlrobertson

the OWS is only allowed around the commas.

mnot avatar Dec 21 '22 06:12 mnot

the OWS is only allowed around the commas.

I do see this in Sender Requirements, which we could enforce since we're only parsing a single range. Note that this is not currently done by WebKit or Safari for blob range request parsing.

dlrobertson avatar Dec 21 '22 06:12 dlrobertson

@mnot I thought allowing tabs and spaces directly after bytes= due to range-set being defined as 1#range-spec, but I guess I got that wrong. (I think I also confused myself with how whitespace was implicit in earlier iterations of HTTP.)

However, implementations reportedly do allow whitespace after bytes=? (Though not with CORS.) Do they allow whitespace in other places?

annevk avatar Dec 21 '22 08:12 annevk

However, implementations reportedly do allow whitespace after bytes=? (Though not with CORS.) Do they allow whitespace in other places?

Yeah, this does appear to be the case with blob range requests. This test passes on WebKit and Blink.

dlrobertson avatar Dec 21 '22 10:12 dlrobertson

Thanks @dlrobertson! I added a comment there with a couple other cases we should add coverage for.

I'm somewhat hopeful this is a thing we can still fix (by disallowing whitespace in these places). @youennf @rayankans thoughts on that?

annevk avatar Dec 21 '22 11:12 annevk

@mnot so I had a look at https://www.rfc-editor.org/rfc/rfc2616 and unless I'm missing something y'all changed the grammar at some point.

implied *LWS The grammar described by this specification is word-based. Except where noted otherwise, linear white space (LWS) can be included between any two adjacent words (token or quoted-string), and between adjacent words and separators, without changing the interpretation of a field. At least one delimiter (LWS and/or separators) MUST exist between any two tokens (for the definition of "token" below), since they would otherwise be interpreted as a single token.

And then in https://www.rfc-editor.org/rfc/rfc2616#section-14.35.1 it does not note otherwise as far as I can tell.

cc @reschke

annevk avatar Dec 21 '22 13:12 annevk

My vague recollection is that we looked at a few implementations, and they did not handle LWS consistently, thus we left it out here.

reschke avatar Dec 21 '22 15:12 reschke

Look at the example for range-set:

  • The first, middle, and last 1000 bytes:

    bytes= 0-999, 4500-5499, -1000

That's why it has been implemented differently. The ABNF does not allow a SP after = but one of the examples used a space after the bytes= so that's what people implement. That's an errata.

Probably the right fix is

ranges-specifier = range-unit "=" OWS range-set

though we'd probably have to check other servers to see if any fail on OWS there.

royfielding avatar Dec 21 '22 18:12 royfielding

I introduced that extra space in https://github.com/httpwg/http-core/pull/217

royfielding avatar Dec 21 '22 18:12 royfielding

Look at the example for range-set:

* The first, middle, and last 1000 bytes:
  bytes= 0-999, 4500-5499, -1000

@royfielding Sorry, but I don't see this example. Which section is this in?

though we'd probably have to check other servers to see if any fail on OWS there.

Is this something we should check before making a decision on what to do here? I'm happy to give running some tests a shot if so. Any info/tips here on running such tests would be helpful if this is something we need to do.

@annevk In cases like this, what do we tend to do? Next steps feel a bit unclear to me, so any feedback would be very helpful and appreciated. It seems like it would be easier to relax the stance in fetch to allow whitespace in the future if something were to change upstream in HTTP rather than add a restriction in the future. That being said I'm new here 😃

dlrobertson avatar Jan 06 '23 15:01 dlrobertson

The example is at https://httpwg.org/specs/rfc9110.html#rfc.section.14.1.2 (search for "1000 bytes").

HTTP 2616 allowed whitespace:

  • Before =
  • After =
  • Before -
  • After -
  • Before ,
  • After ,

My inclination would be to keep doing that as browsers implemented that correctly and in theory HTTP does not make incompatible changes except if there is a security concern (and even then it's controversial), which I don't think there is in this case.

Given that we only consume this for blob: URLs which is outside the realm of HTTP I think that is okay.

At the same time, keeping it no-whitespace-allowed for CORS makes sense to me as in that case it's in the realm of HTTP so being stricter will result in more interoperability.

Producing is already no-whitespace-allowed: https://fetch.spec.whatwg.org/#concept-request-add-range-header.

annevk avatar Jan 13 '23 10:01 annevk

Thanks, this seems to make sense. So then essentially for blob: URLs we're essentially following the Recipient Requirements (allows whitespace), but for CORS we're following the stricter Sender Requirements. AFAIK the infrastructure for this is implemented in https://github.com/whatwg/fetch/pull/1564, but perhaps a extra note or comment should be added to clarify things.

dlrobertson avatar Jan 17 '23 21:01 dlrobertson