fetch
fetch copied to clipboard
blob: URLs and range requests
The current fetch spec says that all blob: URL requests should result in a 200/OK response (a separate issue, it should probably also mention how to reply if the URL's blob url entry is null). That doesn't entirely match at least chrome's implementation though. In chrome we also support range requests on blob URLs, as long as the range consists of a single range. In that case we reply with a 206 and just the requested range. If multiple ranges are passed in we fail the request with a network error.
I haven't tried what other browsers do for range requests to blob URLs.
fwiw, I'd like to make this work for the cache API too at some point.
I haven't tested it but I would hope this is working in Safari as well. WPT tests would be nice.
@mkruisselbrink by design the URL's blob URL entry cannot be null. The URL parser does the lookup. I guess perhaps something could still go amiss if it has to transfer process boundaries (OOM) and we should account for that somehow. That should be a network error.
Is this specific to blob or should this also work for data: URLs and about:blank?
I'm not sure why URL's blob URL entry wouldn't be able to be null? Step 4 in https://url.spec.whatwg.org/#url-parsing "Set url’s blob URL entry to [...], and null otherwise." There needs to be some way to deal with resolving/fetching blob URLs that don't actually correspond to blobs after all...
I haven't looked into what chrome does for range requests data: URLs or about:blank. Those are at least very different code paths in chrome.
Thanks, I split out null handling into #1077.
This is a good issue for someone with some experience writing web-platform-tests. There's a couple steps toward solving this:
- Write tests for range requests to
blob:URLs (and ideally alsodata:URLs andabout:blank) and use those tests to tease out what browsers are doing. - If browsers agree, great. We probably just want to specify what they do. If they don't agree, let's discuss the results in this issue.
- Adjust the requirements in the specification accordingly. (This might be a little involved and as such I can help out here once it gets to this stage.)
This looks like an interesting issue, so I started writing some wpt tests for this. I haven't tried data: or about: urls yet, but I see the following for blob::
| Engine | Basic Range | Multiple Ranges |
|---|---|---|
| Gecko | 200/ignore range | 200/ignore range |
| Webkit | 206/correct range | 416 |
| Chrome | 206/correct range | fetch failed |
blob: turned out to be the most interesting of the bunch. Firefox, Safari, and Chrome all returned the same results for the tests I wrote for data: and about:blank. I'll post a PR to wpt for the two schemes that have some agreement, then I'll post a follow-up PR with blob: tests.
Does anyone have any strong opinions on the answer for multiple ranges? I'm somewhat inclined to side with Chromium given that's how we normally guard things that we might support in the future, but WebKit's approach is reasonable too.
For the specification side of things we'll need to:
- Split out the guts of https://fetch.spec.whatwg.org/#simple-range-header-value into a simple Range header parser (perhaps "extract a simple range"?). (If we could find a better word than "simple" that would be great.)
- Note that this parser needs to take a header list for consistency with "extract a MIME type" and friends. That also makes it clear that if you have multiple
Rangeheaders what value ends up being operated on and why that ends up failing (the comma). - We should also test the various edge cases of this parser, including multiple
Rangeheaders and such.
- Note that this parser needs to take a header list for consistency with "extract a MIME type" and friends. That also makes it clear that if you have multiple
- Invoke that parser in scheme fetch and use the return values to slice the blob.
- Unfortunately https://w3c.github.io/FileAPI/#slice-method-algo doesn't call into an internal algorithm. This is a general problem with the File API specification. We can either use some language ~"act as if you're using the original unmodified version of this method" or fix it upstream.
- We'll then read the sliced blob the same way we read the blob now. And we also need to setup the 206 response, presumably including a
Content-Rangeheader. (Tests also need to assert all aspects of this.)
According to HTTP-RANGE Section 2.1:
The last-byte-pos value gives the byte-offset of the last byte in the range; that is, the byte positions specified are inclusive.
But according to File API 3.3.1:
slice()method returns a new Blob object with bytes ranging from the optional start parameter up to but not including the optional end parameter
This will not impact parsing, but could impact how we read the sliced blob etc. Currently both chrome and safari use the range like an HTTP Range. I think an inclusive range makes sense, as this would be more intuitive to the end user and we can use rangeStart and rangeEnd + 1 to slice the blob.
Engine Basic Range Multiple Ranges Gecko 200/ignore range 200/ignore range Webkit 206/correct range 416 Chrome 206/correct range fetch failed
Chrome updated to ignore the range request when multiple ranges are provided.
@dlrobertson pointed out in https://github.com/whatwg/fetch/pull/1501#pullrequestreview-1139271294 that whitespace handling is inconsistent between CORS and this. Does that mean implementations use separate parsers for CORS and blob ranges? @rayankans do you have any insights here?
Is the question about the chromium implementation? If so it uses the regular CORS parser with extra restrictions, based on the spec changes here. My understanding from the reading of the spec is that white spaces (among other things like suffix ranges) are disallowed. @jakearchibald might be able to provide more insight here.
@rayankans yeah, in particular whether Chromium uses the same range request parser for CORS and blob URLs.
The discrepancy is that the HTTP specification does allow whitespace in a number of places (although this is not obvious). But it seems okayish if we don't as we already subset the value space. @dlrobertson what do you think? We should add a note about the difference with HTTP though.
Both Safari and Chromium pass A simple blob range request with whitespace. in https://github.com/web-platform-tests/wpt/pull/34384. This seems to indicate that they both handle whitespace as described in HTTP-RANGE.
The discrepancy is that the HTTP specification does allow whitespace in a number of places (although this is not obvious). But it seems okayish if we don't as we already subset the value space.
Yeah, as a start I think it is reasonable to not handle whitespace.
In the long run I'd guess we'll need a "single range header" and "http-range header" parser. Since both implementations of blob range requests do currently handle whitespace (and it is reasonable to do so for a blob range request). Would it be a significant effort to add support for whitespace in the blob range request case but not in the CORS-safelisted request-header check? Could we add a "no whitespace for me please!" input value to the singe range parser? Alternatively are we able to split up the parser a bit? e.g.
Parse single range header:
- parse
bytes= - parse a range
Parse HTTP range header:
- parse
bytes= - consume whitespace
- parse a range
This is where my lack of experience in working on these specs becomes very evident 😄 Feedback on what sorts of things are encouraged and improve readability in cases like this are very much welcomed!
Both of those approaches are reasonable, but we should also consider being more lenient in the CORS case with regards to whitespace as it seems rather harmless and would allow for more reuse and less branching. Allowing some whitespace with CORS is probably what I'd prefer.
@annevk what whitespace are you talking about here? There's no OWS in the spec.
@annevk what whitespace are you talking about here? There's no OWS in the spec.
@mnot I thought the same thing originally, but section 2.1 states:
It also uses a list extension, defined in Section 5.6.1, that allows for compact definition of comma-separated lists using a "#" operator...
The relevant text from Section 5.6.1 states:
A construct "#" is defined, similar to "*", for defining comma-delimited lists of elements. The full form is "
# element" indicating at least and at most elements, each separated by a single comma (",") and optional whitespace (OWS, defined in Section 5.6.3).
So range-set could have optional whitespace due to the use of the list extension.
Please let me know if I've missed something!
the OWS is only allowed around the commas.
the OWS is only allowed around the commas.
I do see this in Sender Requirements, which we could enforce since we're only parsing a single range. Note that this is not currently done by WebKit or Safari for blob range request parsing.
@mnot I thought allowing tabs and spaces directly after bytes= due to range-set being defined as 1#range-spec, but I guess I got that wrong. (I think I also confused myself with how whitespace was implicit in earlier iterations of HTTP.)
However, implementations reportedly do allow whitespace after bytes=? (Though not with CORS.) Do they allow whitespace in other places?
However, implementations reportedly do allow whitespace after
bytes=? (Though not with CORS.) Do they allow whitespace in other places?
Yeah, this does appear to be the case with blob range requests. This test passes on WebKit and Blink.
Thanks @dlrobertson! I added a comment there with a couple other cases we should add coverage for.
I'm somewhat hopeful this is a thing we can still fix (by disallowing whitespace in these places). @youennf @rayankans thoughts on that?
@mnot so I had a look at https://www.rfc-editor.org/rfc/rfc2616 and unless I'm missing something y'all changed the grammar at some point.
implied *LWS The grammar described by this specification is word-based. Except where noted otherwise, linear white space (LWS) can be included between any two adjacent words (token or quoted-string), and between adjacent words and separators, without changing the interpretation of a field. At least one delimiter (LWS and/or separators) MUST exist between any two tokens (for the definition of "token" below), since they would otherwise be interpreted as a single token.
And then in https://www.rfc-editor.org/rfc/rfc2616#section-14.35.1 it does not note otherwise as far as I can tell.
cc @reschke
My vague recollection is that we looked at a few implementations, and they did not handle LWS consistently, thus we left it out here.
Look at the example for range-set:
-
The first, middle, and last 1000 bytes:
bytes= 0-999, 4500-5499, -1000
That's why it has been implemented differently. The ABNF does not allow a SP after = but one of the examples used a space after the bytes= so that's what people implement. That's an errata.
Probably the right fix is
ranges-specifier = range-unit "=" OWS range-set
though we'd probably have to check other servers to see if any fail on OWS there.
I introduced that extra space in https://github.com/httpwg/http-core/pull/217
Look at the example for range-set:
* The first, middle, and last 1000 bytes: bytes= 0-999, 4500-5499, -1000
@royfielding Sorry, but I don't see this example. Which section is this in?
though we'd probably have to check other servers to see if any fail on OWS there.
Is this something we should check before making a decision on what to do here? I'm happy to give running some tests a shot if so. Any info/tips here on running such tests would be helpful if this is something we need to do.
@annevk In cases like this, what do we tend to do? Next steps feel a bit unclear to me, so any feedback would be very helpful and appreciated. It seems like it would be easier to relax the stance in fetch to allow whitespace in the future if something were to change upstream in HTTP rather than add a restriction in the future. That being said I'm new here 😃
The example is at https://httpwg.org/specs/rfc9110.html#rfc.section.14.1.2 (search for "1000 bytes").
HTTP 2616 allowed whitespace:
- Before
= - After
= - Before
- - After
- - Before
, - After
,
My inclination would be to keep doing that as browsers implemented that correctly and in theory HTTP does not make incompatible changes except if there is a security concern (and even then it's controversial), which I don't think there is in this case.
Given that we only consume this for blob: URLs which is outside the realm of HTTP I think that is okay.
At the same time, keeping it no-whitespace-allowed for CORS makes sense to me as in that case it's in the realm of HTTP so being stricter will result in more interoperability.
Producing is already no-whitespace-allowed: https://fetch.spec.whatwg.org/#concept-request-add-range-header.
Thanks, this seems to make sense. So then essentially for blob: URLs we're essentially following the Recipient Requirements (allows whitespace), but for CORS we're following the stricter Sender Requirements. AFAIK the infrastructure for this is implemented in https://github.com/whatwg/fetch/pull/1564, but perhaps a extra note or comment should be added to clarify things.