HLS manifest is fetched across origins

Open annevk opened this issue 2 years ago • 12 comments

As discussed in https://github.com/whatwg/html/issues/6468 we need to make adjustments to handle HTTP Live Streaming correctly.

annevk avatar Oct 27 '21 13:10 annevk

Let me try to summarize the available options.

Strict MIME type enforcement for DASH/HLS

One option on the table is to make ORB allow DASH/HLS MIME types like application/dash+xml, application/vnd.apple.mpegurl, text/vtt, etc. (i.e. removing those types from ORB's "opaque-blocklisted-never-sniffed MIME type" and adding them to "opaque-safelisted MIME type").

This option is appealing from an ease-of-implementation perspective.
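To illustrate why this is easy to implement, here is a minimal sketch of the check, assuming a simplified model where ORB only looks at the MIME type essence. The set name `STREAMING_SAFELIST` and both function names are made up for illustration; only the three MIME types come from the paragraph above.

```python
# Hypothetical, simplified model of the "strict MIME type" option.
# Only the three types named in the discussion are listed here.
STREAMING_SAFELIST = {
    "application/dash+xml",
    "application/vnd.apple.mpegurl",
    "text/vtt",
}


def mime_essence(content_type: str) -> str:
    """Return the lowercase type/subtype, dropping parameters such as charset."""
    return content_type.split(";", 1)[0].strip().lower()


def is_safelisted_streaming_type(content_type: str) -> bool:
    """True if a no-cors response with this Content-Type would be let through."""
    return mime_essence(content_type) in STREAMING_SAFELIST
```

Under this model a manifest served as `text/plain` would be blocked, which is exactly the web-compatibility question discussed in this comment.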

Let me try to outline my understanding of web-compatibility of this option (i.e. will adopting this option break websites that used to work in browser X / on platform Y):

  • This option seems web-compatible with the current media stack on Chrome/desktop (where lack of native support for DASH/HLS means that websites have to handle DASH/HLS playback via fetch(...)-mediated polyfills which are subject to CORS).
  • This option seems web-compatible with a hypothetical, future native DASH/HLS support in Chrome/desktop (as long as Chromium can commit to enforcing strict MIME types in such a future implementation).
  • This option might break websites on Chrome/Android (where the manifest is fetched in no-cors mode before handing off the playback to be handled outside of the browser and outside of CORB/ORB). We should be able to measure how often a non-strict MIME type is encountered in practice - I've opened https://crbug.com/1269852 to track this work.
  • Of course other browsers (especially browsers with preexisting native support for DASH/HLS) might encounter different backward-compatibility challenges. In particular @annevk notes in https://github.com/whatwg/html/issues/6468#issuecomment-956359885 that media manifests are sometimes recognized by file extensions.

Detecting DASH/HLS by sniffing (including range responses)

Another option is to sniff DASH/HLS responses. Some sniffing ideas have been described by @sandersdan in https://github.com/whatwg/html/issues/6468#issuecomment-956466606.

I note that sniffing based on a single ASCII character (e.g. 0x47) seems insecure - if a sensitive resource (e.g. HTML, XML, or JSON) contains such a character, then an attacker might issue a range request that starts at that character.

This approach also assumes that DASH/HLS implementations will only issue range requests that start at segment boundaries.
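A small sketch of the single-byte concern, under stated assumptions: 0x47 is the MPEG-TS sync byte (ASCII "G") and TS packets are 188 bytes long. The function names and the sample JSON are hypothetical, and the "stricter" variant is only a heuristic for comparison, not a proposed ORB algorithm.

```python
TS_SYNC_BYTE = 0x47    # MPEG-TS sync byte, which is also ASCII "G"
TS_PACKET_SIZE = 188   # fixed MPEG-TS packet length in bytes


def naive_sniff(body: bytes) -> bool:
    """Single-byte check: accepts anything that starts with 0x47."""
    return len(body) > 0 and body[0] == TS_SYNC_BYTE


def stricter_sniff(body: bytes, packets: int = 3) -> bool:
    """Require the sync byte at the start of several consecutive packets."""
    if len(body) < TS_PACKET_SIZE * packets:
        return False
    return all(body[i * TS_PACKET_SIZE] == TS_SYNC_BYTE for i in range(packets))


# A made-up sensitive JSON document, padded so it is long enough to compare.
secret = b'{"user": "Grace", "token": "s3cr3t"}' + b" " * 600

# The attacker picks a range that begins at the "G" in "Grace".
ranged = secret[secret.index(b"G"):]
```

Here `naive_sniff(ranged)` accepts the JSON slice, while `stricter_sniff(ranged)` rejects it because the bytes at offsets 188 and 376 are not sync bytes. The stricter check still relies on the response starting at a packet boundary.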

Parsing DASH/HLS manifest in ORB

Another option is to fully parse DASH/HLS manifests in ORB - this would help to:

  • Allow the responses carrying the manifest.
  • Detect additional video (and audio, text/vtt, etc.) subresources that need to be allowed by ORB.

This approach requires parsing the whole response body (not just the first 1024 bytes). This is quite similar to how the full response body might need to be parsed as JavaScript (and therefore maybe this is something that ORB implementations need to tackle anyway).
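To make the parsing option a bit more concrete, here is a rough sketch for the HLS side only. It assumes the basic playlist shape (first line `#EXTM3U`, tag lines starting with `#`, every other non-empty line a URI) and deliberately ignores URIs carried inside tag attributes. The function name and the example URLs are hypothetical.

```python
from urllib.parse import urljoin


def hls_subresource_urls(manifest_url: str, manifest_text: str) -> list:
    """Collect absolute URLs of the URI lines in an HLS playlist."""
    lines = [line.strip() for line in manifest_text.splitlines()]
    lines = [line for line in lines if line]
    if not lines or lines[0] != "#EXTM3U":
        raise ValueError("not an HLS playlist")
    # Tags and comments start with "#"; everything else is treated as a URI
    # and resolved relative to the manifest URL.
    return [
        urljoin(manifest_url, line)
        for line in lines[1:]
        if not line.startswith("#")
    ]
```

Note that the whole body has to be available before the list of subresources is known, which is the cost mentioned above.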

This approach also tightly couples ORB implementations to the DASH/HLS manifest formats (e.g. future changes to a manifest format would need to be reflected in ORB's parser).

anforowicz avatar Nov 12 '21 19:11 anforowicz

As discussed in https://github.com/whatwg/html/issues/6468 it's not clear to me how HLS doesn't allow a complete bypass of ORB. And that's because the origins of the resources the HLS resource points to don't have to match the HLS resource origin. So an attacker can create an HLS resource and make it point to a variety of resources across the web they want to read.

I suppose an option there might be that decoding happens in another process, but doesn't decoding sometimes work for somewhat arbitrary inputs which would then result in information leakage?

annevk avatar May 17 '22 07:05 annevk

but doesn't decoding sometimes work for somewhat arbitrary inputs which would then result in information leakage

There are demonstrated exploits that use concatenation; {known header} + {injected bytes} can be used to reliably leak data. This was possible to do in Chrome in the distant past by replying to a request with partial data, then redirecting the followup range request.

I'm not sure if/how a similar approach could be applied to HLS. It has powerful range primitives, but is also expecting the media data to be complete within each chunk (rather than allowing arbitrary concatenation).

I would say that it's theoretically possible for such a thing to occur, but without some exciting new technique it's not a feasible attack in practice.

As discussed in https://github.com/whatwg/html/issues/6468 it's not clear to me how HLS doesn't allow a complete bypass of ORB

This much seems to be true. HLS somewhat explicitly is allowed to access cross-origin content in a way that is incompatible with ORB, because it allows the Content-Type to be arbitrary.

Maybe there is a hybrid request mode we could use? Something like: start with no-cors, then if the Content-Type isn't safe switch to making cors requests (or perhaps the reverse to encourage CORS). This might work around the issue of few sites setting the crossorigin attribute despite using CORS-enabled CDN configurations.

sandersdan avatar May 17 '22 18:05 sandersdan

It has powerful range primitives, but is also expecting the media data to be complete within each chunk (rather than allowing arbitrary concatenation).

Can you elaborate on what this means? Does each "complete" piece of media data have some kind of identifiable container?

Something like: start with no-cors, then if the Content-Type isn't safe switch to making cors requests (or perhaps the reverse to encourage CORS).

Interesting idea. Let's see:

  1. Media element makes ordinary no-cors request to X.
  2. X is an HLS resource.
  3. The ORB network filter detects X is an HLS resource somehow and returns a network error.
  4. (We could consider annotating this network error with a bit if we don't want to retry generally.)
  5. The media element sees the network error and retries but now enforces CORS (recursively, including for HLS subresources).
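The steps above can be sketched roughly as follows. This is a toy model, not browser code: `Response`, its `retry_with_cors` field (standing in for the annotation bit from step 4), and the `fetch(url, mode)` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Response:
    ok: bool
    retry_with_cors: bool = False  # hypothetical annotation bit from step 4
    body: bytes = b""


def load_media(url: str, fetch: Callable[[str, str], Response]) -> tuple:
    """Try no-cors first; retry in cors mode only if the error was annotated.

    Returns the final response and the request mode that produced it.
    """
    response = fetch(url, "no-cors")
    if response.ok:
        return response, "no-cors"
    if response.retry_with_cors:
        # From here on, subresources would also be fetched with CORS enforced.
        return fetch(url, "cors"), "cors"
    # Ordinary network error: do not retry.
    return response, "no-cors"
```

The caller would carry the returned mode forward so that every subresource of the manifest uses the same enforcement.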

I think it would be simpler to allow HLS resources to bypass ORB (assuming we can identify them through sniffing or MIME type) and require CORS for HLS subresources. Or is what you're saying that even HLS subresources could often be identified as media and so we wouldn't have to fall back as early?

An alternative to that might be that if an origin hosts an HLS resource that points to subresources on the same origin, we'd request those with no-cors (as well as those on the same origin as the requestor; although that is a bit of a confused deputy situation, I'm guessing it is okay given that it's media), but any resources on other origins go with CORS.
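As a sketch of that alternative, assuming origins are compared as (scheme, host, port) tuples with default ports filled in for http and https only. The function names and example URLs are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Simplified origin: (scheme, host, port), with default ports filled in."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def subresource_mode(requestor_url: str, manifest_url: str, subresource_url: str) -> str:
    """Pick the request mode for one subresource listed in an HLS manifest."""
    target = origin(subresource_url)
    if target == origin(manifest_url) or target == origin(requestor_url):
        return "no-cors"
    # Anything on a third origin has to opt in through CORS.
    return "cors"
```

With this rule an attacker-hosted manifest can still name resources on a victim origin, but those requests would go out in cors mode and fail unless the victim opts in.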

annevk avatar May 18 '22 06:05 annevk

Can you elaborate on what this means? Does each "complete" piece of media data have some kind of identifiable container?

Each segment is a sequence of frames in a media container, in the sense that demuxing can start at the first byte. It should also end demuxing at the last byte, but I wouldn't recommend relying on implementations to check that before using any of the data.

Unlike the concatenation example above, HLS demuxer state does not carry over and resume in the next segment. Each segment is fully independent.

The container for media data in HLS is either MPEG TS or MP4. MPEG TS is a sequence of header+data chunks that can be parsed sequentially; MP4 is a tree structure that is parsed completely before extracting media data. (Note: in practice MP4 is typically ordered such that streaming is possible once the important metadata is parsed.)

Either of those can be easily verified given all of the bytes, but detecting them from only a short window at the start is challenging. A valid MP4 can start with essentially arbitrary bytes unless we make restrictions that go beyond the specification requirements (e.g. we could require that a known box type occurs within that window).
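A rough sketch of the "known box type within the window" restriction, assuming the usual MP4 box header (4-byte big-endian size, then a 4-byte type, with size 1 meaning a 64-bit size follows). The set of accepted types, the 1024-byte default window, and the function name are illustrative choices, not something any specification requires.

```python
import struct

# Illustrative set of top-level box types to accept.
KNOWN_BOX_TYPES = {b"ftyp", b"styp", b"moov", b"moof", b"mdat", b"sidx"}


def has_known_box_in_window(data: bytes, window: int = 1024) -> bool:
    """Walk top-level boxes; True if a known box header starts inside the window."""
    offset = 0
    limit = min(len(data), window)
    while offset + 8 <= limit:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if box_type in KNOWN_BOX_TYPES:
            return True
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if offset + 16 > len(data):
                return False
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            if size < 16:
                return False
        elif size < 8:
            # Size 0 means "box runs to end of file"; other small values are invalid.
            return False
        offset += size
    return False
```

A file that opens with a large unknown box pushes the first known box past the window and is rejected, which is the compatibility cost of going beyond the specification.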

Or is what you're saying that even HLS subresources could often be identified as media and so we wouldn't have to fall back as early?

I'm offering that if sniffing isn't working out then a hybrid request strategy may be a sufficient workaround. We know that many HLS resources are being served with an inaccurate Content-Type from CDNs that are CORS-allowed; what we don't yet know is how often authors will have set the crossorigin attribute correctly on the <video> tag (I'm guessing not often).

The combination of inaccurate Content-Type + not CORS-allowed may be small enough to drop support for (this is untested).

(Edit: It's possible to combine everything together: if Content-Type isn't media, try to sniff. In the hopefully rare case that sniffing is inconclusive, reject with a flag and let the media stack retry with CORS.)
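The combined approach reads as a small decision ladder. A sketch under assumptions: "media" Content-Type is approximated as anything under `audio/` or `video/`, and `sniff` is a hypothetical callback returning `True` (media), `False` (not media), or `None` (inconclusive).

```python
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    BLOCK_RETRY_WITH_CORS = "block, flagged for a CORS retry"


def decide(content_type: str, body: bytes,
           sniff: Callable[[bytes], Optional[bool]]) -> Decision:
    """Content-Type first, then sniffing, then a flagged rejection."""
    essence = content_type.split(";", 1)[0].strip().lower()
    if essence.startswith(("audio/", "video/")):
        return Decision.ALLOW
    verdict = sniff(body)  # True: media, False: not media, None: inconclusive
    if verdict is True:
        return Decision.ALLOW
    if verdict is None:
        return Decision.BLOCK_RETRY_WITH_CORS
    return Decision.BLOCK
```

Only the inconclusive branch costs an extra request, so how often it is hit in practice determines whether the fallback is acceptable.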

sandersdan avatar May 18 '22 17:05 sandersdan

Given how desktop Chromium and Gecko browsers work, I'm starting to lean toward the 'Strict MIME type enforcement for DASH/HLS' approach. Is there any better data than https://bugs.chromium.org/p/chromium/issues/detail?id=1269852#c6 about the mimetype usage on Android?

smaug---- avatar Jul 11 '22 17:07 smaug----

I've updated the linked bug with stable results. In short, the non-stable results were roughly accurate.

sandersdan avatar Jul 11 '22 20:07 sandersdan

And those 1.5% are without cors? Or is it just checking mimetype? But aren't we interested only in the case where one uses weird mimetype without cors?

(It seems a bit surprising for non-WebView case that one would use cross-origin without cors, if desktop browsers require cors through fetch(). But perhaps sites do have very specific code paths for mobile)

smaug---- avatar Jul 12 '22 09:07 smaug----

I do not have data that is able to correlate MIME type with CORS state.

sandersdan avatar Jul 12 '22 17:07 sandersdan

@annevk I wonder if Apple has any data on what kinds of requests its HLS implementation ends up making.

smaug---- avatar Aug 02 '22 13:08 smaug----

@smaug---- apologies for responding so late. What kind of data are you looking for exactly? The request mode of the requests made based on the data in the HLS manifest?

annevk avatar Oct 18 '22 15:10 annevk

Which request mode and which mimetype is being used with HLS.

smaug---- avatar Oct 19 '22 09:10 smaug----