
Signed/Bundled HTTP Exchanges and WebPackage

Open lidel opened this issue 7 years ago • 41 comments

This issue tracks ideas, use cases and work related to Web Packaging, especially Signed HTTP Exchanges (SXGs) and Bundled HTTP Exchanges which open the door to associating an origin with content that was not explicitly retrieved from that origin by the browser. Previous workarounds for the "origin problem" can be found in https://github.com/ipfs/in-web-browsers/issues/89 and https://github.com/ipfs/in-web-browsers/issues/66.

Background

Google is championing work on "Web Packaging" to solve the MITM problem (aka the "misattribution problem") of the AMP Project. Signed HTTP Exchanges (SXG) decouple the origin of the content from who distributes it. Content can be published on the web without relying on a specific server, connection, or hosting service, which is highly relevant for IPFS, as it is great at distributing immutable bundles of data.

2018: Signed HTTP Exchanges

A longer overview can be found at developers.google.com: Signed HTTP Exchanges:

2018-11-08--00-02-09

The Google Chrome team is working towards making this an IETF spec and has a prototype built for Chrome, with an origin trial starting with Chrome 71.

People would like to use content offline and in other situations where there isn’t a direct connection to the server where the content originates. However, it’s difficult to distribute and verify the authenticity of applications and content without a connection to the network. [..]

Previous attempts at packaging web resources [..] were motivated by speeding up the download of resources from a single server [..] This attempt is instead motivated by avoiding a connection to the origin server at all. #

It is worth noting that this is still very much a proof-of-concept spec: the current version of SXG is considered harmful by Mozilla, and the spec needs further work.

2019 Q3: Bundled HTTP Exchanges, AKA Web Bundles

Web Bundles, more formally known as Bundled HTTP Exchanges, are part of the Web Packaging proposal.

To be precise, a Web Bundle is a CBOR file with a .wbn extension (by convention) which packages HTTP resources into a binary format, and is served with the application/webbundle MIME type.
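A minimal sketch of what "served with the application/webbundle MIME type" could look like for a static file server, using Python's standard `mimetypes` module. The helper name `content_type_for` is hypothetical and only illustrates the extension-to-type mapping described above; it is not taken from any gateway implementation.

```python
import mimetypes

# Register the conventional Web Bundle extension with its MIME type.
mimetypes.add_type("application/webbundle", ".wbn")


def content_type_for(path: str) -> str:
    """Return the Content-Type a static server could send for `path`.

    Hypothetical helper, for illustration only.
    """
    guessed, _encoding = mimetypes.guess_type(path)
    # Unknown extensions fall back to a generic binary type.
    return guessed or "application/octet-stream"


if __name__ == "__main__":
    print(content_type_for("site.wbn"))
```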

More at https://web.dev/web-bundles/

Potential IPFS Use Cases

How does this fit in with P2P distribution? Is the future of web publishing signed+versioned bundles over IPFS?

IPFS as transport for SXG / Web Bundles

  • Signed/Bundled Exchanges provide a means to separate the URL authority from the delivery mechanism of the document. This allows IPFS to deliver documents on behalf of a third party that signed the exchange bundle, and to play nicely with legacy PKI.
    • In simpler words: a bundle with an entire website (or parts of it) can be loaded over IPFS, and a browser supporting signed exchanges will validate the signatures and render the content with the original domain and green lock in the location bar. Click below to watch a 1-minute demo:

      ipfs-webpackage Source: https://github.com/jimpick/signed-exchange-test

    • This means one could set up a DNSLink pointing at a Signed HTTP Exchange, and users of IPFS Companion would load cached websites over IPFS while keeping the "original" URLs in the location bar

      • Digression: right now Chrome "lies" to the user and displays "https://" as the protocol, which raises valid concerns. I suspect it will end up being "wpack://" or something like that.
    • An alternative way to use this would be to create Service Worker orchestration that loads a website via an SXG snapshot fetched from IPFS as a means of failover/workaround for DDoS or censorship scenarios. (See initial experimentation in https://github.com/ipfs/in-web-browsers/issues/121#issuecomment-444769959)
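To make the DNSLink idea in the list above a bit more concrete, here is a rough sketch that turns a DNSLink TXT record value (the `dnslink=/ipfs/...` form) into a gateway URL for an `.sxg` file. The function name, the placeholder CID and the idea of pointing the record directly at an `.sxg` path are assumptions for illustration; nothing here describes how IPFS Companion actually resolves such records.

```python
def dnslink_to_gateway_url(txt_value: str, gateway: str = "https://ipfs.io") -> str:
    """Map a DNSLink TXT value such as 'dnslink=/ipfs/<cid>/page.sxg'
    to a URL on an HTTP gateway. Hypothetical helper, for illustration only.
    """
    prefix = "dnslink="
    if not txt_value.startswith(prefix):
        raise ValueError("not a DNSLink TXT record value")
    content_path = txt_value[len(prefix):]
    # DNSLink values point at content paths under /ipfs/ or /ipns/.
    if not content_path.startswith(("/ipfs/", "/ipns/")):
        raise ValueError("unsupported content path: " + content_path)
    return gateway.rstrip("/") + content_path


if __name__ == "__main__":
    # 'QmExampleCid' is a placeholder, not a real CID.
    print(dnslink_to_gateway_url("dnslink=/ipfs/QmExampleCid/index.html.sxg"))
```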

Archival Use Cases (Web Bundles)

  • The Bundled Exchange file format could provide a standardized means of creating future-proof website snapshots

    • Archival use case is mentioned here: https://tools.ietf.org/html/draft-yasskin-webpackage-use-cases-01#section-2.2.10
    • This is potentially HUGE. Imagine an immutable Internet Archive fetching snapshots from IPFS. Or the Wikipedia references section pointing at content-addressed snapshots etc :eyes:
    • See https://github.com/ipfs/in-web-browsers/issues/94
  • ? (add more ipfs-specific uses in comments below!)

Learning Materials

WebPackage 101

  1. Fixing AMP URLs with Web Packaging (20min primer on Web Packaging)

    After this talk, you will have a solid grasp on the proposed solution to AMP's URL misattribution problem, and how Cloudflare is positioned to take the necessary steps to provide this fix to existing AMP publishers with minimal setup, and no code required.

  2. Web Packaging Format Explainer

    This document describes use cases for packaging websites and explains how to use the cluster of specifications in this repository to accomplish those use cases. It serves similar role as typical "Introduction" or "Using" and other non-normative sections of specs.

  3. Use Cases and Requirements for Web Packages

    Longer, more comprehensive read. This document lists use cases for signing and/or bundling collections of web pages, and extracts a set of requirements from them.

Known Problems and Concerns

  • 2018 Q1: Mozilla's Position: "harmful" (in current form)

    Mozilla has concerns about the shift in the web security model required for handling web-packaged information. Specifically, the ability for an origin to act on behalf of another without a client ever contacting the authoritative server is worrisome, as is the removal of a guarantee of confidentiality from the web security model (the host serving the web package has access to plain text). We recognise that the use cases satisfied by web packaging are useful, and would be likely to support an approach that enabled such use cases so long as the foregoing concerns could be addressed. – https://mozilla.github.io/standards-positions/ & https://github.com/mozilla/standards-positions/issues/29#issuecomment-365786829

    • 2019 Q2: Mozilla Reaffirmed this position in a longer document: https://github.com/mozilla/standards-positions/issues/29#issuecomment-495122302
  • 2019 Q2: Signed HTTP Exchanges Could Allow Ads to get Around DNS-based ad blocking

    – https://www.reddit.com/r/pihole/comments/alwkh1/

  • Built-In Tracking in Signed Packages - https://github.com/WICG/webpackage/issues/422
    • Threat model: https://github.com/WICG/webpackage/pull/424

References

Web Packaging Primer

Additional Resources

cc @jimpick @mikeal

lidel avatar Oct 16 '18 22:10 lidel

This is pretty crude, but these are the files I used in my demo:

https://github.com/jimpick/signed-exchange-test

Probably the only thing really re-usable in that is the service worker (sw-ipfs.js) which intercepts requests and loads the associated .sxg file from the ipfs.io gateway.

jimpick avatar Oct 17 '18 04:10 jimpick

I doubt I'll have time to dive into this before I go on vacation but this is awesome!

mikeal avatar Oct 17 '18 16:10 mikeal

Instructions on how to get into the origin trial here:

https://twitter.com/kinu/status/1055825077281939456

jimpick avatar Oct 27 '18 20:10 jimpick

Another link!

https://developers.google.com/web/updates/2018/11/signed-exchanges

jimpick avatar Nov 05 '18 05:11 jimpick

Content servers: If you want to host SXGs created by publishers on their behalf, you can participate in the origin trial to have the SXGs processed by Chrome without requiring your users turn on a flag. – /signed-exchanges#participate_in_the_origin_trial

IIUC we could look into enabling this on our HTTP Gateway, that way we could demo .sxg with regular Chrome without passing any special flags.

lidel avatar Nov 05 '18 11:11 lidel

A new talk just landed: :movie_camera: From Low Friction to Zero Friction with Web Packaging and Portals (Chrome Dev Summit 2018)

The focus is on UX, but I gathered highlights related to Web Packaging:

  • Introduction to Web Packaging: ~6:46
  • Signing exchanges for your site: ~10:49 (https://bit.ly/try-sxg)
  • How Cloudflare plans to support SXGs: ~14:05
  • Update on Bundled Exchanges with example use in an offline news reader: ~21:39
  • Roadmap: ~23:21

ps. Our Origin Trial setup is tracked in https://github.com/ipfs/infrastructure/issues/453

lidel avatar Nov 14 '18 12:11 lidel

PSA:
Origin Trial for Signed HTTP Exchanges is enabled for ipfs.io Gateway

This means anyone can publish SXG on IPFS and it loads in regular Chrome 71 without any additional setup on user side.

Quick demo:

  1. Install Google Chrome 71
  2. Open SXG from our gateway, for example: https://ipfs.io/ipfs/QmVnnXjwXyEKhnrC1L7wegepUum2zN4JZUgtvA7DYtj4rG/sxg-location.sxg
  3. You will see the location bar replaced with the Origin read from the SXG! For the sample above it will be a localhost URL:

    2018-11-29--19-28-37

To create .sxg with Origin of your own domain follow steps from #creating_your_sxg.

This is just a brief update, expect a post at blog.ipfs.io with more details soon.

lidel avatar Nov 29 '18 18:11 lidel

This is so exciting! I'm going to experiment a bit with this later...

jimpick avatar Nov 29 '18 23:11 jimpick

I had a good meeting with the Google Chrome HTTP Signed Exchanges team in Tokyo today. I prepared a little demo:

https://ipfs.v6z.me/

It only works with Chrome Canary (Chrome Beta doesn't seem to work).

The top-level "bootstrap" website with the original index.html and service worker is published to IPFS (using IPNS). That's given its own SSL certificate using https://cloudflare-ipfs.com/

Then the web content is processed with gen-signedexchange to generate a bunch of .sxg "HTTP Signed Exchange" files, which are published to IPFS (not using IPNS), and finally a 'ipfs-hash.txt' file with the hash of the content is written to the bootstrap site. The service worker looks at that file, and for any file that is being fetched, it will generate a redirect to the published content .sxg files hosted on the public ipfs.io gateway that Protocol Labs runs (which has the correct HTTP headers for the origin trial).
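The redirect step described above can be sketched roughly as follows. The real demo does this inside a service worker in JavaScript; this Python version only illustrates the URL mapping. The function name, the `.sxg` suffix convention, the `index.html` default for directory paths and the placeholder hash are all assumptions, not details confirmed by the demo source.

```python
from urllib.parse import urlsplit

GATEWAY = "https://ipfs.io"  # public gateway mentioned in the comment above


def sxg_redirect_url(request_url: str, ipfs_hash_txt: str) -> str:
    """Compute where a request could be redirected to fetch its .sxg file.

    `ipfs_hash_txt` is the content of the 'ipfs-hash.txt' file from the
    bootstrap site. Hypothetical helper, for illustration only.
    """
    root = ipfs_hash_txt.strip()
    path = urlsplit(request_url).path or "/"
    if path.endswith("/"):
        # Assumption: directory requests map to index.html.
        path += "index.html"
    # Assumption: each resource is stored next to its name with an .sxg suffix.
    return f"{GATEWAY}/ipfs/{root}{path}.sxg"


if __name__ == "__main__":
    # 'QmExampleHash' is a placeholder, not the hash used by the demo.
    print(sxg_redirect_url("https://ipfs.v6z.me/", "QmExampleHash\n"))
```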

It's a little hard to explain to somebody unfamiliar with all the parts involved. Now that the demo is actually working, I'd love to do a proper blog post for it!

Source code for the demo: https://github.com/jimpick/signed-exchange-test/tree/ipfs.v6z.me-origin-trial

(sorry, no documentation yet ... I only got it working yesterday)

jimpick avatar Dec 06 '18 07:12 jimpick

Spec change opening ability to load SXG from locally running IPFS node: https://github.com/WICG/webpackage/pull/352

lidel avatar Jan 15 '19 22:01 lidel

Heads up that there's an AMP conf coming up in Tokyo that will likely have relevant discussion https://www.ampproject.org/amp-conf/

kyledrake avatar Feb 18 '19 15:02 kyledrake

PSA: Origin Trial ends on Mar 6, 2019 – I will extend our token till the trial end.

lidel avatar Feb 26 '19 10:02 lidel

Discussion about Service Worker and subresource SXG prefetching integration:

  • started in https://github.com/WICG/webpackage/issues/347#issuecomment-473782344
  • moved to https://github.com/WICG/webpackage/issues/409

lidel avatar Mar 21 '19 01:03 lidel

Cloudflare announced seamless generation of SXG for existing websites as "AMP Real URL". The feature will be available for free.

  • https://blog.cloudflare.com/announcing-amp-real-url/

    Google’s AMP Crawler downloads the content of your website and stores it in the AMP Cache many times a day. If your site has AMP Real URL enabled Cloudflare will digitally sign the content we provide to that crawler, cryptographically proving it was generated by you. That signature is all a modern browser (currently just Chrome on Android) needs to show the correct URL in the address bar when a visitor arrives to your AMP content from Google’s search results.

This older blogpost contains details on how signed content can be announced to the crawler.

lidel avatar Apr 17 '19 10:04 lidel

I tried copying an .sxg file found "in the wild" to IPFS and loading it through the gateway:

https://ipfs.io/ipfs/QmcMMFKpj4WtnfDinDh6vuTU5ViQD5ncVtRvTzXWYEyo5w/test1.sxg

In Chrome DevTools, the following error was displayed:

Screenshot 2019-04-29 22 53 05

Looks like we might need to tweak the header on the gateway.

jimpick avatar Apr 30 '19 05:04 jimpick

Good news, the gateway looks like it's updated and .sxg files are loading. :-)

https://ipfs.io/ipfs/QmWgYzCJuNupFeX1RLv27srqU1t7z6HJamUMeR9rm1zF2w

Edit: Actually, not yet ... I checked the headers, and they aren't updated yet. I think that's using a fallback. This stuff is confusing.

jimpick avatar May 01 '19 21:05 jimpick

Found a video of an IETF presentation on Web Packaging from this March https://youtu.be/woLbXaX0Gf4?t=700

Interesting that it is being presented to the IETF as a peer-to-peer technology!

Also, listen to the questions to hear @ekr from Mozilla express his strong "considered harmful" position in person.

jimpick avatar May 02 '19 01:05 jimpick

@jimpick Yep, the original use case was around peer-to-peer content distribution in places where mobile data is very expensive or unreliable. We only later realized we could think of the AMP cache as a "peer".

The big unsolved problem for our peer-to-peer model is the way clients discover packages. Doing it naively in the client gives the cache a full view of the client's browsing history, which isn't acceptable. When the peer is on the internet (e.g. AMP), the source of the link to the resource (e.g. Google Search) can provide discovery without leaking any more information. Maybe IPNS can be a more general way to discover packages cached nearby, if its privacy properties are right?

jyasskin avatar May 02 '19 14:05 jyasskin

@jyasskin You are absolutely correct in saying that naively sharing peer-to-peer will expose users' privacy. Full privacy is a tough problem to design for.

We're actively working on enhancements to IPNS and the DHT to improve performance. And there is a lot of ongoing work in libp2p for private networks and relaying. I think it would be neat if it were possible to restrict lookups so that content is only ever retrieved via privacy-preserving mechanisms, and so that content can be shared or re-shared without danger. There's often going to be a tradeoff in privacy vs. performance.

To make things worse, many politicians, intelligence agencies, police forces and even corporate IT departments are opposed to true anonymity, so it gets into really tricky legal territory.

For non-client applications, there are many datasets which are essentially public and for which most people would prefer performance since the privacy concerns aren't too much of a problem. That's one reason we're primarily focused on package managers and performance this year.

jimpick avatar May 02 '19 15:05 jimpick

Here's my take on the WebPackage controversy, which is fundamentally a rethink about SSL certificates and what they represent to the reader:

  • on one side, represented by Google/Chrome/AMP, the idea is that SSL certificates can be used to sign content at the source, so a reader can look in the browser address bar, see the lock, and be assured that the content really came from "The Washington Post". This seems very reasonable to me. It's pretty much the same thing that Beaker Browser is doing with the Dat protocol (not using SSL).

  • on the other side, represented by Firefox, the idea is that SSL certificates are used to sign the connection/transport path. So a reader can look at the browser address bar, see the lock, and be assured that nobody has snooped on the connection between the original source and the reader's browser. This also seems very reasonable to me.

Right now, with AMP and HTTP Signed Exchanges, it is possible for the "cached" exchanges not to be transported directly from the Washington Post; instead they come from Google's or Cloudflare's CDN, which is not going to be spying on folks (they claim). Google is using their clout to provide what they call "privacy-preserving pre-fetch". Of course, if you are wearing a tinfoil hat, and you distrust Google, or the government, you might not think your privacy was preserved if Google's CDN is seeing all the documents being fetched.

The problem with peer-to-peer distribution and the experiments we (and others, such as @pfrazee at Beaker) are doing is that we are opening up the distribution to everybody, and the reader privacy problems get very tricky. So displaying the content with a lock saying it has come from "The Washington Post" might be true, but it's also quite possible that by retrieving the content via a peer-to-peer mechanism, there was a digital trail left, and reader privacy has been compromised ... so the "lock" displayed in the UI is misleading people to think that nobody can spy on them.

There has been much discussion about the reader privacy problem in the Dat community:

  • https://blog.datproject.org/2017/12/10/dont-ship/
  • https://blog.datproject.org/2018/01/16/dat-privacy-models/
  • https://twitter.com/SarahJamieLewis/status/1008111920988106753

Clearly, there are ways to improve reader privacy on peer-to-peer networks. For example, access could be made using Tor. Or via an encrypted link to a place that the reader trusts. Content could be distributed via broadcast (eg. satellite) and multicast mechanisms, so there are no direct accesses. Not accessing things directly but via trusted intermediaries and privacy-preserving peer-to-peer networks could actually be a privacy improvement. Peer-to-peer distribution has clear advantages when it comes to censorship resistance.

I wonder if peer-to-peer web browsers for the distributed web need more than one UI element to display trust and privacy information?

Two cases:

  • the content is signed, but it was acquired via a pathway that is actively advertising that the user has a copy so there is zero reader privacy. This might be just fine if a user is altruistically sharing the content.
  • the content is signed, but it can be cryptographically verified that it made its journey to the reader only via privacy-preserving peer-to-peer networks that the reader has specifically expressed trust in. In this case, a member of a vulnerable population would be protected from a malicious interloper.

Is this an area that could benefit from UX research?

jimpick avatar May 03 '19 17:05 jimpick

UX issues around "HTTPS spoofing"

To expand on @jimpick's take, I believe a contributing factor to the controversy is the UX of how SXG@v=b3 got implemented in Google Chrome. Looking from the sidelines, it may feel rushed and AMP-driven.

To be specific, Google Chrome makes SXG indistinguishable from regular HTTPS, which breaks basic assumptions around how users understand the green padlock in the location bar (aka "nobody but me and the Origin server can see the payload"). The UX of regular HTTPS is reused as-is, pretending that end-to-end HTTPS transport was used with the Origin from the location bar, which is not true.

Browser should be the user agent, and as one it should never lie or break this type of trust.

To me it feels like UX problem. There should be a different presentation in location bar than re-using the green padlock from HTTPS. Browser should be honest that WebPackage was used and show who was involved in rendering the page: who is the Publisher, when package was created, who was involved in Distributing the content etc.

Need for Demonstrating Archival Use Cases

I believe archiving is a missed opportunity to make a case for WebPackage and figure out technical details and UX in browser without going into politics of PKI and HTTPS spoofing.

Would love to see more happening around this use case. Browsers could add support for saving a website to a WebPackage bundle and loading it from it while making it obvious to the user that they are looking at an archived snapshot, with all details at hand.

This would add real value to the web by empowering individuals and institutions (Internet Archive, Wikipedia) with tools to fight the link rot and censorship. Imagine all Wikipedia References as reproducible snapshots of articles that could be downloaded, shared and read offline.

Worth looking at is the potential overlap with W3C's Packaged Web Publications: https://github.com/w3c/pwpub

Gateway Update: ipfs.io supports v=b3

Good news: we've updated HTTP headers at our IPFS Gateway. Errors from https://github.com/ipfs/in-web-browsers/issues/121#issuecomment-487828624 should be gone, responses for ipfs.io/ipfs/**.sxg now include:

Content-Type: application/signed-exchange;v=b3
X-Content-Type-Options: nosniff

Test in Chrome 74+: index.html.sxg :)
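For anyone wanting to verify a gateway response against the two headers listed above, a small sketch follows. It takes headers as a plain mapping, so it can be fed from any HTTP client; the function name and the wording of the error messages are made up for illustration, and it checks only these two headers, not whether the SXG itself is valid.

```python
from typing import List, Mapping

EXPECTED_VERSION = "b3"  # version parameter quoted in the headers above


def check_sxg_headers(headers: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means both headers match.

    Hypothetical helper, for illustration only.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    problems = []

    # Split "type/subtype;param=value" into the media type and its parameters.
    parts = [p.strip() for p in lowered.get("content-type", "").split(";")]
    media_type, params = parts[0], parts[1:]
    if media_type.lower() != "application/signed-exchange":
        problems.append("Content-Type is not application/signed-exchange")

    versions = [p.split("=", 1)[1] for p in params if p.lower().startswith("v=")]
    if versions != [EXPECTED_VERSION]:
        problems.append("missing or unexpected v= parameter")

    if lowered.get("x-content-type-options", "").strip().lower() != "nosniff":
        problems.append("X-Content-Type-Options is not nosniff")
    return problems


if __name__ == "__main__":
    print(check_sxg_headers({
        "Content-Type": "application/signed-exchange;v=b3",
        "X-Content-Type-Options": "nosniff",
    }))
```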

lidel avatar May 04 '19 16:05 lidel

To me it feels like UX problem

I think the problem is far greater than UX: users are not in control. If all the pages visited through Chrome are served through AMP, user privacy is compromised regardless of the icon in the location bar.

Gozala avatar May 15 '19 01:05 Gozala

I found some more videos (thanks YouTube) that go over WebPackaging and Signed HTTP Exchanges in quite a bit of detail.

BlinkOn 9 (April 2018): https://www.youtube.com/watch?v=rcJ9BLymVQE

BlinkOn 10 (April 2019): https://www.youtube.com/watch?v=iTYr5qVbHdo

jimpick avatar May 16 '19 03:05 jimpick

Mozilla published a 15-page paper which reaffirms their position: https://github.com/mozilla/standards-positions/issues/29#issuecomment-495122302

Concentrating on security issues is relatively easy. Coming to terms with a fundamental change to the security and content delivery model of the web is a more difficult task. This document tries to go further and explore other potentially problematic parts in the technology. [...] The increased exposure to security problems and the unknown effects of this on power dynamics is significant enough that we have to regard this as harmful until more information is available.

Quick takeaways:

  • The potential value of WebPackaging around offline uses (bundling web content) is recognized, but the paper does not spend much time on the value proposition there, because the current (b3) spec and use cases are focused on SXG/AMP/content distribution and WebPackage Bundles are not implemented yet in Chrome. It could be a different story if bundles shipped before SXG.
  • Origin substitution (aka HTTPS/Origin spoofing) remains the main technical problem. At one point the paper suggests an iterative approach where WebPackaged content is assigned a separate Origin
    • some browser vendors are already double-keying Origin when handling "third-party cookies", I wonder if similar mechanism could be (re)used here
  • Complexity. Is value introduced by WebPackaging enough to justify it?
  • Power dynamics and unexpected consequences are big unknowns

lidel avatar May 23 '19 11:05 lidel

I believe archiving is a missed opportunity to make a case for WebPackage and figure out technical details and UX in browser without going into politics of PKI and HTTPS spoofing.

We have an accepted position paper entitled, "Supporting Web Archiving via Web Packaging" (preprint version) in IAB's ESCAPE 2019 Workshop and hope to take this conversation there.

Last year, at the Web Archiving and Digital Libraries (WADL) 2018 Workshop, we illustrated a mock-up of a UI/UX element that browsers can show in the address bar to inform users about the state of a resource being archived (i.e., a memento). The icon can reveal a lot more context and metadata about the memento when clicked/tapped on (see slide #76 of the keynote talk).

Archive Icon

ibnesayeed avatar Jun 27 '19 19:06 ibnesayeed

I am back from the ESCAPE Workshop last week. I learned a lot and conveyed the message/issues/needs related to web archiving which was taken very well. I can recap my slides on Supporting Web Archiving via Web Packaging in one of the upcoming IPFS Weekly Calls if there is any interest in it.

ibnesayeed avatar Jul 22 '19 16:07 ibnesayeed

@ibnesayeed I'm super interested! Would you like to present it next week? I still haven't lined up a speaker yet.

jimpick avatar Jul 22 '19 19:07 jimpick

@ibnesayeed I'm super interested! Would you like to present it next week? I still haven't lined up a speaker yet.

@jimpick, next Monday (July 29) works for me for now.

ibnesayeed avatar Jul 23 '19 13:07 ibnesayeed

@ibnesayeed Great, I'll send you an email.

jimpick avatar Jul 24 '19 22:07 jimpick

Updates in support for creating Bundled HTTP Exchanges from a list of URLs:

lidel avatar Sep 06 '19 11:09 lidel