"Save as" broken: saves SW bootstrap HTML instead of IPFS data

Open agmap opened this issue 11 months ago • 18 comments

Many files are not wrapped in a directory. The file extension is therefore missing. But browsers seem to extract the file extension from the magic numbers / file signatures.
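For reference, signature-based detection can be sketched as below. This only illustrates the magic-number idea, not how any particular browser or gateway implements it. The table and the function name `sniffSignature` are hypothetical and cover just PNG and JPEG.

```typescript
// Hypothetical signature table: only two well-known formats, for illustration.
const SIGNATURES: Array<{ ext: string, mime: string, magic: number[] }> = [
  { ext: 'png', mime: 'image/png', magic: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { ext: 'jpg', mime: 'image/jpeg', magic: [0xff, 0xd8, 0xff] }
]

// Compare the leading bytes of the content against each known signature.
function sniffSignature (bytes: Uint8Array): { ext: string, mime: string } | undefined {
  for (const sig of SIGNATURES) {
    if (bytes.length >= sig.magic.length && sig.magic.every((b, i) => bytes[i] === b)) {
      return { ext: sig.ext, mime: sig.mime }
    }
  }
  // Unknown signature: the caller has to decide on a fallback.
  return undefined
}
```

Text formats such as JSON have no signature, so a check like this cannot identify them.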

JSON files: When I open a JSON file on ipfs.io, it gets the correct file extension. When I open a JSON file on inbrowser.link, it's shown in an HTML page and I can't download it as JSON. For example:
https://ipfs.io/ipfs/bafkreieze572daxva52asutm4u2bgtjtifur2a4d2aiur7lymjcwhoeqyy
https://bafkreieze572daxva52asutm4u2bgtjtifur2a4d2aiur7lymjcwhoeqyy.ipfs.inbrowser.link/

The same applies to image files: If I open an image on ipfs.io, the image gets the correct file extension. If I open an image file on inbrowser.link, it is shown in an HTML page and I cannot download it as an image from the browser. Example:
https://ipfs.io/ipfs/bafybeiblgxritjybxffn5te6aupebq7m77xm7xiqqkiq7eete7wrblhnce
https://bafybeiblgxritjybxffn5te6aupebq7m77xm7xiqqkiq7eete7wrblhnce.ipfs.inbrowser.link/

agmap avatar Feb 12 '25 08:02 agmap

@agmap can you clarify what browser you are using and how you are trying to "download it as X" ?

I just deployed v1.8.2 onto inbrowser.link so it's got a lot of updates.

In Brave, if I right-click an image and "save" it, it works as expected. However, if I go to `File > Save Page As`, I see the experience you've described.

I am not sure if browsers handle resources returned from service workers differently, but we can investigate.

SgtPooki avatar Feb 12 '25 12:02 SgtPooki

@agmap I saved the raw HEX of the files from both ipfs.io and inbrowser.link to .bin files (copy full hex, save to file by pasting) and then compared them and they are not different, so I am wondering if this is something we can control...

@lidel any ideas?

Image

Compare command ran:

cmp -l ipfs-io-ipfs-bafybeiblgxritjybxffn5te6aupebq7m77xm7xiqqkiq7eete7wrblhnce.bin bafybeiblgxritjybxffn5te6aupebq7m77xm7xiqqkiq7eete7wrblhnce-ipfs-inbrowser-link.bin

# no output because they're the same

SgtPooki avatar Feb 12 '25 12:02 SgtPooki

I am using chrome desktop in windows 11.

For the example of the json: Right click --> "save as"

  • ipfs.io: I am getting the *.json (bafkre…eqyy.json) --> also pretty print
  • inbrowser.link: I am getting an *.html (bafkrei…eqyy.ipfs.inbrowser.link.html) --> not pretty print

For the example of the image: Right click --> "save image as"

  • ipfs.io: I am getting the image as cid.png (bafybei….blhnce.png)
  • inbrowser.link: I am getting an *.html (download.html)

Image

agmap avatar Feb 12 '25 12:02 agmap

Smells like something related to Content-Type and/or Content-Disposition header in HTTP response. They inform browser how to render content and what filename is used in "Save As".

  • https://specs.ipfs.tech/http-gateways/path-gateway/#cache-control-response-header
  • https://specs.ipfs.tech/http-gateways/path-gateway/#content-disposition-response-header

We likely should have JSON/PNG tests in gateway-conformance for this – filed https://github.com/ipfs/gateway-conformance/issues/234 so we confirm or close that gap.

Did quick test (Firefox) for first two links from the original report and inbrowser.link seems to return invalid content-type HTTP header:

[Screenshots: inbrowser.link (SW) | dweb.link (Rainbow)]

@SgtPooki do you know if it is a bug in verified-fetch, or browser not displaying HTTP header from Response correctly ?

lidel avatar Feb 13 '25 16:02 lidel

> @SgtPooki do you know if it is a bug in verified-fetch, or browser not displaying HTTP header from Response correctly ?

I will investigate whether I can reproduce this when running `npm run build && node packages/gateway-conformance/dist/src/demo-server.js` from helia-verified-fetch, to see if this is a verified-fetch problem or a sw-gateway problem.

SgtPooki avatar Feb 28 '25 16:02 SgtPooki

verified-fetch is for sure returning content-type of html for json... but that is because of https://github.com/ipfs/helia-verified-fetch/blob/a041bdba954610beeacb6c768b0c4907633ae4fb/packages/gateway-conformance/src/fixtures/content-type-parser.ts#L7

sw-gateway does this at https://github.com/ipfs/service-worker-gateway/blob/17421d6bb0c7981c5689a16669786fe0b6abe2cf/src/lib/content-type-parser.ts#L5

Here are logs for the request of bafkreieze572daxva52asutm4u2bgtjtifur2a4d2aiur7lymjcwhoeqyy (json):

  helia:trustless-gateway-block-broker:127.0.0.1 GET http://127.0.0.1:8080/ipfs/bafkreieze572daxva52asutm4u2bgtjtifur2a4d2aiur7lymjcwhoeqyy?format=raw 200 +0ms
  helia:trustless-gateway:session:bafkreieze572daxva52asutm4u2bgtjtifur2a4d2aiur7lymjcwhoeqyy:trace got block for bafkreieze572daxva52asutm4u2bgtjtifur2a4d2aiur7lymjcwhoeqyy from http://127.0.0.1:8080/ +0ms
  helia:verified-fetch:helia:verified-fetch:byte-range-context:trace set _fileSize to 759 +41ms
  helia:verified-fetch:helia:verified-fetch:byte-range-context:trace requestRangeStart and requestRangeEnd are null +0ms
  helia:verified-fetch:helia:verified-fetch:byte-range-context:trace set request body with fileSize 759 +0ms
  helia:verified-fetch:helia:verified-fetch:byte-range-context:trace returning body unmodified for non-range, or invalid range, request +0ms
  content-type-parser contentTypeParser called for fileName: undefined +0ms
  content-type-parser no detectedType +1ms
  content-type-parser checking for svg +0ms
  helia:verified-fetch:raw-plugin:trace setting content type to "text/html; charset=utf-8" +43ms
  helia:verified-fetch:trace checking for content disposition +43ms
  helia:verified-fetch:trace download not requested +0ms
  helia:verified-fetch:trace no filename specified in query +0ms
  helia:verified-fetch:trace no content disposition specified +0ms

This has to do with https://www.npmjs.com/package/@sgtpooki/file-type (fork of https://www.npmjs.com/package/file-type) not handling content-type for text.

If it's JSON, we will need to read the entire contents to determine that. We do allow passing a custom content-type parser to helia-verified-fetch, so we could parse the entire content to check whether it's JSON.
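A minimal sketch of that idea, assuming the parser is handed the complete file bytes (it may only receive a first chunk for larger files, in which case this check would not be reliable). The function name `detectJsonContentType` is hypothetical and is not part of helia-verified-fetch or file-type.

```typescript
// Hypothetical helper: returns 'application/json' only when the whole buffer
// is valid UTF-8 and parses as a JSON object or array.
function detectJsonContentType (bytes: Uint8Array): string | undefined {
  // Cheap pre-check: skip leading whitespace and require '{' or '['.
  let i = 0
  while (i < bytes.length && (bytes[i] === 0x20 || bytes[i] === 0x09 || bytes[i] === 0x0a || bytes[i] === 0x0d)) {
    i++
  }
  if (i === bytes.length || (bytes[i] !== 0x7b && bytes[i] !== 0x5b)) {
    return undefined
  }

  try {
    // fatal: true makes decode() throw on invalid UTF-8 instead of substituting U+FFFD.
    const text = new TextDecoder('utf-8', { fatal: true }).decode(bytes)
    JSON.parse(text)
    return 'application/json'
  } catch {
    // Not valid UTF-8 or not valid JSON: let other detection steps decide.
    return undefined
  }
}
```

A custom parser could try binary signature detection first and fall back to a check like this one, so the full parse only runs for content that has no recognizable signature.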


Here are logs for the request of bafybeiblgxritjybxffn5te6aupebq7m77xm7xiqqkiq7eete7wrblhnce (image):

  helia:verified-fetch:dag-pb-plugin got async iterator for bafybeiblgxritjybxffn5te6aupebq7m77xm7xiqqkiq7eete7wrblhnce/ +0ms
  helia:verified-fetch:helia:verified-fetch:byte-range-context:trace set _fileSize to null +1ms
  helia:verified-fetch:helia:verified-fetch:byte-range-context:trace requestRangeStart and requestRangeEnd are null +0ms
  helia:verified-fetch:helia:verified-fetch:byte-range-context:trace set request body with fileSize null +0ms
  helia:verified-fetch:helia:verified-fetch:byte-range-context:trace returning body unmodified for non-range, or invalid range, request +0ms
  content-type-parser contentTypeParser called for fileName: undefined, byte size=262144 +8s
  content-type-parser detectedType: image/png +0ms
  helia:verified-fetch:dag-pb-plugin:trace contentTypeParser returned image/png +1ms
  helia:verified-fetch:dag-pb-plugin:trace setting content type to "image/png" +0ms
  helia:verified-fetch:trace checking for content disposition +1ms
  helia:verified-fetch:trace download not requested +0ms
  helia:verified-fetch:trace no filename specified in query +0ms
  helia:verified-fetch:trace no content disposition specified +0ms

SgtPooki avatar Feb 28 '25 17:02 SgtPooki

Ok I've got "File -> save page as" to result in .json download, but right clicking the page and save as still results in download.html with a local sw instance when running https://github.com/ipfs/service-worker-gateway/pull/603

FYI i've deployed this to inbrowser.dev: https://github.com/ipfs/service-worker-gateway/actions/runs/13594225197/job/38007334529

SgtPooki avatar Feb 28 '25 17:02 SgtPooki

Changes from #603 are now at https://bafkreieze572daxva52asutm4u2bgtjtifur2a4d2aiur7lymjcwhoeqyy.ipfs.inbrowser.dev/. Not sure what is up with the right-click and save, but you can see that it renders that JSON correctly now.

SgtPooki avatar Feb 28 '25 18:02 SgtPooki

.html looks like a bug or behavior specific to Service Workers in Chromium. Trying to save https://bafkreieze572daxva52asutm4u2bgtjtifur2a4d2aiur7lymjcwhoeqyy.ipfs.inbrowser.dev/ in Firefox produces a proper .json

Things to try to fix Chromium:

  • It is suspicious that x-content-type-options: nosniff is returned by inbrowser.dev (we see issues) and not dweb.link (no issues). I'll remove x-content-type-options: nosniff from the initial response at inbrowser.dev to ensure header parity with dweb.link (opened https://github.com/ipshipyard/waterworks-infra/pull/516 but we will land it next week)
  • Always return explicit Content-Disposition with the expected file extension.
  • If the above fail, we could create a small HTML+JS repro that we can file as a bug in Chromium and ask Igalia for help in fixing it.

lidel avatar Feb 28 '25 20:02 lidel

There is also strange behavior in Chrome when the files are wrapped in a directory with a filename and extension.

json: filename and extension is correct. https://bafybeigxip3b5uegqlmcodhitzogr7w46uyedsai4kecrxlu4w74szyfxi.ipfs.inbrowser.dev/metadata.json

Images: filename is correct, but extension is wrong. https://bafybeifiugn3zaj67dh6zqlgqpnzkbykbec3iur6wpb2zaz44ekcazyew4.ipfs.inbrowser.dev/image.jpeg

Everything is fine in Firefox.

agmap avatar Mar 01 '25 06:03 agmap

Tested in Chromium and removal of x-content-type-options: nosniff does not seem to make any difference.

JSON has special-handling UI in Chromium and also on our end, so it is not a surprise that it happens to "work fine". Other files likely hit the default code path.

Chrome's "Save As" (Ctrl+S) feature doesn't just save the resource served at the URL (/image.jpg). Instead, it saves the entire document context of the current tab. If /image.jpg is being served in a way that’s tied to an HTML page (e.g., loaded via an <img> tag or a redirect from an HTML document), Chrome might be saving that originating HTML page instead of the raw image file.

The fact that the saved index.html includes the HTML+JS that initialized the Service Worker strongly suggests that the browser is saving the page that registered the Service Worker, not the raw response for /image.jpg

@SgtPooki some ideas to try next:

  1. review Service Worker code for any logic that might return text/html HTML response as a fallback

    • we should never default to text/html blindly
    • I am suspicious this defaultMimeType = 'text/html' hack could be related
    • it looks like a temporary hack to make things work – could we remove it and replace it with application/octet-stream + proper content type sniffing?
  2. if (1) does not help, make SW GW set explicit Content-Disposition: inline; filename="image.jpg" matching file name from path (and if there is no path, set it to file.ext where .ext is returned by file-type sniffer).
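Idea (2) could be sketched roughly as follows. The function name `inlineContentDisposition`, the `sniffedExt` parameter, and the `file.bin` fallback for when no extension is known are all assumptions for illustration, not existing service-worker-gateway code. The sketch does not percent-decode path segments or emit `filename*` for non-ASCII names.

```typescript
// Hypothetical helper: build an inline Content-Disposition value from the
// request path, falling back to file.<ext> from a sniffer when there is no path.
function inlineContentDisposition (urlPath: string, sniffedExt?: string): string {
  // Last non-empty path segment, e.g. '/image.jpg' -> 'image.jpg'.
  const segments = urlPath.split('/').filter(s => s.length > 0)
  const last = segments.length > 0 ? segments[segments.length - 1] : undefined

  let filename: string
  if (last != null) {
    filename = last
  } else if (sniffedExt != null && sniffedExt !== '') {
    filename = `file.${sniffedExt}`
  } else {
    // Assumption: generic binary extension when nothing else is known.
    filename = 'file.bin'
  }

  // Escape characters that are special inside an HTTP quoted-string.
  const escaped = filename.replace(/["\\]/g, '\\$&')
  return `inline; filename="${escaped}"`
}
```

For example, `inlineContentDisposition('/image.jpg')` yields `inline; filename="image.jpg"`, and `inlineContentDisposition('/', 'png')` yields `inline; filename="file.png"`.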

lidel avatar Mar 03 '25 16:03 lidel

> if (1) does not help, make SW GW set explicit Content-Disposition: inline; filename="image.jpg" matching file name from path (and if there is no path, set it to file.ext where .ext is returned by file-type sniffer).

I tried this locally and it doesn't help:

+    response.headers.set('Vary', 'Accept-Encoding')
+    const contentType = response.headers.get('Content-Type')
+    let filename = 'download.bin'
+    if (contentType != null) {
+      switch (contentType) {
+        case 'application/json':
+          filename = 'download.json'
+          break
+        default:
+          break
+      }
+    }
+    response.headers.set('Content-Disposition', `inline; filename="${filename}"`)

SgtPooki avatar Mar 03 '25 20:03 SgtPooki

FYI, this seems to be a bug with chrome. see https://sgtpooki.github.io/chrome-sw-filetype-bug-repro/

source is at https://github.com/SgtPooki/chrome-sw-filetype-bug-repro

chromium bug filed at https://issues.chromium.org/u/1/issues/400455011

SgtPooki avatar Mar 03 '25 23:03 SgtPooki

Quick update: our friends at Igalia will look into https://issues.chromium.org/issues/400455011 (and https://issues.chromium.org/issues/40410035) once the ongoing work on https://issues.chromium.org/issues/40069954 is done.

lidel avatar Apr 10 '25 19:04 lidel

The JSON extension still isn't working properly in Firefox and Chrome. It's weird: Firefox recognises it as JSON and can save it as such, but when I open the saved JSON file, it has HTML code in it. Here are two examples:

1st: CID without filename https://bafkreieze572daxva52asutm4u2bgtjtifur2a4d2aiur7lymjcwhoeqyy.ipfs.inbrowser.dev/

2nd: CID with filename https://bafybeigxip3b5uegqlmcodhitzogr7w46uyedsai4kecrxlu4w74szyfxi.ipfs.inbrowser.dev/metadata.json

agmap avatar May 27 '25 15:05 agmap

The bug is still in Chrome.

However, a potential workaround for now might be the content-rendering screen at https://github.com/ipfs/service-worker-gateway/pull/751. Let me know how you feel about that.

Note that the original purpose of that PR is now resolved separately by https://github.com/ipfs/service-worker-gateway/pull/753

SgtPooki avatar Jun 12 '25 17:06 SgtPooki

https://github.com/ipfs/service-worker-gateway/issues/574#issuecomment-2795016251 will not be fixed any time soon (being realistic about devgrants, funding, and how long it takes for a fix to land in Chrome stable: it for sure won't be fixed in the next 6-12 months).

This means we need to fix it in userland. My initial idea is to return custom HTML UI for files that browser can render (images, PDFs, videos), with explicit "Download" button. This way user does not use "Save As" from browser, and instead uses the UI provided by us.
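A minimal sketch of such a page, assuming the raw content is reachable at some URL the page can link to. `renderDownloadPage`, `contentUrl`, and `filename` are hypothetical names, and whether a link with the `download` attribute actually sidesteps the Chromium behavior has not been verified here.

```typescript
// Escape text for safe use in HTML text nodes and double-quoted attributes.
function escapeHtml (s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
}

// Hypothetical helper: build a tiny HTML page with an explicit Download link.
// contentUrl is assumed to point at the raw content; filename is the name
// suggested to the browser via the anchor's download attribute.
function renderDownloadPage (contentUrl: string, filename: string): string {
  const href = escapeHtml(contentUrl)
  const name = escapeHtml(filename)
  return [
    '<!doctype html>',
    '<meta charset="utf-8">',
    `<title>${name}</title>`,
    `<a href="${href}" download="${name}">Download ${name}</a>`
  ].join('\n')
}
```

A real page would also embed a preview (an image, PDF, or video element) next to the link. That part is omitted to keep the sketch short.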

lidel avatar Oct 09 '25 15:10 lidel

> https://github.com/ipfs/service-worker-gateway/issues/574#issuecomment-2795016251 will not be fixed any time soon (being realistic about devgrants, funding, and how long it takes for a fix to land in Chrome stable: it for sure won't be fixed in the next 6-12 months).
>
> This means we need to fix it in userland. My initial idea is to return custom HTML UI for files that browser can render (images, PDFs, videos), with explicit "Download" button. This way user does not use "Save As" from browser, and instead uses the UI provided by us.

I did have a download UI. I think I closed that PR, but you could reuse some of it if necessary.

SgtPooki avatar Oct 10 '25 11:10 SgtPooki