"Save as" broken: saves SW bootstrap HTML instead of IPFS data
Many files are not wrapped in a directory. The file extension is therefore missing. But browsers seem to extract the file extension from the magic numbers / file signatures.
JSON files: When I open a JSON file in ipfs.io, it gets the correct file extension. When I open a JSON file in inbrowser.link, it's in an HTML page and I can’t download it as JSON. For example:
- https://ipfs.io/ipfs/bafkreieze572daxva52asutm4u2bgtjtifur2a4d2aiur7lymjcwhoeqyy
- https://bafkreieze572daxva52asutm4u2bgtjtifur2a4d2aiur7lymjcwhoeqyy.ipfs.inbrowser.link/
The same applies to image files: If I open an image in ipfs.io, the image gets the correct file extension. If I open an image file in inbrowser.link, it is in an HTML page and I cannot download it as an image from the browser. Example:
- https://ipfs.io/ipfs/bafybeiblgxritjybxffn5te6aupebq7m77xm7xiqqkiq7eete7wrblhnce
- https://bafybeiblgxritjybxffn5te6aupebq7m77xm7xiqqkiq7eete7wrblhnce.ipfs.inbrowser.link/
@agmap can you clarify what browser you are using and how you are trying to "download it as X" ?
I just deployed v1.8.2 onto inbrowser.link so it's got a lot of updates.
In Brave, if I right click an image and "save", it works as expected. However, if I go to `File > Save Page As`, I am seeing the experience you've described.
I am not sure if browsers handle resources returned from service-workers differently, but we can investigate
@agmap I saved the raw HEX of the files from both ipfs.io and inbrowser.link to .bin files (copy full hex, save to file by pasting) and then compared them and they are not different, so I am wondering if this is something we can control...
@lidel any ideas?
Compare command ran:
cmp -l ipfs-io-ipfs-bafybeiblgxritjybxffn5te6aupebq7m77xm7xiqqkiq7eete7wrblhnce.bin bafybeiblgxritjybxffn5te6aupebq7m77xm7xiqqkiq7eete7wrblhnce-ipfs-inbrowser-link.bin
# no output because they're the same
I am using chrome desktop in windows 11.
For the example of the json: Right click --> "save as"
- ipfs.io: I am getting the *.json (bafkre…eqyy.json) --> also pretty print
- inbrowser.link: I am getting an *.html (bafkrei…eqyy.ipfs.inbrowser.link.html) --> not pretty print
For the example of the image: Right click --> "save image as"
- ipfs.io: I am getting the image as cid.png (bafybei….blhnce.png)
- inbrowser.link: I am getting an *.html (download.html)
Smells like something related to Content-Type and/or Content-Disposition header in HTTP response.
They inform browser how to render content and what filename is used in "Save As".
- https://specs.ipfs.tech/http-gateways/path-gateway/#cache-control-response-header
- https://specs.ipfs.tech/http-gateways/path-gateway/#content-disposition-response-header
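To make the comparison concrete, here is a minimal sketch of diffing the headers that influence "Save As" between two responses. The helper names (`diffHeaders`, `compareGateways`, `SAVE_AS_HEADERS`) are hypothetical and not part of any gateway codebase. Note that a plain `fetch` from outside a page controlled by the service worker will only see the initial (bootstrap) response of a SW gateway, and cross-origin requests in a browser may hide some headers, so this is mostly useful from the browser console on the gateway origin itself.

```typescript
// Hypothetical helper: header names that plausibly affect rendering and "Save As".
const SAVE_AS_HEADERS = ['content-type', 'content-disposition', 'x-content-type-options']

// Returns one human-readable line per header whose value differs between a and b.
function diffHeaders (a: Headers, b: Headers, names: string[] = SAVE_AS_HEADERS): string[] {
  const diffs: string[] = []
  for (const name of names) {
    const left = a.get(name)
    const right = b.get(name)
    if (left !== right) {
      diffs.push(`${name}: ${left ?? '(missing)'} != ${right ?? '(missing)'}`)
    }
  }
  return diffs
}

// Usage sketch: HEAD both URLs and report the differences.
async function compareGateways (urlA: string, urlB: string): Promise<string[]> {
  const [a, b] = await Promise.all([
    fetch(urlA, { method: 'HEAD' }),
    fetch(urlB, { method: 'HEAD' })
  ])
  return diffHeaders(a.headers, b.headers)
}
```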
We likely should have JSON/PNG tests in gateway-conformance for this – filed https://github.com/ipfs/gateway-conformance/issues/234 so we confirm or close that gap.
Did quick test (Firefox) for first two links from the original report and inbrowser.link seems to return invalid content-type HTTP header:
| inbrowser.link (SW) | dweb.link (Rainbow) |
|---|---|
@SgtPooki do you know if it is a bug in verified-fetch, or browser not displaying HTTP header from Response correctly ?
I will investigate if I can reproduce when running `npm run build && node packages/gateway-conformance/dist/src/demo-server.js` from helia-verified-fetch to see if this is a verified-fetch problem, or a sw-gateway problem.
verified-fetch is for sure returning content-type of html for json... but that is because of https://github.com/ipfs/helia-verified-fetch/blob/a041bdba954610beeacb6c768b0c4907633ae4fb/packages/gateway-conformance/src/fixtures/content-type-parser.ts#L7
sw-gateway does this at https://github.com/ipfs/service-worker-gateway/blob/17421d6bb0c7981c5689a16669786fe0b6abe2cf/src/lib/content-type-parser.ts#L5
Here are logs for the request of bafkreieze572daxva52asutm4u2bgtjtifur2a4d2aiur7lymjcwhoeqyy (json):
helia:trustless-gateway-block-broker:127.0.0.1 GET http://127.0.0.1:8080/ipfs/bafkreieze572daxva52asutm4u2bgtjtifur2a4d2aiur7lymjcwhoeqyy?format=raw 200 +0ms
helia:trustless-gateway:session:bafkreieze572daxva52asutm4u2bgtjtifur2a4d2aiur7lymjcwhoeqyy:trace got block for bafkreieze572daxva52asutm4u2bgtjtifur2a4d2aiur7lymjcwhoeqyy from http://127.0.0.1:8080/ +0ms
helia:verified-fetch:helia:verified-fetch:byte-range-context:trace set _fileSize to 759 +41ms
helia:verified-fetch:helia:verified-fetch:byte-range-context:trace requestRangeStart and requestRangeEnd are null +0ms
helia:verified-fetch:helia:verified-fetch:byte-range-context:trace set request body with fileSize 759 +0ms
helia:verified-fetch:helia:verified-fetch:byte-range-context:trace returning body unmodified for non-range, or invalid range, request +0ms
content-type-parser contentTypeParser called for fileName: undefined +0ms
content-type-parser no detectedType +1ms
content-type-parser checking for svg +0ms
helia:verified-fetch:raw-plugin:trace setting content type to "text/html; charset=utf-8" +43ms
helia:verified-fetch:trace checking for content disposition +43ms
helia:verified-fetch:trace download not requested +0ms
helia:verified-fetch:trace no filename specified in query +0ms
helia:verified-fetch:trace no content disposition specified +0ms
This has to do with https://www.npmjs.com/package/@sgtpooki/file-type (fork of https://www.npmjs.com/package/file-type) not handling content-type for text.
If it's JSON, we will need to read the entire contents to determine that. We do allow passing a custom content-type parser to helia-verified-fetch, so we could parse the entire content to determine whether it's JSON.
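A minimal sketch of what such a JSON-aware parser could look like. The function name `jsonAwareContentType` is hypothetical, and the `(bytes, fileName?) => string | undefined` shape is an assumption about the parser hook, not a confirmed signature. It also assumes the parser is handed the complete content; if it only receives the first chunk of a large file, `JSON.parse` would fail and the function would return `undefined`.

```typescript
// Hypothetical content-type parser: returns 'application/json' only when the
// bytes decode as UTF-8 and parse as a JSON object or array.
function jsonAwareContentType (bytes: Uint8Array, fileName?: string): string | undefined {
  // Trust an explicit .json extension when a file name is available.
  if (fileName?.toLowerCase().endsWith('.json') === true) {
    return 'application/json'
  }

  let text: string
  try {
    // fatal: true makes decode() throw on invalid UTF-8 (e.g. binary files).
    text = new TextDecoder('utf-8', { fatal: true }).decode(bytes)
  } catch {
    return undefined
  }

  // Only treat objects and arrays as JSON, so plain text such as "123" is skipped.
  const trimmed = text.trim()
  if (!(trimmed.startsWith('{') || trimmed.startsWith('['))) {
    return undefined
  }

  try {
    JSON.parse(trimmed)
    return 'application/json'
  } catch {
    return undefined
  }
}
```

Returning `undefined` leaves the decision to whatever fallback the caller already has, so this could be chained after a magic-number sniffer rather than replacing it.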
Here are logs for the request of bafybeiblgxritjybxffn5te6aupebq7m77xm7xiqqkiq7eete7wrblhnce (image):
helia:verified-fetch:dag-pb-plugin got async iterator for bafybeiblgxritjybxffn5te6aupebq7m77xm7xiqqkiq7eete7wrblhnce/ +0ms
helia:verified-fetch:helia:verified-fetch:byte-range-context:trace set _fileSize to null +1ms
helia:verified-fetch:helia:verified-fetch:byte-range-context:trace requestRangeStart and requestRangeEnd are null +0ms
helia:verified-fetch:helia:verified-fetch:byte-range-context:trace set request body with fileSize null +0ms
helia:verified-fetch:helia:verified-fetch:byte-range-context:trace returning body unmodified for non-range, or invalid range, request +0ms
content-type-parser contentTypeParser called for fileName: undefined, byte size=262144 +8s
content-type-parser detectedType: image/png +0ms
helia:verified-fetch:dag-pb-plugin:trace contentTypeParser returned image/png +1ms
helia:verified-fetch:dag-pb-plugin:trace setting content type to "image/png" +0ms
helia:verified-fetch:trace checking for content disposition +1ms
helia:verified-fetch:trace download not requested +0ms
helia:verified-fetch:trace no filename specified in query +0ms
helia:verified-fetch:trace no content disposition specified +0ms
Ok I've got "File -> save page as" to result in .json download, but right clicking the page and save as still results in download.html with a local sw instance when running https://github.com/ipfs/service-worker-gateway/pull/603
FYI i've deployed this to inbrowser.dev: https://github.com/ipfs/service-worker-gateway/actions/runs/13594225197/job/38007334529
Changes from #603 are now at https://bafkreieze572daxva52asutm4u2bgtjtifur2a4d2aiur7lymjcwhoeqyy.ipfs.inbrowser.dev/. Not sure what is up with the right click and save, but you can see that it renders that JSON correctly now.
.html looks like a bug or behavior specific to Service Workers in Chromium.
Trying to save https://bafkreieze572daxva52asutm4u2bgtjtifur2a4d2aiur7lymjcwhoeqyy.ipfs.inbrowser.dev/ in Firefox produces a proper .json
Things to try to fix Chromium:
- It is suspicious that `x-content-type-options: nosniff` is returned by `inbrowser.dev` (we see issues) and not `dweb.link` (no issues). I'll remove `x-content-type-options: nosniff` from the initial response at `inbrowser.dev` to ensure header parity with `dweb.link` (opened https://github.com/ipshipyard/waterworks-infra/pull/516 but we will land it next week)
- Always return explicit `Content-Disposition` with expected file extension.
- If the above fail, we could create a small HTML+JS repro that we can file as a bug in Chromium and ask Igalia for help in fixing it.
Even stranger behavior in Chrome when the files are wrapped in a directory with filename and extension.
json: filename and extension is correct. https://bafybeigxip3b5uegqlmcodhitzogr7w46uyedsai4kecrxlu4w74szyfxi.ipfs.inbrowser.dev/metadata.json
Images: filename is correct, but extension is wrong. https://bafybeifiugn3zaj67dh6zqlgqpnzkbykbec3iur6wpb2zaz44ekcazyew4.ipfs.inbrowser.dev/image.jpeg
Everything is fine in Firefox.
Tested in Chromium and removal of x-content-type-options: nosniff does not seem to make any difference.
JSON has special-handling UI in Chromium and also on our end, so it is not a surprise that it happens to "work fine". Other files likely hit the default code path.
Chrome's "Save As" (Ctrl+S) feature doesn't just save the resource served at the URL (/image.jpg). Instead, it saves the entire document context of the current tab. If /image.jpg is being served in a way that’s tied to an HTML page (e.g., loaded via an <img> tag or a redirect from an HTML document), Chrome might be saving that originating HTML page instead of the raw image file.
The fact that the saved index.html includes the HTML+JS that initialized the Service Worker strongly suggests that the browser is saving the page that registered the Service Worker, not the raw response for /image.jpg
@SgtPooki some ideas to try next:
1. review Service Worker code for any logic that might return `text/html` HTML response as a fallback
   - we should never default to `text/html` blindly
   - I am suspicious this `defaultMimeType = 'text/html'` hack could be related
   - it looks like temporary hack to make things work – could we remove it and replace with `application/octet-stream` + proper content type sniffing?
2. if (1) does not help, make SW GW set explicit `Content-Disposition: inline; filename="image.jpg"` matching file name from path (and if there is no path, set it to `file.ext` where `.ext` is returned by `file-type` sniffer).
I tried this locally and it doesn't help:
+ response.headers.set('Vary', 'Accept-Encoding')
+ const contentType = response.headers.get('Content-Type')
+ let filename = 'download.bin'
+ if (contentType != null) {
+ switch (contentType) {
+ case 'application/json':
+ filename = 'download.json'
+ break
+ default:
+ break
+ }
+ }
+ response.headers.set('Content-Disposition', `inline; filename="${filename}"`)
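For reference, a fuller sketch of the earlier suggestion (file name from the path, otherwise `file.ext` based on the detected type) might look like the following. The function name `contentDispositionFor`, the `EXTENSIONS` table, and the `cid.ipfs.example` host are all hypothetical; a real gateway would take the extension from its sniffer. This only illustrates the header construction. It is not a fix for the Chromium behavior discussed in this thread, and non-ASCII names would additionally need the `filename*=` form, which is not handled here.

```typescript
// Hypothetical MIME-to-extension table, for illustration only.
const EXTENSIONS: Record<string, string> = {
  'application/json': 'json',
  'image/png': 'png',
  'image/jpeg': 'jpg'
}

// Builds an inline Content-Disposition value: prefer the last path segment when
// it has an extension, otherwise fall back to file.<ext> from the content type.
function contentDispositionFor (url: string, contentType: string | null): string {
  const lastSegment = new URL(url).pathname.split('/').filter(Boolean).pop()
  let filename: string
  if (lastSegment != null && lastSegment.includes('.')) {
    try {
      filename = decodeURIComponent(lastSegment)
    } catch {
      // Malformed percent-encoding: keep the raw segment.
      filename = lastSegment
    }
  } else {
    // Strip parameters such as "; charset=utf-8" before the lookup.
    const mime = ((contentType ?? '').split(';')[0] ?? '').trim().toLowerCase()
    filename = `file.${EXTENSIONS[mime] ?? 'bin'}`
  }
  // Replace characters that would break the quoted-string form.
  const safe = filename.replace(/["\\\r\n]/g, '_')
  return `inline; filename="${safe}"`
}
```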
FYI, this seems to be a bug with chrome. see https://sgtpooki.github.io/chrome-sw-filetype-bug-repro/
source is at https://github.com/SgtPooki/chrome-sw-filetype-bug-repro
chromium bug filed at https://issues.chromium.org/u/1/issues/400455011
Quick update: our friends at Igalia will look into https://issues.chromium.org/issues/400455011 (and https://issues.chromium.org/issues/40410035) once the ongoing work on https://issues.chromium.org/issues/40069954 is done.
The JSON extension still isn't working properly in Firefox and Chrome. It's weird, but Firefox recognises it as JSON and can save it as such. But when I open the json file, it's got HTML code in it. Here's an example: 1st: CID without filename https://bafkreieze572daxva52asutm4u2bgtjtifur2a4d2aiur7lymjcwhoeqyy.ipfs.inbrowser.dev/
2nd: CID with filename https://bafybeigxip3b5uegqlmcodhitzogr7w46uyedsai4kecrxlu4w74szyfxi.ipfs.inbrowser.dev/metadata.json
the bug is still in chrome.
However, a potential workaround for now might be the content-rendering screen at https://github.com/ipfs/service-worker-gateway/pull/751. Let me know how you feel about that.
Note that the original purpose of that PR is now resolved separately by https://github.com/ipfs/service-worker-gateway/pull/753
https://github.com/ipfs/service-worker-gateway/issues/574#issuecomment-2795016251 will not be fixed any time soon (being realistic about devgrants, funding, and how long it takes for a fix to land in Chrome stable: it for sure won't be fixed in the next 6-12 months).
This means we need to fix it in userland. My initial idea is to return a custom HTML UI for files that the browser can render (images, PDFs, videos), with an explicit "Download" button. This way the user does not use "Save As" from the browser, and instead uses the UI provided by us.
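A rough sketch of that idea, assuming the gateway can return an HTML wrapper instead of the raw bytes. The names `renderPreviewPage` and `escapeHtml` are hypothetical, and whether a link with the `download` attribute sidesteps the Chromium behavior for SW-served responses has not been verified here.

```typescript
// Escape text for safe use inside HTML attribute values and text nodes.
function escapeHtml (s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
}

// Hypothetical wrapper page: inline preview plus an explicit download link.
function renderPreviewPage (src: string, filename: string, contentType: string): string {
  const s = escapeHtml(src)
  const f = escapeHtml(filename)
  let preview: string
  if (contentType.startsWith('image/')) {
    preview = `<img src="${s}" alt="${f}">`
  } else if (contentType.startsWith('video/')) {
    preview = `<video controls src="${s}"></video>`
  } else if (contentType === 'application/pdf') {
    preview = `<iframe src="${s}" title="${f}"></iframe>`
  } else {
    preview = '<p>No preview available.</p>'
  }
  return [
    '<!doctype html>',
    `<html><head><meta charset="utf-8"><title>${f}</title></head>`,
    '<body>',
    preview,
    // The download attribute asks the browser to save the target under this name.
    `<p><a href="${s}" download="${f}">Download</a></p>`,
    '</body></html>'
  ].join('\n')
}
```

The design choice here is that the file name comes from our markup rather than from whatever the browser infers for the current tab.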
I did have a download UI. I think I closed that PR, but you could reuse some of that if necessary.