warc2zim

Can we do something when the server returns a bad Content-Type?

Status: open. benoit74 opened this issue 7 months ago (0 comments).

Sample: https://wikileaks.org/pdfjs/web/viewer.html?file=%2F..%2Fspyfiles%2Ffiles%2F0%2F289_GAMMA-201110-FinSpy.pdf

### REC Headers ###
WARC/1.1
WARC-Page-ID: 84e46546-1a5f-4958-a844-f268cf8387b3
WARC-Resource-Type: document
WARC-JSON-Metadata: {"ipType":"Public","cert":{"issuer":"R11","ctc":"1"}}
WARC-Target-URI: https://wikileaks.org/pdfjs/web/viewer.html?file=%2F..%2Fspyfiles%2Ffiles%2F0%2F289_GAMMA-201110-FinSpy.pdf
WARC-Date: 2024-07-04T01:46:28.249Z
WARC-Type: response
WARC-Record-ID: <urn:uuid:5e28af04-8276-4780-b796-9296c9b34bc7>
Content-Type: application/http; msgtype=response
WARC-Payload-Digest: sha256:ceacb542dc6931476349b523ae9017c66a257a6daf43d810f547223ba1adfc8c
WARC-Block-Digest: sha256:d02f30e007686ab8276dae62e3eb9a40ac1baa934c033c33838f35678a30b597
Content-Length: 19864

### HTTP Headers ###
HTTP/1.1 200 OK
age: 0
cache-control: public, max-age=1200
content-type: text/html; charset=utf-8
date: Thu, 04 Jul 2024 01:46:28 GMT
server: nginx
surrogate-control: ESI/1.0
vary: Accept-Encoding, Accept-Encoding
via: 1.1 varnish (Varnish/7.1)
x-varnish: 9909606
x-orig-content-encoding: gzip

Here the server declares the payload as `text/html` (see the `content-type: text/html; charset=utf-8` header above), while the content is actually a PDF...
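One possible way to handle this is to sniff the first bytes of the payload and compare them with the declared `content-type`. A PDF file begins with the `%PDF-` signature no matter what the server claims. The sketch below is an assumption, not something warc2zim currently does. The helper names `sniff_mimetype` and `effective_mimetype` are hypothetical, and `payload` is assumed to be the already-decoded response body.

```python
from typing import Optional

# Hypothetical helpers: NOT part of warc2zim's API, just an illustration.
# Known file signatures ("magic bytes") mapped to the MIME type they indicate.
MAGIC_SIGNATURES: list[tuple[bytes, str]] = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_mimetype(payload: bytes) -> Optional[str]:
    """Return the MIME type implied by the payload's leading bytes, or None."""
    for signature, mimetype in MAGIC_SIGNATURES:
        if payload.startswith(signature):
            return mimetype
    return None


def effective_mimetype(declared: str, payload: bytes) -> str:
    """Prefer the sniffed type when it contradicts the declared Content-Type.

    `declared` is the raw header value, e.g. "text/html; charset=utf-8";
    parameters after ';' are dropped and the type is lowercased before comparing.
    """
    declared_base = declared.split(";", 1)[0].strip().lower()
    sniffed = sniff_mimetype(payload)
    if sniffed is not None and sniffed != declared_base:
        # A known binary signature wins over the server's (wrong) header.
        return sniffed
    return declared_base
```

For the sample above, `effective_mimetype("text/html; charset=utf-8", body)` would return `application/pdf` if `body` starts with `%PDF-`. The sketch only overrides the header when an unambiguous binary signature is found. Sniffing textual formats (HTML vs. plain text vs. JS) is much less reliable, so those are left as declared.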

benoit74, Jul 08 '24 09:07