pywb
pywb returns a response with Content-Length 0 when a POST request is concurrent to a revisit
Describe the bug
pywb does not seem able to resolve to a valid response when a POST request is concurrent to a revisit record (which in turn has the same payload digest as a valid response record).
The request looks like this:
WARC/1.1
WARC-Record-ID: urn:uuid:63a4032b-448e-5c8d-acfe-ad470090befe
WARC-Page-ID: xiab8o4ssvmlfn4u3gn9rc
WARC-Concurrent-To: urn:uuid:f236a3c0-a9db-5d1d-bc85-fc7327e715c7
WARC-Target-URI: https://www.defacto.expert/?lang=it
WARC-Date: 2022-09-07T09:03:57.290Z
WARC-Type: request
Content-Type: application/http; msgtype=request
WARC-Payload-Digest: sha256:8e541d80e92cb354bcabf19d65963cbae82c9e471ffb32fdbc1c988b0c0ee626
WARC-Block-Digest: sha256:015a7c27a7c4e82314be29fea3cb45248ada65dff34186edd7b33a6186a402f9
Content-Length: 1075

POST /?lang=it HTTP/1.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7
Cache-Control: no-cache
Connection: keep-alive
Content-Length: 22
Content-Type: application/x-www-form-urlencoded
Cookie: ppwp_wp_session=058f0b7cd415a588382591fa6d27507a%7C%7C1662543223%7C%7C1662542863; _pk_id.6.ade8=430adeab68ad3acd.1662541426.; _pk_ses.6.ade8=1; wp-wpml_current_language=fr
Host: www.defacto.expert
Origin: https://www.defacto.expert
Pragma: no-cache
Referer: https://www.defacto.expert/
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: same-origin
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36
sec-ch-ua: "Google Chrome";v="105", "Not)A;Brand";v="8", "Chromium";v="105"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Linux"

switchMonolingual=true
The revisit looks like this:
WARC/1.1
WARC-Record-ID: urn:uuid:f236a3c0-a9db-5d1d-bc85-fc7327e715c7
WARC-Page-ID: xiab8o4ssvmlfn4u3gn9rc
WARC-JSON-Metadata: {"cert":{"issuer":"R3","ctc":"1"}}
WARC-Payload-Digest: sha256:010ffd782e980c840c4035235a2d1799b23b0984b6506594271dafc2f6b470c5
WARC-Target-URI: https://www.defacto.expert/?lang=it
WARC-Date: 2022-09-07T09:03:57.290Z
WARC-Type: revisit
WARC-Profile: http://netpreserve.org/warc/1.1/revisit/identical-payload-digest
WARC-Refers-To-Target-URI: https://www.defacto.expert/?lang=it
WARC-Refers-To-Date: 2022-09-07T09:07:52.282Z
Content-Type: application/http; msgtype=response
Content-Length: 566
WARC-Block-Digest: sha1:JHCVXT4PCQ3ROX57LS34AVSCPNKSPNKZ

HTTP/1.1 200 OK
Cache-Control: max-age=0
Connection: Keep-Alive
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Date: Wed, 07 Sep 2022 09:03:55 GMT
Expires: Wed, 07 Sep 2022 09:03:55 GMT
Keep-Alive: timeout=5, max=47
Link: <https://www.defacto.expert/wp-json/>; rel="https://api.w.org/"
Server: Apache
Vary: User-Agent
X-Frame-Options: SAMEORIGIN
x-wabac-preset-cookie: ppwp_wp_session=058f0b7cd415a588382591fa6d27507a%7C%7C1662543223%7C%7C1662542863; _pk_id.6.ade8=430adeab68ad3acd.1662541426.; _pk_ses.6.ade8=1; wp-wpml_current_language=fr
And there is a response with the same payload-digest (the body is not shown here):
WARC/1.1
WARC-Record-ID: urn:uuid:fac98200-c524-5103-9d74-fe7169de643c
WARC-Page-ID: mpricab1kdqv9g026hc22o
WARC-JSON-Metadata: {"cert":{"issuer":"R3","ctc":"1"},"pixelRatio":1}
WARC-Target-URI: https://www.defacto.expert/?lang=it
WARC-Date: 2022-09-07T09:07:52.282Z
WARC-Type: response
Content-Type: application/http; msgtype=response
WARC-Payload-Digest: sha256:010ffd782e980c840c4035235a2d1799b23b0984b6506594271dafc2f6b470c5
WARC-Block-Digest: sha256:9c1b8c87b243b72f965a9e4c63792ee3f54d40342b87089131d04962a9679093
Content-Length: 50500

HTTP/1.1 200 OK
Cache-Control: max-age=0
Connection: Keep-Alive
Content-Length: 49930
Content-Type: text/html; charset=UTF-8
Date: Wed, 07 Sep 2022 09:07:50 GMT
Expires: Wed, 07 Sep 2022 09:07:50 GMT
Keep-Alive: timeout=5, max=88
Link: <https://www.defacto.expert/wp-json/>; rel="https://api.w.org/"
Server: Apache
Vary: User-Agent
X-Frame-Options: SAMEORIGIN
x-wabac-preset-cookie: ppwp_wp_session=058f0b7cd415a588382591fa6d27507a%7C%7C1662543223%7C%7C1662542863; _pk_id.6.ade8=430adeab68ad3acd.1662541426.; _pk_ses.6.ade8=1; wp-wpml_current_language=it
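The link between the revisit and the response can be checked from the WARC headers alone. The following is a minimal sketch in plain Python, not pywb code: `parse_warc_headers` and `revisit_points_to` are hypothetical helper names, and only the header fields relevant to the check are copied from the records above.

```python
def parse_warc_headers(block: str) -> dict:
    """Parse the 'Name: value' lines of a WARC header block into a dict."""
    headers = {}
    for line in block.strip().splitlines():
        if ":" not in line:
            continue  # skips the "WARC/1.1" version line
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip()] = value.strip()
    return headers


def revisit_points_to(revisit: dict, response: dict) -> bool:
    """True if the revisit refers to this response (digest, URI and date all agree)."""
    return (
        revisit.get("WARC-Type") == "revisit"
        and response.get("WARC-Type") == "response"
        and revisit.get("WARC-Payload-Digest") == response.get("WARC-Payload-Digest")
        and revisit.get("WARC-Refers-To-Target-URI") == response.get("WARC-Target-URI")
        and revisit.get("WARC-Refers-To-Date") == response.get("WARC-Date")
    )


REVISIT = """\
WARC/1.1
WARC-Type: revisit
WARC-Target-URI: https://www.defacto.expert/?lang=it
WARC-Date: 2022-09-07T09:03:57.290Z
WARC-Payload-Digest: sha256:010ffd782e980c840c4035235a2d1799b23b0984b6506594271dafc2f6b470c5
WARC-Refers-To-Target-URI: https://www.defacto.expert/?lang=it
WARC-Refers-To-Date: 2022-09-07T09:07:52.282Z
"""

RESPONSE = """\
WARC/1.1
WARC-Type: response
WARC-Target-URI: https://www.defacto.expert/?lang=it
WARC-Date: 2022-09-07T09:07:52.282Z
WARC-Payload-Digest: sha256:010ffd782e980c840c4035235a2d1799b23b0984b6506594271dafc2f6b470c5
"""

matches = revisit_points_to(parse_warc_headers(REVISIT), parse_warc_headers(RESPONSE))
print(matches)
```

For the records in this report the check passes, so the revisit does carry enough information to locate the full response.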
Steps to reproduce the bug
Record https://www.defacto.expert/ with ArchiveWeb.page and switch between languages (fr or it) in the menu. Save the WARC. Index and load it in pywb, then click on the same language again.
You will get an empty response with 200 status:
Request URL: http://localhost:8080/orig/20220907090344mp_/https://www.defacto.expert/?lang=it
Request Method: POST
Status Code: 200 OK
Remote Address: 127.0.0.1:8080
Referrer Policy: strict-origin-when-cross-origin
Content-Length: 0
Content-Location: http://localhost:8080/orig/20220907090357mp_/https://www.defacto.expert/?lang=it
Content-Security-Policy: default-src 'unsafe-eval' 'unsafe-inline' 'self' data: blob: mediastream: ws: wss: ; form-action 'self' ...
Expected behavior
pywb should serve the WARC response record that has the same payload digest.
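To make the expectation concrete, here is a rough sketch of the lookup I would expect. It is an illustration only and makes no claim about how pywb resolves revisits internally: `IndexEntry` and `resolve` are hypothetical names, the timestamps are derived from the WARC-Date values above, and `payload_length` is taken from the HTTP Content-Length of each record.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexEntry:
    url: str
    timestamp: str       # 14-digit capture timestamp
    rec_type: str        # "response" or "revisit"
    digest: str          # WARC-Payload-Digest
    payload_length: int  # HTTP Content-Length of the stored record


def resolve(entries: List[IndexEntry], url: str, timestamp: str) -> Optional[IndexEntry]:
    """Return the entry whose payload should be served for (url, timestamp)."""
    hit = next((e for e in entries if e.url == url and e.timestamp == timestamp), None)
    if hit is None or hit.rec_type != "revisit":
        return hit
    # A revisit stores no payload: look for a response with the same digest.
    for e in entries:
        if e.rec_type == "response" and e.url == url and e.digest == hit.digest:
            return e
    return hit  # no original found, fall back to the revisit itself


URL = "https://www.defacto.expert/?lang=it"
DIGEST = "sha256:010ffd782e980c840c4035235a2d1799b23b0984b6506594271dafc2f6b470c5"

entries = [
    IndexEntry(URL, "20220907090357", "revisit", DIGEST, 0),
    IndexEntry(URL, "20220907090752", "response", DIGEST, 49930),
]

served = resolve(entries, URL, "20220907090357")
print(served.rec_type, served.payload_length)
```

With the two records from this report, the lookup for the 09:03:57 capture ends at the 09:07:52 response and its 49930-byte body rather than at the empty revisit.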
Environment
pywb 2.6.8