pywb
'&' rewritten to '&amp;' -> URL not found
Describe the bug
'&' gets rewritten to '&amp;', so URLs like this one cannot be found:
http://localhost:8080/test/20220211121756/https://www.republik.ch/dialog?t=article&id=7cf890b9-fa66-415d-9059-70d06d2703dc
It ends up as:
http://localhost:8080/test/20220211121756mp_/https://www.republik.ch/dialog?t=article&amp;id=7cf890b9-fa66-415d-9059-70d06d2703dc
That URL does not exist in the cdxj, which only contains the original one:
ch,republik)/dialog?id=7cf890b9-fa66-415d-9059-70d06d2703dc&t=article 20220211121848 {"url": "https://www.republik.ch/dialog?t=article&id=7cf890b9-fa66-415d-9059-70d06d2703dc", "mime": "text/html", "status": "200", "digest": "3C3YSWWP7DU44C44GKRSMK2OU35ZMXMS", "length": "32240", "offset": "1387846", "filename": "republik-20220211122116.warc.gz"}
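To illustrate why the lookup misses, here is a minimal sketch using only the Python standard library. It assumes the '&' is HTML-escaped to '&amp;' before the URL is looked up. `sorted_query` is a hypothetical helper that only mimics the sorted query string visible in the cdxj key above; it is not pywb's actual canonicalization code.

```python
from html import escape
from urllib.parse import quote, urlsplit


def sorted_query(url):
    """Hypothetical helper: return the query string with its '&'-separated parts sorted."""
    query = urlsplit(url).query
    return "&".join(sorted(query.split("&")))


recorded = "https://www.republik.ch/dialog?t=article&id=7cf890b9-fa66-415d-9059-70d06d2703dc"
requested = escape(recorded)  # HTML escaping turns '&' into '&amp;'

recorded_key = sorted_query(recorded)    # parameters: 'id' and 't'
requested_key = sorted_query(requested)  # parameters: 'amp;id' and 't'

# Percent-encoding the escaped URL gives the same 'url=' value as the postreq line in the debug log.
encoded = quote(requested, safe="")

print(recorded_key)
print(requested_key)
print(encoded)
```

After escaping, the second parameter is named `amp;id` instead of `id`, so the requested URL no longer matches the recorded one.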
Steps to reproduce the bug
- Record a page with a similar URL pattern in Conifer
- Download the collection
- Index it and replay it in pywb 2.6.4
Expected behavior
'&' should not be rewritten to '&amp;', and the page should be displayed correctly.
Screenshots

Environment
- OS: Windows 10
- Browser: Chrome 92
Additional context
pywb debug log:
127.0.0.1 - - [2022-02-11 13:28:55] "GET /test/20220211121756/https://www.republik.ch/dialog?t=article&id=7cf890b9-fa66-415d-9059-70d06d2703dc HTTP/1.1" 200 2302 0.019023
127.0.0.1 - - [2022-02-11 13:28:55] "GET /static/wb_frame.js HTTP/1.1" 200 8630 0.003003
127.0.0.1 - - [2022-02-11 13:28:55] "GET /static/default_banner.js HTTP/1.1" 200 10217 0.000998
127.0.0.1 - - [2022-02-11 13:28:55] "GET /static/default_banner.css HTTP/1.1" 200 3946 0.007038
127.0.0.1 - - [2022-02-11 13:28:55] "GET /static/calendar.svg HTTP/1.1" 200 573 0.000998
2022-02-11 13:28:55,876: [DEBUG]: Starting new HTTP connection (1): localhost:51315
127.0.0.1 - - [2022-02-11 13:28:55] "POST /test/resource/postreq?url=https%3A%2F%2Fwww.republik.ch%2Fdialog%3Ft%3Darticle%26amp%3Bid%3D7cf890b9-fa66-415d-9059-70d06d2703dc&closest=20220211121756&matchType=exact HTTP/1.1" 404 215 0.008998
2022-02-11 13:28:55,892: [DEBUG]: http://localhost:51315 "POST /test/resource/postreq?url=https%3A%2F%2Fwww.republik.ch%2Fdialog%3Ft%3Darticle%26amp%3Bid%3D7cf890b9-fa66-415d-9059-70d06d2703dc&closest=20220211121756&matchType=exact HTTP/1.1" 404 32
127.0.0.1 - - [2022-02-11 13:28:55] "GET /test/20220211121756mp_/https://www.republik.ch/dialog?t=article&id=7cf890b9-fa66-415d-9059-70d06d2703dc HTTP/1.1" 404 1309 0.076008
127.0.0.1 - - [2022-02-11 13:28:55] "GET /static/css/bootstrap.min.css HTTP/1.1" 200 153240 0.000995
127.0.0.1 - - [2022-02-11 13:28:55] "GET /static/css/font-awesome.min.css HTTP/1.1" 200 54739 0.008043
127.0.0.1 - - [2022-02-11 13:28:55] "GET /static/css/base.css HTTP/1.1" 200 99 0.014002
127.0.0.1 - - [2022-02-11 13:28:55] "GET /static/js/jquery-latest.min.js HTTP/1.1" 200 87043 0.002000
127.0.0.1 - - [2022-02-11 13:28:56] "GET /static/js/bootstrap.min.js HTTP/1.1" 200 76425 0.006032
I found that this is caused by Jinja2, pywb's Python/HTML template engine, which HTML-escapes '&' to '&amp;' when it renders {{ url }}. A workaround, taken from this Stack Overflow post, is to replace {{ url }} with {{ url|safe }}.
In the documentation example, the template becomes:
<script src='{{ host_prefix }}/{{ static_path }}/wb_frame.js'> </script>
<script>
var cframe = new ContentFrame({"url": "{{ url|safe }}" + window.location.hash,
"prefix": "{{ wb_prefix }}",
"request_ts": "{{ wb_url.timestamp }}",
"iframe": "#replay_iframe"});
</script>
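The effect of the filter can be reproduced outside pywb. The following is a minimal sketch that assumes a Jinja2 environment with autoescaping turned on; the environment setup and the one-line templates are illustrative stand-ins, not pywb's own configuration or its frame_insert.html.

```python
from jinja2 import Environment

# Assumption: autoescaping is enabled, which is what produces '&amp;' in the rendered page.
env = Environment(autoescape=True)

url = "https://www.republik.ch/dialog?t=article&id=7cf890b9-fa66-415d-9059-70d06d2703dc"

# Default rendering: the value is HTML-escaped, so '&' becomes '&amp;'.
escaped = env.from_string('"url": "{{ url }}"').render(url=url)

# With the 'safe' filter the value is inserted unchanged.
raw = env.from_string('"url": "{{ url|safe }}"').render(url=url)

print(escaped)
print(raw)
```

Note that `safe` switches escaping off entirely for that value, so it relies on the URL not containing characters such as a double quote that would end the JavaScript string. Jinja2's built-in `tojson` filter may be a stricter alternative, but it emits its own quotes, so the surrounding template would need adjusting.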
Thanks VascoRatoFCCN, it works when I modify the frame_insert.html template according to your input!! BR, Stephan