'&' rewritten to '&amp;' -> URL not found

Open steph-nb opened this issue 3 years ago • 3 comments

Describe the bug

'&' gets rewritten to '&amp;', so URLs like this one cannot be found:

http://localhost:8080/test/20220211121756/https://www.republik.ch/dialog?t=article&id=7cf890b9-fa66-415d-9059-70d06d2703dc

The request ends up as:

http://localhost:8080/test/20220211121756mp_/https://www.republik.ch/dialog?t=article&amp;id=7cf890b9-fa66-415d-9059-70d06d2703dc

That URL does not exist in the cdxj, which only contains the entry for the unescaped URL:

ch,republik)/dialog?id=7cf890b9-fa66-415d-9059-70d06d2703dc&t=article 20220211121848 {"url": "https://www.republik.ch/dialog?t=article&id=7cf890b9-fa66-415d-9059-70d06d2703dc", "mime": "text/html", "status": "200", "digest": "3C3YSWWP7DU44C44GKRSMK2OU35ZMXMS", "length": "32240", "offset": "1387846", "filename": "republik-20220211122116.warc.gz"}
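To illustrate why the escaped URL misses the index entry above, here is a rough sketch of how a lookup key in the style of that cdxj line could be derived. The function `simple_surt_key` is hypothetical and heavily simplified; it is not pywb's canonicalizer, only an approximation (reversed host without `www.`, lowercased path, sorted query parameters) that happens to reproduce the key shown above.

```python
from urllib.parse import urlsplit


def simple_surt_key(url: str) -> str:
    """Hypothetical, simplified stand-in for a CDXJ lookup key (not pywb's code)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Reverse the host labels: republik.ch -> ch,republik
    key = ",".join(reversed(host.split("."))) + ")" + parts.path.lower()
    if parts.query:
        # Sort the query parameters, as the key in the cdxj line appears to do
        key += "?" + "&".join(sorted(parts.query.lower().split("&")))
    return key


recorded = "https://www.republik.ch/dialog?t=article&id=7cf890b9-fa66-415d-9059-70d06d2703dc"
escaped = recorded.replace("&", "&amp;")  # the URL after '&' became '&amp;'

recorded_key = simple_surt_key(recorded)
escaped_key = simple_surt_key(escaped)

print(recorded_key)
print(escaped_key)
print(recorded_key == escaped_key)
```

With the escaped URL, the second parameter is named `amp;id` instead of `id`, so the derived key differs and an exact-match lookup cannot find the recorded capture.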

Steps to reproduce the bug

  • Record a page with a similar URL pattern in Conifer
  • Download the collection
  • Index and replay it in pywb 2.6.4

Expected behavior

'&' should not be rewritten to '&amp;', and the page should be displayed correctly.

Screenshots

image

Environment

  • OS: Windows 10
  • Browser: Chrome 92

Additional context

pywb debug log:

127.0.0.1 - - [2022-02-11 13:28:55] "GET /test/20220211121756/https://www.republik.ch/dialog?t=article&id=7cf890b9-fa66-415d-9059-70d06d2703dc HTTP/1.1" 200 2302 0.019023
127.0.0.1 - - [2022-02-11 13:28:55] "GET /static/wb_frame.js HTTP/1.1" 200 8630 0.003003
127.0.0.1 - - [2022-02-11 13:28:55] "GET /static/default_banner.js HTTP/1.1" 200 10217 0.000998
127.0.0.1 - - [2022-02-11 13:28:55] "GET /static/default_banner.css HTTP/1.1" 200 3946 0.007038
127.0.0.1 - - [2022-02-11 13:28:55] "GET /static/calendar.svg HTTP/1.1" 200 573 0.000998
2022-02-11 13:28:55,876: [DEBUG]: Starting new HTTP connection (1): localhost:51315
127.0.0.1 - - [2022-02-11 13:28:55] "POST /test/resource/postreq?url=https%3A%2F%2Fwww.republik.ch%2Fdialog%3Ft%3Darticle%26amp%3Bid%3D7cf890b9-fa66-415d-9059-70d06d2703dc&closest=20220211121756&matchType=exact HTTP/1.1" 404 215 0.008998
2022-02-11 13:28:55,892: [DEBUG]: http://localhost:51315 "POST /test/resource/postreq?url=https%3A%2F%2Fwww.republik.ch%2Fdialog%3Ft%3Darticle%26amp%3Bid%3D7cf890b9-fa66-415d-9059-70d06d2703dc&closest=20220211121756&matchType=exact HTTP/1.1" 404 32
127.0.0.1 - - [2022-02-11 13:28:55] "GET /test/20220211121756mp_/https://www.republik.ch/dialog?t=article&id=7cf890b9-fa66-415d-9059-70d06d2703dc HTTP/1.1" 404 1309 0.076008
127.0.0.1 - - [2022-02-11 13:28:55] "GET /static/css/bootstrap.min.css HTTP/1.1" 200 153240 0.000995
127.0.0.1 - - [2022-02-11 13:28:55] "GET /static/css/font-awesome.min.css HTTP/1.1" 200 54739 0.008043
127.0.0.1 - - [2022-02-11 13:28:55] "GET /static/css/base.css HTTP/1.1" 200 99 0.014002
127.0.0.1 - - [2022-02-11 13:28:55] "GET /static/js/jquery-latest.min.js HTTP/1.1" 200 87043 0.002000
127.0.0.1 - - [2022-02-11 13:28:56] "GET /static/js/bootstrap.min.js HTTP/1.1" 200 76425 0.006032
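The `%26amp%3B` in the postreq URL above is what a percent-encoded '&amp;' looks like. As a small check, using only the Python standard library (this is not pywb code, just a reproduction of the string in the log), HTML-escaping the requested URL and then percent-encoding it gives the same `url=` value:

```python
from html import escape
from urllib.parse import quote

requested = "https://www.republik.ch/dialog?t=article&id=7cf890b9-fa66-415d-9059-70d06d2703dc"

# HTML escaping turns '&' into '&amp;'
html_escaped = escape(requested)

# Percent-encode everything, including '/', ':' and '&', as in the query string of the log line
encoded = quote(html_escaped, safe="")

print(html_escaped)
print(encoded)
```

The second printed value matches the `url=` parameter of the failing POST request, which suggests the URL was HTML-escaped somewhere before the lookup was made.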

steph-nb, Feb 11 '22 12:02

I found that this comes from Jinja2, the Python/HTML template engine pywb uses: when autoescaping is enabled, {{ url }} is HTML-escaped on output, so '&' becomes '&amp;'. I found a workaround in this Stack Overflow post: just replace {{ url }} with {{ url|safe }}.

Applied to the example in the documentation, it becomes:

<script src='{{ host_prefix }}/{{ static_path }}/wb_frame.js'> </script>
<script>
var cframe = new ContentFrame({"url": "{{ url|safe }}" + window.location.hash,
                               "prefix": "{{ wb_prefix }}",
                               "request_ts": "{{ wb_url.timestamp }}",
                               "iframe": "#replay_iframe"});
</script>

VascoRatoFCCN, Jun 15 '22 13:06

Thanks VascoRatoFCCN, it works when I modify the frame_insert.html template according to your input!! BR, Stephan

steph-nb, Jun 15 '22 13:06