Handling links with version query strings

Open Axenu opened this issue 6 years ago • 4 comments

We have an HTML file with the following link:

<script type="text/javascript" src="app/main-config.js?v=1.17.0"></script>

When the HTML file loads in OpenWayback, the JavaScript file is not found, and the page does not render because it depends on that file.

Trying to manually load the same file http://192.168.10.210:8080/wayback/20191021115240/https://domain.com/app/main-config.js?v=1.17.0 fails as well. So it is confirmed that the file cannot be loaded. The problem is however that the file:

http://192.168.10.210:8080/wayback/20191021115240/https://domain.com/app/main-config.js

can be loaded, so it does exist.

When verifying the contents of the WARC files we can see that the file is named main-config.js?v=1.17.0
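One way to double-check which exact URIs a WARC holds is to list its `WARC-Target-URI` headers. The following is a minimal sketch using only the Python standard library; the function names are made up for illustration, and the line scan is deliberately crude (it would also match a payload line that happens to start with the header name), so treat it as a quick sanity check rather than a proper WARC parser.

```python
import gzip


def target_uris(lines):
    """Collect WARC-Target-URI values from an iterable of raw byte lines."""
    uris = []
    for line in lines:
        if line.lower().startswith(b"warc-target-uri:"):
            value = line.split(b":", 1)[1].strip()
            # Some writers wrap the URI in angle brackets; drop them if present.
            uris.append(value.decode("utf-8", "replace").strip("<>"))
    return uris


def target_uris_in_file(path):
    """Scan a .warc or .warc.gz file (hypothetical helper, crude line scan)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fh:
        return target_uris(fh)
```

Calling `target_uris_in_file` on the WARC in question and searching the result for `main-config.js` should show whether the record was stored with or without the `?v=1.17.0` query string.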

The expected behaviour is that OpenWayback lets me open the file with the query string, since that is how it exists in the WARC.

version: Apache Tomcat/8.5.20 OpenWayback: 2.3.2

Axenu avatar Oct 22 '19 11:10 Axenu

How are you indexing the WARCs?

anjackson avatar Oct 22 '19 14:10 anjackson

We are using the default configuration that comes with OpenWayback, which uses a Berkeley DB (BDB) database to store information about where to find WARC files and an index of their content.

Axenu avatar Nov 25 '19 09:11 Axenu

Hi @Axenu I am not sure I follow the logic in the statement:

The problem is however that the file: http://192.168.10.210:8080/wayback/20191021115240/https://domain.com/app/main-config.js can be loaded, so it does exist.

I believe https://domain.com/app/main-config.js and https://domain.com/app/main-config.js?v=1.17.0 are treated as completely separate URIs in OpenWayback and the index even though to us it looks like the same URL with a different query string.
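To illustrate why the two URLs do not match each other, here is a small sketch of an exact-match lookup key. The `index_key` function is hypothetical and is not OpenWayback's actual canonicalization; it only assumes that the query string is part of the key, as described above.

```python
from urllib.parse import urlsplit

plain = "https://domain.com/app/main-config.js"
versioned = "https://domain.com/app/main-config.js?v=1.17.0"


def index_key(url):
    """Hypothetical exact-match key: host, path and query string all count."""
    parts = urlsplit(url)
    return (parts.netloc.lower(), parts.path, parts.query)


# The keys differ only in the query component, which is enough to miss.
print(index_key(plain) == index_key(versioned))  # False
```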

Generally, loading URLs with version query strings should work in OpenWayback. Are you able to share the WARC file that includes https://domain.com/app/main-config.js?v=1.17.0?

ldko avatar Nov 25 '19 17:11 ldko

@ldko Sorry for not being so clear.

What I mean is that a file named main-config.js exists in the WARC. The file that the browser is looking for, and therefore the file that OWM is looking for, is main-config.js?v=1.17.0, which does not exist in the WARC.

What we would want to happen is that OWM serves the file main-config.js when main-config.js?v=1.17.0 is requested, ignoring the query parameter.
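The requested behaviour could be sketched roughly as follows. This is only an illustration of the idea, not an existing OpenWayback option: the `index` here is a plain dictionary standing in for the real index, and `strip_query` and `lookup_with_fallback` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return the URL without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def lookup_with_fallback(index, url):
    """Try the exact URL first, then fall back to the query-less URL."""
    if url in index:
        return index[url]
    return index.get(strip_query(url))


# Stand-in index: only the query-less URL was captured.
index = {"https://domain.com/app/main-config.js": "record-1"}
match = lookup_with_fallback(index, "https://domain.com/app/main-config.js?v=1.17.0")
```

Trying the exact URL first keeps genuinely distinct query-string captures separate, and only falls back when nothing matches.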

The WARC containing the file main-config.js, for clarity: https://drive.google.com/open?id=1isYQpszliKxRorPTgVuGzepBKr8XLU0V

Axenu avatar Dec 04 '19 08:12 Axenu