Andy Jackson
@ibnesayeed I think the issue is that the warcbase fatjar is bundling its own version of Scala (2.10) and the system is borked by having two different Scala versions on the...
@ibnesayeed However the JAR is loaded, you'll get the same error, because the warcbase JAR is not compatible with the version of Scala that spark-notebook is using. `EXTRA_CLASSPATH` works fine for me...
I found this potentially related issue: https://github.com/digital-preservation/droid/issues/71 Are you running the latest version of DROID?
Hm, the WARC records look alright to me (see below). We do have some crufty records from accidentally crawling our own archive in the past, but we don't seem to...
A-ha, I think this arises because there's a `closest_limit` of 10 that's used when looking up the URL in OutbackCDX. PyWB appends `&limit=10&matchType=exact` to the query and that fails if...
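To make the shape of that lookup concrete, here is a minimal Python sketch of a CDX query carrying the `limit=10` and `matchType=exact` parameters mentioned above. The endpoint (host, port and collection name) and the helper name `build_cdx_query` are assumptions for illustration only; this is not PyWB's actual code.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical OutbackCDX endpoint: host, port and collection name are assumptions.
CDX_ENDPOINT = "http://localhost:8080/mycollection"


def build_cdx_query(url, limit=10, match_type="exact"):
    """Build a CDX lookup URL with the limit and matchType parameters appended."""
    params = {"url": url, "limit": limit, "matchType": match_type}
    return CDX_ENDPOINT + "?" + urlencode(params)


query = build_cdx_query(
    "http://www.jisc.ac.uk/whatwedo/programmes/programme_preservation/2008sigprops.aspx"
)

# Parse the query string back to confirm which parameters the server would see.
sent_params = parse_qs(urlsplit(query).query)
print(query)
print(sent_params)
```

With a limit this small, the index returns at most 10 rows for the lookup, so any record beyond the tenth never reaches the replay layer.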
Hm, something weird is going on. I've deployed our latest PyWB on our BETA service, and made it filter out revisits, leading to this calendar: https://beta.webarchive.org.uk/wayback/archive/*/http://www.jisc.ac.uk/whatwedo/programmes/programme_preservation/2008sigprops.aspx# The ones prior to...
To be clear, limiting `closest_limit` to a hardcoded value of 10 is definitely a problem and is causing various playback issues. It may not be the only problem. https://github.com/webrecorder/pywb/blob/54d8bccf4a4eebf305012d49cb7330eaddea9eba/pywb/warcserver/index/indexsource.py#L116-L121
Following update to run under 2.5.0, this should work fine I think. Under a test server, it still says: ``` The url http://www.webarchive.org.uk/wayback/archive/20140613220103/http://www.jisc.ac.uk/whatwedo/programmes/programme_preservation/2008sigprops.aspx could not be found in this collection....
Unfortunately, this doesn't seem to work on live, e.g. https://www.webarchive.org.uk/act/wayback/archive/20181013061546/https://www.jisc.ac.uk/whatwedo/programmes/programme_preservation/2008sigprops.aspx still says ``` The url http://www.webarchive.org.uk/wayback/archive/20140613220103/http://www.jisc.ac.uk/whatwedo/programmes/programme_preservation/2008sigprops.aspx could not be found in this collection. ``` EDIT: there are some suggestions the...