
The OpenWayback Development

Results 101 openwayback issues

JavaScript injection is used to display the existing OpenWayback header that allows navigation in time and provides information about the current resource. This only works for HTML pages and even...

enhancement

We've just noticed a few `timeTrunc` errors in our crawl logs and the resulting `WARC-Truncated: time` headers in our WARC records. Does OpenWayback handle these specifically? At the moment it...

such as:
- maintaining the form of URL (absolute, protocol-relative, server-relative, path-relative)
  - to make it less likely to break when manipulated by JavaScript.
- changing protocol depending on `X-Forwarded-Proto`...
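To make the four URL forms named above concrete, here is a minimal sketch of how a rewriter might classify a link before deciding how to rewrite it, so that the original form can be preserved. The class `UrlForm`, the enum `Kind`, and the method `classify` are hypothetical names for illustration only; they are not part of OpenWayback.

```java
/** Hypothetical helper, not an OpenWayback class: classifies the form of a link. */
class UrlForm {
    enum Kind { ABSOLUTE, PROTOCOL_RELATIVE, SERVER_RELATIVE, PATH_RELATIVE }

    static Kind classify(String url) {
        // "//host/path" inherits the scheme of the enclosing page.
        if (url.startsWith("//")) {
            return Kind.PROTOCOL_RELATIVE;
        }
        // "/path" is resolved against the host of the enclosing page.
        if (url.startsWith("/")) {
            return Kind.SERVER_RELATIVE;
        }
        // A scheme is a letter followed by letters, digits, '+', '-' or '.', then ':'.
        if (url.matches("(?s)^[A-Za-z][A-Za-z0-9+.-]*:.*")) {
            return Kind.ABSOLUTE;
        }
        // Everything else is resolved against the enclosing page's path.
        return Kind.PATH_RELATIVE;
    }
}
```

A rewriter could branch on the returned `Kind` and emit a replacement URL of the same form, which is one way to keep rewritten links from breaking when page scripts manipulate them.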

Here is how we currently recommend that researchers reference a given web page in the Danish web archive, e.g. (URL + provenance): http://netarkivet.dk 197800-188-20140107085943-00000-sb-prod-har-005.statsbiblioteket.dk.warc/4773261 (9:01:06 jan 7, 2014 in UTC time)....

We're using the `FlexResourceStore` and `ZipNumBlockLoader` in our `CDXCollection.xml`. If the `numRetries` in `ZipNumBlockLoader` is set to anything > 1, then you can see errors like this in the log:...

The de facto standard for serving web archive indexes seems to be via CDX, not BDB. However, Wayback defaults to BDB out-of-the-box and our current [documentation](https://github.com/iipc/openwayback/wiki/How-to-configure) recommends commenting out/un-commenting the...

enhancement

When crawling using Heritrix, if both `sendIfModifiedSince` and `writeRevisitForNotModified` are set to `true` (although the latter has been deprecated, presumably equivalent to always being `true`), a server may respond with...

enhancement

Following on from the merging of #189, the [ResourceFactory](https://github.com/iipc/openwayback/blob/master/wayback-core/src/main/java/org/archive/wayback/resourcestore/resourcefile/ResourceFactory.java) could use an overhaul. Currently the code is a series of `if` statements: ``` if(urlOrPath.startsWith("http://")) ... } else if(urlOrPath.startsWith("hdfs://") || urlOrPath.startsWith("s3://")...
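One possible shape for such an overhaul is a prefix-to-handler registry in place of the `if` chain. The sketch below is an assumption for illustration, not the actual `ResourceFactory` API: `SchemeRegistry`, `register`, `resolve`, and the string labels returned by the handlers are all hypothetical stand-ins for real resource-loading code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: maps URL prefixes to handlers instead of chaining if statements. */
class SchemeRegistry {
    // LinkedHashMap preserves insertion order, so prefixes are tried in registration order.
    private final Map<String, Function<String, String>> handlers = new LinkedHashMap<>();

    public SchemeRegistry register(String prefix, Function<String, String> handler) {
        handlers.put(prefix, handler);
        return this;
    }

    /** Applies the first handler whose prefix matches; otherwise applies the fallback. */
    public String resolve(String urlOrPath, Function<String, String> fallback) {
        for (Map.Entry<String, Function<String, String>> entry : handlers.entrySet()) {
            if (urlOrPath.startsWith(entry.getKey())) {
                return entry.getValue().apply(urlOrPath);
            }
        }
        return fallback.apply(urlOrPath);
    }

    /** Example wiring; the labels stand in for real loaders and are assumptions. */
    public static SchemeRegistry defaults() {
        return new SchemeRegistry()
            .register("http://", u -> "http-loader")
            .register("hdfs://", u -> "remote-fs-loader")
            .register("s3://", u -> "remote-fs-loader");
    }
}
```

With this arrangement, supporting a new scheme means one extra `register` call rather than another branch, and the fallback passed to `resolve` covers plain file paths.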

enhancement

Scale: # hits on host, aggregate and clustering results.

**S'sheet line:** 2

**For whom?** BNF, BL, DN, IA

**Notes:** New CDX server should enable this.

**Est. Milestone:** 2.x.x

enhancement

It would be interesting if the wayback machine provided ways to do visual diffs on webpages via [resemble.js](http://huddle.github.io/Resemble.js/) and perhaps [html diffs](https://github.com/christian-oudard/htmltreediff) as well.