Andy Jackson
I attempted to do this at http://iipc.github.io/warc-specifications/specifications/cdx-format/cdx-2015/ but it would probably be better to see if we can work with IA to update https://archive.org/web/researcher/cdx_file_format.php
I built [this timeline](http://www.webarchive.org.uk/mementos/search/http://www.conservatives.com/News/Speeches/2010/03/David_Cameron_Our_Big_Society_plan.aspx), which is an example of the kind of thing we could do using off-the-shelf tooling.
I've come across `X-Generator`, but `X-Powered-By` is more common. However, perhaps we should use `Server` as that's standardised?
Given that that potentially involves futzing about with the header of a re-written page, I'd rather not use the generator tag. Apart from anything else, the page handed back might...
@ibnesayeed ah, fair enough. No problem with inclusion in the default UI templates.
If you are using the code from Java, you will need to catch any runtime Exceptions thrown during the iteration over the records, so that you can recover and move...
Hi @bjrne - I think this might be down to the awkward way that parameters like `WAYBACK_URL_PORT` only refer to how the service is accessed, rather than configuring how it...
Perhaps best as a separate repo and issue tracker?
I was originally interested in resolving this for ARC/WARC source files, not over HTTP. The current codebase uses the file extension, but makes different assumptions in different places (there are...
The problem is I think we should distinguish between concatenated gzip and plain gzip, and deliberately use an unfamiliar identifier so that users are aware of this distinction. Perhaps that's...
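To make the distinction concrete: a plain gzip file holds a single gzip member, while a concatenated gzip file is a series of independent members appended one after another. Here is a rough Python sketch of telling the two apart, assuming the whole file fits in memory. `count_gzip_members` is a hypothetical helper written for this illustration, not something from the existing codebase.

```python
import gzip
import zlib


def count_gzip_members(data: bytes) -> int:
    """Count the gzip members in a byte string (hypothetical helper).

    Returns 1 for plain gzip and more than 1 for concatenated gzip.
    """
    members = 0
    remaining = data
    while remaining:
        # wbits=31 tells zlib to expect a gzip header and trailer.
        decomp = zlib.decompressobj(wbits=31)
        decomp.decompress(remaining)
        if not decomp.eof:
            raise ValueError("truncated gzip member")
        members += 1
        # Bytes left over after this member's trailer belong to the next one.
        remaining = decomp.unused_data
    return members


plain = gzip.compress(b"one record")
concatenated = gzip.compress(b"record 1") + gzip.compress(b"record 2")

print(count_gzip_members(plain))         # 1
print(count_gzip_members(concatenated))  # 2
```

Both inputs start with the same magic bytes, so neither those nor a generic `.gz` extension can tell them apart; you only find out by reading past the first member.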