
indexer should report which WARC file causes an error

Open • anarcat opened this issue 7 years ago • 0 comments

Is your feature request related to a problem? Please describe.

There are a few cases where the indexer cannot correctly create a CDX file from a WARC file. For example, #44 and #168, both reported here, have valid workarounds.

I would very much like to fix the underlying problem, but the error appeared only after indexing many WARC files. I added about two dozen of them to the collection, and the indexer now fails with this error, without any further information:

Invalid WARC record, first line:

Describe the solution you'd like

The indexer should catch that error and report which file triggered it, so that the offending file can be fixed.
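A minimal sketch of what this could look like, in Python since pywb is written in Python. `index_one` stands in for whatever routine indexes a single WARC file, and `WarcIndexError` and `index_all` are hypothetical names, not pywb's actual API. Each file is indexed inside its own try/except, so the error carries the file name, and the loop keeps going so that one run reports every bad file instead of stopping at the first.

```python
from typing import Callable, Iterable, List, Tuple


class WarcIndexError(Exception):
    """Hypothetical error type tying an indexing failure to the WARC file that caused it."""

    def __init__(self, path: str, cause: Exception) -> None:
        super().__init__(f"{path}: {cause}")
        self.path = path
        self.cause = cause


def index_all(
    paths: Iterable[str],
    index_one: Callable[[str], List[str]],
) -> Tuple[List[str], List[WarcIndexError]]:
    """Index each WARC file in turn, collecting CDX lines and per-file errors.

    `index_one` is an assumed callback that returns the CDX lines for one file
    and raises on an invalid record; it is not a real pywb function.
    """
    cdx_lines: List[str] = []
    errors: List[WarcIndexError] = []
    for path in paths:
        try:
            cdx_lines.extend(index_one(path))
        except Exception as exc:  # remember which file failed, then continue
            errors.append(WarcIndexError(path, exc))
    return cdx_lines, errors


if __name__ == "__main__":
    def fake_index(path: str) -> List[str]:
        # Stand-in indexer: pretend bad.warc.gz contains a corrupt record.
        if path == "bad.warc.gz":
            raise ValueError("Invalid WARC record")
        return [f"cdx line for {path}"]

    lines, errs = index_all(["a.warc.gz", "bad.warc.gz", "b.warc.gz"], fake_index)
    for err in errs:
        print(f"error indexing {err}")
```

Collecting errors rather than aborting is a design choice; an indexer could just as well re-raise the first `WarcIndexError` immediately, as long as the file name is attached to it.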

Describe alternatives you've considered

I've considered creating a new collection and running wb-manager add on each WARC file, one at a time, to find out which one triggers the problem. But since add is designed to accept multiple files at once, it should also report errors on a per-file basis.
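For reference, the one-by-one workaround can at least be scripted. This sketch assumes a scratch collection already created with `wb-manager init scratch`, that `wb-manager add <collection> <file>` exits with a non-zero status when indexing fails, and that each add call indexes only the file passed to it; the collection name `scratch` and the helpers `run_add` and `find_bad_warcs` are hypothetical. The `run` parameter exists only so the command can be swapped out for testing.

```python
import subprocess
import sys
from typing import Callable, List, Sequence


def run_add(collection: str, path: str) -> int:
    """Run `wb-manager add` for a single file and return its exit status.

    Output is not captured, so any error message still reaches the terminal.
    Assumes a failed index makes wb-manager exit non-zero.
    """
    result = subprocess.run(["wb-manager", "add", collection, path], check=False)
    return result.returncode


def find_bad_warcs(
    paths: Sequence[str],
    collection: str = "scratch",
    run: Callable[[str, str], int] = run_add,
) -> List[str]:
    """Add WARC files to `collection` one at a time; return the ones that failed."""
    bad: List[str] = []
    for path in paths:
        if run(collection, path) != 0:
            bad.append(path)
    return bad


if __name__ == "__main__":
    # Hypothetical usage: python find_bad_warcs.py file1.warc.gz file2.warc.gz ...
    for path in find_bad_warcs(sys.argv[1:]):
        print(f"indexing failed: {path}")
```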

Additional context

This is one of many issues found when using pywb with larger collections; see #408 and #410.

anarcat • Nov 13 '18 14:11