httrack2warc

Handle image errors renamed to .html

Open • ato opened this issue 7 years ago • 0 comments

Requests for URLs with an image file extension (e.g. foo.gif) might return an HTML 404 error page. In this case HTTrack appears to write the error page to a file named foo.html, but still refers to it as foo.gif in the cache and in new.txt.

I've worked around this for now by allowing missing files to be skipped when their recorded HTTP status is an error code. Is there a way we can detect and handle this case properly? Maybe we can implement the same conditions HTTrack uses when renaming files and probe for the renamed file's existence.
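A rough sketch of the probing idea, in Python purely for illustration. It does not reproduce HTTrack's actual renaming conditions, which I haven't confirmed. It only covers the one case observed above (foo.gif saved as foo.html). The function name `locate_saved_file` and the `FALLBACK_SUFFIXES` tuple are hypothetical, and probing regardless of status code is an assumption.

```python
from pathlib import Path
from typing import Optional

# Hypothetical list of suffixes to probe. Only ".html" has been observed;
# HTTrack's real renaming rules may differ.
FALLBACK_SUFFIXES = (".html",)


def locate_saved_file(recorded: Path, status: int) -> Optional[Path]:
    """Find the file on disk for a path recorded in the cache / new.txt.

    Returns the recorded path if it exists, otherwise a sibling with a
    fallback suffix if one exists. If nothing is found, returns None for
    HTTP error statuses (the current skip workaround) and raises for
    anything else.
    """
    if recorded.is_file():
        return recorded

    # Probe for a renamed copy, e.g. foo.gif -> foo.html.
    for suffix in FALLBACK_SUFFIXES:
        candidate = recorded.with_suffix(suffix)
        if candidate.is_file():
            return candidate

    if status >= 400:
        # Existing workaround: a missing file with an error status is skipped.
        return None
    raise FileNotFoundError(recorded)
```

With this shape the skip workaround stays as the last resort, and a missing file with a non-error status still fails loudly instead of being silently dropped.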

ato • Feb 02 '18 07:02