[BUG] Network import treats incomplete partial content as success

Open • bbappserver opened this issue 3 years ago • 1 comment

Hydrus version

v499

Operating system

Linux (specify distro and version in comments)

Install method

Running from source

Install and OS comments

No response

Bug description and reproduction

I encountered this when Twitter's video API was being temporarily flaky.

Symptoms: Some files had metadata such as "5 fps, 41 seconds" but contained only 97 KiB of data, and the preview just played a brief 0.2 s clip over and over.

Finding the source: I ruled out a playback issue by opening the source URL and the file URL in a web browser; the source file was fine. Forcing a re-download corrected the issue for those files. Given the size of typical hydrus collections, though, I was just lucky to catch it, and it would be really bad if this happened to several files.

Bug is transient and so difficult to reproduce. It may be reproducible by making a pretend HTTP server that serves a long file as partial chunks and forcing it to die in the middle.
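A minimal sketch of such a pretend server, using only the Python standard library. It declares a large Content-Length, sends only part of the body, then drops the connection. The names `TruncatingHandler`, `start_truncating_server`, and `fetch_lengths`, as well as the byte counts, are made up for illustration. It answers with a plain 200 rather than a 206, so it only approximates the partial-content case.

```python
import http.client
import http.server
import threading

FULL_LENGTH = 100_000  # length the server claims in Content-Length (arbitrary)
SENT_LENGTH = 10_000   # bytes actually sent before the connection dies (arbitrary)


class TruncatingHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that promises FULL_LENGTH bytes but sends fewer."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(FULL_LENGTH))
        self.end_headers()
        self.wfile.write(b"\0" * SENT_LENGTH)
        self.wfile.flush()
        # Returning here closes the socket with most of the body unsent.
        self.close_connection = True

    def log_message(self, *args):
        pass  # keep the console quiet


def start_truncating_server():
    """Start the server on a free localhost port in a background thread."""
    server = http.server.HTTPServer(("127.0.0.1", 0), TruncatingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_lengths(port):
    """Return (declared length, bytes actually received) for one GET."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", "/video.mp4")
    resp = conn.getresponse()
    declared = int(resp.getheader("Content-Length"))
    try:
        body = resp.read()
    except http.client.IncompleteRead as e:
        body = e.partial  # what arrived before the connection closed
    conn.close()
    return declared, len(body)
```

Pointing a downloader at `http://127.0.0.1:<port>/` (the port is available as `server.server_address[1]`) should show whether a short body is reported as an error or silently accepted.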

Proposed solution: Check twisted more carefully to ensure that the length of the delivered bytes matches the Content-Length.
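The kind of check meant here could look roughly like the sketch below. It is independent of any particular network library: `check_body_length` and `TruncatedBodyError` are hypothetical names, and the function assumes the caller has counted the raw body bytes as they came off the wire. If the response uses a Content-Encoding such as gzip, the count must be taken before decoding, since Content-Length describes the encoded body.

```python
class TruncatedBodyError(Exception):
    """Hypothetical error for a body that does not match Content-Length."""


def check_body_length(headers, bytes_received):
    """Compare the raw body size with the declared Content-Length.

    headers: mapping of response header names to values.
    bytes_received: number of raw (still encoded) body bytes read.
    Returns the declared length, or None if no Content-Length was sent.
    """
    declared = None
    for name, value in headers.items():
        if name.lower() == "content-length":  # header names are case-insensitive
            declared = value
    if declared is None:
        return None  # nothing to compare against
    expected = int(declared)
    if bytes_received != expected:
        raise TruncatedBodyError(
            f"expected {expected} bytes, received {bytes_received}"
        )
    return expected
```

A caller would run this once the body stream ends and treat `TruncatedBodyError` as a failed download rather than a success.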

Log output

Some faulty video files show as an unrecognized format in the download log, yet the download still succeeds and they are shown as working. They play back normally, except that they are too short, as described above.

Sep 11 '22 07:09 bbappserver

I am noting here that we discussed this the other day and couldn't easily reproduce the issue. We looked at Content-Length and Content-Range parsing in depth and couldn't see an obvious problem. For v502 I have updated the Content-Range related sanity checks, and now the client will print to log whenever it gets borked-short 206 responses.
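For illustration, a Content-Range sanity check for a 206 response might look like the sketch below. This is not the client's actual code. `partial_body_shortfall` is a hypothetical helper that assumes the common `bytes start-end/total` form of the header and reports how many bytes are missing.

```python
import re

# Matches e.g. "bytes 0-99/100" or "bytes 0-99/*" (total length unknown).
_CONTENT_RANGE = re.compile(r"bytes\s+(\d+)-(\d+)/(\d+|\*)")


def partial_body_shortfall(content_range, bytes_received):
    """Return how many bytes a 206 body is short of its Content-Range.

    0 means the body is complete; a positive number means it was truncated.
    Raises ValueError if the header is malformed or inconsistent.
    """
    match = _CONTENT_RANGE.fullmatch(content_range.strip())
    if match is None:
        raise ValueError(f"unparseable Content-Range: {content_range!r}")
    start, end = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    if end < start or (total is not None and end >= total):
        raise ValueError(f"inconsistent Content-Range: {content_range!r}")
    expected = end - start + 1  # the range is inclusive at both ends
    return expected - bytes_received
```

A non-zero result is the situation that would be worth printing to the log.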

I looked at all the error states and feel like I have things covered in all the situations I know about where a response could be truncated. requests is even supposed to raise a special Exception when a 'Transfer-Encoding: chunked' response is cut short.

I think we need to figure out a twitter (or other) vid that reliably (maybe once in every five attempts) produces a truncated file so we can examine and debug exactly what is happening, but my suspicion is that this is a backend issue and these sites are just sometimes delivering truncated files in confidence with 'correct' header info. As we discussed, the best work I can probably do next is to implement a 'this file is borked, redownload it and replace if the content is longer' file maintenance job.

We might be able to automatically detect these files by searching for crazy low bitrates and/or 'ffmpeg can only render the first 12 frames of this supposedly 96 frame vid'.
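A rough sketch of both heuristics, assuming the file size, the duration, and the frame counts are already known from the file metadata and from a render attempt. The function names and the 50 kbit/s threshold are invented for illustration and would need tuning against real files.

```python
def average_bitrate(size_bytes, duration_seconds):
    """Average bitrate in bits per second from file size and claimed duration."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return size_bytes * 8 / duration_seconds


def looks_truncated(size_bytes, duration_seconds,
                    declared_frames=None, rendered_frames=None,
                    min_bits_per_second=50_000):
    """Flag a video whose size or renderable frame count is implausibly low.

    min_bits_per_second is an arbitrary illustrative threshold.
    """
    if average_bitrate(size_bytes, duration_seconds) < min_bits_per_second:
        return True
    if declared_frames is not None and rendered_frames is not None:
        # e.g. only 12 frames render out of a supposed 96
        return rendered_frames < declared_frames
    return False
```

With the numbers from the report above (97 KiB for a claimed 41 seconds), the average comes to roughly 19 kbit/s, which falls under the example threshold.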

Oct 12 '22 02:10 hydrusnetwork