Random seek point can overshoot

Open corneliusboehm opened this issue 6 years ago • 6 comments

Hi there. I was getting lots of "Ran out of frames. Looping." messages in my application, even though I never requested more frames than were available. The following test shows that the behavior is not deterministic: for the same video and the same number of frames, the message appears in some runs and not in others:

import lintel


def test_ran_out_of_frames(width, height, duration, framerate):
    # Decode the same video repeatedly; with random seeking each run can differ.
    for idx in range(50):
        print('Run {}'.format(idx))

        with open('test_video.mp4', 'rb') as f:
            encoded_video = f.read()

        # Request every frame the video is expected to contain.
        num_frames = int(duration * framerate)
        video, _ = lintel.loadvid(encoded_video,
                                  height=height, width=width, num_frames=num_frames)

Output:

Run 0
Run 1
Run 2
Ran out of frames. Looping.
Run 3
Run 4
Run 5
Run 6
Ran out of frames. Looping.
Run 7
Run 8
Run 9
...

Running the same test with should_random_seek=False made the messages disappear, so it looks like the random seek point does not leave enough room for the requested frames. Can you reproduce the issue and, if so, take a look?

corneliusboehm avatar Aug 07 '18 15:08 corneliusboehm

Hi! I think respecting the number of frames in the video is a tricky business: for example, the seek can overshoot because of where the keyframes are placed. But I can take a look, since I agree this is less than satisfactory. Perhaps a "strict" mode is called for.

What is the length of video you are using? I will try to reproduce with a similar length video.

dukebw avatar Aug 07 '18 15:08 dukebw

Thanks for your quick reply! The length of my video is about 10 seconds.

corneliusboehm avatar Aug 07 '18 15:08 corneliusboehm

Hello again, I was wondering if you might be able to give the seek-point-overshoot branch a try (https://github.com/dukebw/lintel/tree/seek-point-overshoot).

I changed the seeking in loadvid to work based on AVStream.nb_frames, rather than some poor approximations in seconds. This is more similar to how the loadvid_frame_nums seeking works now.

I have still found at least one video where AVStream.nb_frames is wrong: it reported 168 frames, but using receive_frame() I could only decode 166. So the method will still overshoot by however much AVStream.nb_frames overestimates the number of frames in the video. If frame-exact accuracy is required, I wonder if it is better to preprocess all the videos by counting frames with receive_frame(), store metadata about how many frames can really be decoded, then use loadvid_frame_nums to only get frames within those bounds. I'm also not sure whether the estimate here: https://github.com/dukebw/lintel/blob/5af35bd3a80b9012b43801142235d301122d2cb0/lintel/py_ext/lintelmodule.c#L134-L186 already accounts for the case where AVStream.nb_frames from the container is wrong, in which case we should just always use this estimate.
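
To make the preprocessing idea concrete, here is a minimal sketch of the bounds check in plain Python. It assumes a decodable frame count has already been measured and stored per video; the helper name pick_frame_nums is hypothetical and not part of lintel. The resulting list is the kind of input one would then hand to loadvid_frame_nums.

```python
import random


def pick_frame_nums(decodable_frames, num_frames, rng=random):
    """Return num_frames consecutive frame indices that fit inside the video.

    decodable_frames is assumed to come from a preprocessing pass that
    counted the frames that can really be decoded (not from nb_frames).
    This helper is hypothetical and not part of lintel.
    """
    if num_frames > decodable_frames:
        raise ValueError('requested {} frames, only {} decodable'.format(
            num_frames, decodable_frames))

    # The last valid start leaves exactly num_frames frames before the end.
    last_start = decodable_frames - num_frames
    start = rng.randint(0, last_start)  # randint is inclusive on both ends
    return list(range(start, start + num_frames))


# Counts from the example above: 168 reported, 166 actually decodable.
frame_nums = pick_frame_nums(decodable_frames=166, num_frames=32)
```

Because the start index is drawn from 0 to decodable_frames - num_frames, the last requested index can never pass the final decodable frame, whatever the container metadata claims.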

Anyway, please let me know what you think, and if you get a chance to try the fix. Thank you for pointing out the bug!

dukebw avatar Aug 12 '18 15:08 dukebw

Thanks for your work! I gave the branch a try, but as you have already noticed there are some problems with nb_frames. For the videos in my database I need to reduce the nb_frames reported by ffprobe by exactly 3 to get the number of frames that can actually be decoded. That is also why I am not able to check whether the modifications improved the random seek point placement.

Do you think it might be possible that the nb_frames offset somehow comes from lintel itself? I find it hard to imagine that so many of our videos have wrong meta information. The reported nb_frames also matches duration*avg_frame_rate, so the information seems to be consistent.
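
The consistency check described above can be sketched as follows. The values are hypothetical (the thread does not give the exact duration or frame rate), and expected_frame_count is an illustrative helper, not an ffprobe or lintel function. It relies on ffprobe reporting avg_frame_rate as a fraction string such as 30000/1001.

```python
from fractions import Fraction


def expected_frame_count(duration_seconds, avg_frame_rate):
    """Estimate the frame count from a duration and a rate like '30000/1001'.

    Both arguments are strings, as ffprobe prints them. Hypothetical helper.
    """
    return round(Fraction(duration_seconds) * Fraction(avg_frame_rate))


# Hypothetical metadata for a roughly 10 second video.
nb_frames = 300          # as reported by the container
decoded_frames = 297     # as counted by actually decoding

estimate = expected_frame_count('10.010000', '30000/1001')
print(estimate == nb_frames)        # True: the metadata is self-consistent
print(nb_frames - decoded_frames)   # 3: the gap is on the decoding side
```

When the metadata agrees with itself but decoding comes up short by a constant, the decoder loop is a more likely suspect than the files.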

My preferred solution to all of this would be to allow an argument such as -1 for num_frames. Lintel could then return just as many frames as it is able to decode. The downside, of course, is that the output buffer cannot easily be preallocated. Preallocating would require the preprocessing that you mentioned.

Side note: Thanks for removing fps_cap. I was disabling it so far by setting it to an extremely high number. If you want to reintroduce it, I would similarly propose to add the option -1 for disabling.

corneliusboehm avatar Aug 13 '18 09:08 corneliusboehm

Okay great, thank you for the feedback about both APIs. I agree those would be improvements. At the very least, it would be good to have some interface that reports how many frames were decoded successfully when decoding fails.

I will think about how to incorporate a solution that matches the frame count reported by ffprobe. I'm pretty sure that receive_frame (https://github.com/dukebw/lintel/blob/456d211e8a9a91aae0cd33801e85622996289cf2/lintel/core/video_decode.c#L51) is correctly using the send/receive packet API, so I will have to dig into the ffprobe code to see what the heck it is doing to count frames, and this may take some time.

dukebw avatar Aug 13 '18 13:08 dukebw

Hi again! It appears you were right, and there was a bug in receive_frame. I was neglecting to "drain" the codec, as described here:

https://github.com/FFmpeg/FFmpeg/blob/fe06ed22e6e0a8c2995818c4532eb6f4ec9320b9/libavcodec/avcodec.h#L122-L133
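
To illustrate why skipping the drain step loses frames, here is a toy model in Python. It is not the FFmpeg API: ToyDecoder is a made-up class that only mimics the send/receive pattern, with None standing in for the flush packet. In FFmpeg itself, draining means sending a NULL packet with avcodec_send_packet() and then calling avcodec_receive_frame() until it returns AVERROR_EOF. The delay of 3 is chosen to mirror the offset reported earlier in the thread; that the real offset equals the decoder delay is an assumption.

```python
from collections import deque


class ToyDecoder:
    """Toy stand-in for a codec that buffers `delay` frames before output."""

    def __init__(self, delay):
        self.delay = delay
        self.pending = deque()
        self.draining = False

    def send_packet(self, packet):
        if packet is None:
            # None plays the role of the flush packet: enter draining mode.
            self.draining = True
        else:
            self.pending.append(packet)

    def receive_frame(self):
        # Hold back `delay` frames normally; release everything when draining.
        held_back = 0 if self.draining else self.delay
        if len(self.pending) > held_back:
            return self.pending.popleft()
        return None  # needs more input, or end of stream when draining


def decode_all(packets, delay, drain):
    decoder = ToyDecoder(delay)
    frames = []
    for packet in packets:
        decoder.send_packet(packet)
        frame = decoder.receive_frame()
        if frame is not None:
            frames.append(frame)
    if drain:
        # Signal end of input, then collect the frames still buffered.
        decoder.send_packet(None)
        while True:
            frame = decoder.receive_frame()
            if frame is None:
                break
            frames.append(frame)
    return frames


packets = list(range(300))
without_drain = decode_all(packets, delay=3, drain=False)
with_drain = decode_all(packets, delay=3, drain=True)
print(len(without_drain), len(with_drain))  # prints: 297 300
```

In this model the undrained loop always comes up short by exactly the decoder delay, which is the kind of constant offset a missing drain step would produce.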

I was wondering if you might be able to give commit ca3e1dedd3c3aeba9263e3ea7b33f45263ca34cb a try.

dukebw avatar Aug 16 '18 23:08 dukebw