
H.264 decoder, closed GOPs

Open rjappleton opened this issue 4 years ago • 4 comments

I have my own DirectShow demuxer which does frame-accurate seeking by setting negative timestamps on every frame before the first required frame (the seeked-to frame), which gets a timestamp >= 0. This gives frame-accurate seeking with most decoders, including the LAV video decoder, because the decoder does not output anything until the first timestamp >= 0.

If I want to seek to the first frame in a Closed GOP, then I ensure that the timestamps of the leading B frames are positive, so the presentationally first leading B frame is the first frame output by the decoder. This works great most of the time with the LAV decoder.
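
The timestamp scheme described above can be sketched roughly as follows. This is only an illustration in Python, not the actual demuxer code: the `Frame` type, the `shift_timestamps` helper and the time values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    kind: str  # "I", "P" or "B"
    pts: int   # presentation time on the file's own timeline (arbitrary units)


def shift_timestamps(coded_frames: List[Frame], target_pts: int) -> List[Frame]:
    """Rebase timestamps so that the seek target lands on 0.

    Frames presented before the target end up negative (decoded but not
    shown); the target and everything after it end up >= 0.
    """
    return [Frame(f.kind, f.pts - target_pts) for f in coded_frames]


# One GOP in coded order: I B B P B B (hypothetical presentation times).
gop = [Frame("I", 102), Frame("B", 100), Frame("B", 101),
       Frame("P", 105), Frame("B", 103), Frame("B", 104)]

# Seeking to the first presented frame of the GOP: every timestamp is >= 0.
seek_to_gop_start = shift_timestamps(gop, min(f.pts for f in gop))

# Seeking to the I frame instead: the two leading B frames go negative.
seek_to_i_frame = shift_timestamps(gop, 102)
```

In the first case the leading B frames carry timestamps 0 and 1, so a decoder that honours them should output them before the I frame.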

But I've noticed a problem with some files which have closed GOPs and leading B frames. For these files, the LAV Decoder does NOT output the leading B frames even though they had positive timestamps - it just outputs the I frame first. I can understand this behaviour if the GOPs were open, but they are definitely closed.

If I set the LAV decoder's hardware accel setting to "QuickSync" then everything works fine - the leading B frames ARE output and there is no visible corruption of the frames, confirming that the GOPs are closed. Intel's Media SDK DirectShow decoder also outputs these OK. But if the LAV decoder's hardware accel is set to anything else, then the leading B frames are NOT output.

The problem files all seem to be from Canon cameras, but that might just be a coincidence. The problem files also seem to have no SEIs, so there are no recovery points defined in the file, BUT all of the I frames are IDRs so the GOPs are effectively closed.

I can work around the problem by starting the streaming from the beginning of the previous GOP (so all of the timestamps in that GOP are negative). The leading B frames in the current GOP are then output OK, which is what you would expect if the GOP had been open. But that seems a bit wasteful when I should be able to stream from the current GOP.

Does the LAV decoder maybe treat GOPs as always open if they do not have SEI recovery points, even if all the I frames are IDRs?
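
For anyone wanting to check the "all I frames are IDRs, no SEIs" observation on their own files, a rough sketch in Python is below. It tallies the NAL unit types in one sample, assuming the length-prefixed layout used in MP4 with a 4-byte length field (the real size comes from the file's `avcC` box). The `nal_types` and `make_nal` helpers are hypothetical; the type values 1, 5 and 6 are the H.264 `nal_unit_type` codes for a non-IDR slice, an IDR slice and SEI.

```python
from typing import List

NAL_NON_IDR_SLICE = 1
NAL_IDR_SLICE = 5
NAL_SEI = 6


def nal_types(sample: bytes, length_size: int = 4) -> List[int]:
    """Return the nal_unit_type of each NAL unit in a length-prefixed sample."""
    types = []
    pos = 0
    while pos + length_size <= len(sample):
        nal_len = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        if nal_len == 0 or pos + nal_len > len(sample):
            break  # truncated or malformed sample; stop scanning
        types.append(sample[pos] & 0x1F)  # low 5 bits of the NAL header byte
        pos += nal_len
    return types


def make_nal(header: int, payload: bytes = b"\x00") -> bytes:
    """Build a dummy length-prefixed NAL unit, for the demonstration only."""
    body = bytes([header]) + payload
    return len(body).to_bytes(4, "big") + body


# Synthetic sample: one IDR slice (header 0x65) then one non-IDR slice (0x41).
sample = make_nal(0x65) + make_nal(0x41)
found = nal_types(sample)
has_idr = NAL_IDR_SLICE in found
has_sei = NAL_SEI in found
```

Running this over the first sample of each GOP would show whether it contains an IDR slice and whether any SEI NAL units are present at all.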

rjappleton avatar Mar 05 '20 16:03 rjappleton

In coded bitstream order, a valid random access point cannot have "leading" B-Frames, since they cannot be decoded without context. A random access point, in coded bitstream order, should start with an I frame.

Nevcairiel avatar Mar 05 '20 16:03 Nevcairiel

Sorry, I wasn't clear. By "leading B frame" I meant in presentation/decoded order.

So in encoded order:

IBBPBBP....
1234567...
^Start streaming from here

so frames 2 and 3 above are "leading B frames" in presentation/decoded order, thus:

BBIBBP...
231564...

It is frames 2 and 3 that are not output by LAV decoder for the problem files (all frames have positive timestamps).
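
The reordering above can be reproduced with a tiny Python sketch. The tuple layout and the presentation positions are just an illustration of the example, not anything taken from the decoder or the files.

```python
# Each entry: (frame type, number in encoded order, position in presentation order)
coded = [
    ("I", 1, 3),
    ("B", 2, 1),
    ("B", 3, 2),
    ("P", 4, 6),
    ("B", 5, 4),
    ("B", 6, 5),
]

# Sort by presentation position to get the order the decoder should output.
presented = sorted(coded, key=lambda frame: frame[2])

types = "".join(kind for kind, _, _ in presented)
numbers = "".join(str(number) for _, number, _ in presented)
print(types)
print(numbers)
```

Frames 2 and 3 come out first, ahead of the I frame, which is the output that goes missing with the problem files.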

rjappleton avatar Mar 05 '20 17:03 rjappleton

As far as I can see, as long as those I frames are really IDRs, there is no reason it wouldn't output them. Only without an IDR would it want a recovery point.

Could you cut a small sequence right at such a point to allow demonstration?

Nevcairiel avatar Mar 05 '20 17:03 Nevcairiel

They are MP4 files from the camera(s). Here is one that has quite a lot of movement, so it's easier to identify each decoded frame uniquely.

https://www.dropbox.com/s/17p0d2afske3fnm/Canon%20MVI_0038.MP4?dl=0

Choose an I frame and start streaming from there. Ensure that all of the frames (DirectShow samples) in the GOP have positive timestamps.

Do not begin streaming from the previous GOP because that is the workaround that I mentioned.

P.S. And here are some screenshots. They show the frames in decoded order B1, B2, I3. When QuickSync is selected as the hardware decoder, we correctly see the 3 different consecutive frames. But when DXVA-native is selected, we only see the I frame because the B frames are not output by the decoder (so the I frame is the first frame to emerge from the decoder).

https://www.dropbox.com/s/hb82o4vzhbnmo1o/screenshots.zip?dl=0

So just to clarify, they are the first 3 frames of a GOP which, in ENcoded order is

I3 B1 B2

and in DEcoded order is

B1 B2 I3

rjappleton avatar Mar 05 '20 21:03 rjappleton