H.264 decoder output for first GOP problem

Open rjappleton opened this issue 6 years ago • 13 comments

I have my own DirectShow demuxer which does frame-accurate seeking by setting negative timestamps until the first required frame (the seeked-to frame) which gets a >=0 timestamp. This gives frame accurate seeking for most decoders OK including the LAV video decoder, because the decoder does not output anything until the first >=0 time stamp.
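To make the scheme concrete, here is a minimal sketch of the timestamp rebasing described above. It is not the actual demuxer code: the function names are hypothetical, and timestamps are assumed to be in DirectShow's 100 ns units. Every frame before the seek target ends up negative (preroll), and the target itself lands on 0.

```python
from typing import List


def rebase_for_seek(coded_order_pts: List[int], seek_pts: int) -> List[int]:
    """Shift presentation timestamps so the seeked-to frame gets timestamp 0.

    coded_order_pts: presentation timestamps in the order frames are sent.
    seek_pts: presentation timestamp of the frame the user seeked to.
    """
    return [pts - seek_pts for pts in coded_order_pts]


def expected_output(rebased: List[int]) -> List[int]:
    """Timestamps a decoder should emit if it drops all negative (preroll)
    frames and outputs the rest in presentation order (assumed behaviour)."""
    return sorted(ts for ts in rebased if ts >= 0)
```

For example, rebasing the coded-order sequence 0, -667333, -333666, 667333, 333666 to a seek target of 333666 leaves only two frames at or above zero, so only those two should reach the renderer.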

But then I noticed strange behavior of the LAV Video Decoder when seeking within the first GOP of some AVCHD (H.264) .mts files.

When the problem occurs, it seems that the first I frame is always output no matter which frame in the first GOP you seek to - so the renderer always displays that frame. As soon as you seek into the second GOP and beyond, the correct seeked-to frame is displayed so seeking is then working OK.

After more investigation, if QuickSync is selected as the Hardware acceleration mode of the LAV decoder then the problem never occurs. But for other modes like None, DXVA copyback etc the problem does occur for some AVCHD .mts files.

I then looked at the output timestamps when streaming from the START of the file. When QuickSync is selected I got the clean sequence 333555, 667222, 1000888, 1334555. But when selecting DXVA copyback I got 0, 333666, 0, 333666, 667332, 1000998, 1334664. So something is not quite right with non-Quicksync modes.

I have tried to isolate what distinguishes the mts files that show this problem from those that do not, but without much success - except that the problem files seem to be from some Sony video cameras.

In MPEG-2, you often have the first GOP closed and all other GOPS open. So I wondered if there was something similar in the first H.264 GOP that wasn't being handled well for non-Quicksync modes.

So I used a hex-editor to remove the first GOP from a problem mts file, so that the second GOP was now the first GOP in the file. I found that seeking and streaming now seemed to work OK for this hacked file in its first GOP.

I conclude that there is something different in the first GOP of the problem files (maybe in the SPS or an SEI) that is messing up the LAV decoder in non-QuickSync modes.

rjappleton avatar May 14 '18 16:05 rjappleton

As long as the source always sends timestamps, those should be passed through as-is, with no new timestamps being invented. So perhaps those first frames don't actually have timestamps, in which case LAV Video would never know to drop them?

Nevcairiel avatar May 14 '18 17:05 Nevcairiel

Re the output timestamps listed above, the input timestamps are

0 -667333 -333666 667333 333666 1334666 1001000 2002000 1668333 2669333 2335666 3336666 3003000 4004000 3670333 and so on...

So they do all have input timestamps (the two negative timestamps are the leading B frames which is OK). The output timestamps are often slightly different to the input timestamps - I'm not really bothered by that as I've seen it a lot with other decoders.

I've checked the SPS of the first and second GOPs and they are identical, so the difference does not seem to be in the SPS. But there are differences in the PPS and SEI's so maybe the source of the problem is in there but I've no way to parse them in detail.

The Quicksync module in LAV Video decoder must be parsing this differently because it emits the correct series of frames and timestamps - it works great treating negative timestamps as preroll and does not output the corresponding decoded frames. It is the other non-Quicksync modes that exhibit the problem (in the first GOP).

rjappleton avatar May 14 '18 22:05 rjappleton

The Quicksync decoder has a time stamp "correction" feature. So the actual timestamps are not necessarily correct.

Upload a sample file for Nevcairiel. That is the best way to get an analysis.

clsid2 avatar May 15 '18 11:05 clsid2

Indeed, a way to actually test this would be the only way to actually answer any questions.

Nevcairiel avatar May 15 '18 11:05 Nevcairiel

No problem, here is an example.

sample.MTS is the original file that demonstrates the problem in the first GOP.

sample-chopped.MTS is the same file but with the first GOP chopped off (the second GOP is now the first GOP); it seeks OK in the now-first GOP for all hardware modes of the LAV video decoder.

rjappleton avatar May 16 '18 15:05 rjappleton

I can't reproduce any problems. I offset the timestamps by two frames as your example to simulate seeking, and the output timestamps come out perfectly.

The only remaining piece would be your source filter sending data differently, somehow.

Nevcairiel avatar May 17 '18 13:05 Nevcairiel

I've looked more closely at this and summarized my findings in this spreadsheet for the sample.MTS file.

Using LAV Filters version 0.71

Timestamps are just part of the story - identifying the decoder's output frame by its actual visual content is important in case the decoder has changed the timestamp compared to the input sample's timestamp. I stored each decoder output frame into an AVI file that I could then step through using VirtualDub to see each actual output frame. In the spreadsheet each frame is numbered in presentation order, starting from 1 for the I-frame.

All samples sent from the demux are complete (encoded) frames and all have a start timestamp and end timestamp. The end timestamp is always start timestamp + 1. The first sample has Discontinuity flag set.

The output from the decoder is NV12.

I've logged the timestamp on each sample being input to the decoder from the demux, and the timestamp of each decoded sample being output from the decoder.

In Column A I've listed the size in bytes of each sample as it is sent by the demux.

I have tried streaming from the first I-frame (timestamp 0) using the "none" hardware setting and the "QuickSync" hardware setting - the results are in columns B to I.

I also tried streaming from the 7th frame (I-frame + 6) in presentation order, using the "none" hardware setting and the "QuickSync" hardware setting - the results are in columns K to R.

For the hardware "none" setting, it seems that the two leading B frames are always output (even though their input timestamps are negative), followed by the frame we actually expect (the one we seeked to). This explains why, as I mentioned earlier, the renderer always shows the same frame when you seek anywhere in the first GOP: it is the first leading B frame, not the I frame as I thought. The decoder should not be outputting the leading B frames.

For the hardware "Quicksync" setting you do NOT get the leading B frames being output. The first output frame is the one we expect, so it has seeked correctly.

Note that in truth we actually get the frame 1 AFTER the one we expect. The decoder seems to output timestamp start values slightly lower than the input values, for example 333555 or 333662 instead of 333666, so I'm guessing that the frame we expect has an output timestamp that is slightly LESS than zero instead of zero - this might explain why it is not being output.

I haven't had time to do all this for the sample-chopped.MTS file as well, but the fact that the leading B frames are output only for the first GOP of sample.MTS suggests that this has something to do with open/closed GOPs - or rather something similar/equivalent for H.264. My guess is that most files have all GOPs open, but these Sony files have the first GOP closed, which is tricking the decoder into always outputting the leading B frames regardless of the negative timestamps.

rjappleton avatar May 17 '18 22:05 rjappleton

I see one thing that's different: you provide both fields packed together as one, with only one timestamp, instead of separately with individual timestamps like my own LAV Splitter does. That'll probably trigger additional parsing steps to split them again, which I can't really test easily right now. I would recommend delivering the fields independently, each with its own timestamp.

Nevcairiel avatar May 17 '18 23:05 Nevcairiel

Thanks for looking at this.

Re the observation I made "Note that in truth we actually get the frame 1 AFTER the one we expect. ". I tried adding about 200 to the demux's output time stamps and that shift was enough to make the seeked-to frame be output by the decoder. As you can see in the spreadsheet, the decoder does seem to shift the output timestamp compared to the input timestamp. In this case the shift was backwards so that the start timestamp was < 0 which the decoder did not output - adding the extra 200 was enough to keep it > 0 so the frame was output OK.
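The effect of the extra 200 can be illustrated with a deliberately simplified model. The assumption here (mine, not confirmed anywhere in this thread) is that the decoder subtracts a small constant shift from each input timestamp and drops any frame whose resulting start time is negative. The shift of 111 is taken from the observed 333666 -> 333555 pair; the function name is hypothetical.

```python
def frame_is_output(input_ts: int, decoder_shift: int, margin: int = 0) -> bool:
    """Simplified model: the decoder emits a frame only if its shifted
    start timestamp is still >= 0.

    input_ts: timestamp the demux would normally send (100 ns units).
    decoder_shift: how far the decoder moves the timestamp backwards (assumed constant).
    margin: extra offset the demux adds to every timestamp before sending.
    """
    return input_ts + margin - decoder_shift >= 0


# Seeked-to frame (input timestamp 0), observed shift of 111 units:
dropped_without_margin = not frame_is_output(0, 111)
kept_with_margin = frame_is_output(0, 111, margin=200)
```

Under this model the margin only needs to exceed the largest backward shift the decoder applies, while staying well below one frame duration so that the previous frame remains negative.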

Anyway, re your comments about splitting interlaced frames. I currently just send the samples based on when there is a PTS in the transport stream with minimal parsing - this seems to work great in nearly all cases. Looking at the samples I can see that there are two pictures in each sample, so presumably the fields were coded separately.

I also notice that every GOP except the first has one PPS at the I-frame. But the first GOP has three extra PPS's - one at the first leading B frame and the other two at later frames. Maybe the presence of these extra PPS's is the cause of the leading B frames always being output for the first GOP.

I haven't tried your suggestion of parsing the sample and sending the 2 fields as separate samples. But I did try preventing the 2 leading B frames from being sent by the demux - so the demux outputs in coded order IPBPBPB... and that did the trick: the troublesome leading B frames were NOT output by the decoder and the seek worked correctly (apart from the 1 frame shift mentioned above).

So it looks like NOT sending the leading B frames might be a good workaround. There does not seem to be any visual corruption in the frames. Can anyone think of a reason why omitting the leading B frames in this way might cause problems?

rjappleton avatar May 18 '18 12:05 rjappleton

Ok, omitting the leading B frames does cause problems with some files.

So I tried your suggestion of splitting the AVCHD interlaced frame into its 2 fields/pictures and send them as 2 separate directshow samples and that seems to have solved the first GOP problem. You don't even need to set a timestamp on the second sample.

Every AVCHD file I have seen has transport stream markers for the start of each frame, so it's very easy to demux them in frame-sized samples without having to parse the content.

In the vast majority of cases the LAV decoder is happy to have the 2 interlaced pics sent in 1 sample, so the code is there in LAV to handle this OK. It would be great if we knew what is causing the issue with the first GOP of specific interlaced files.

For the time being, the workaround seems to be: if we seeked into the first GOP, search each sample for a second picture and, if one is found, split the sample and send it as 2 separate samples. If we seeked into the 2nd or a later GOP, just send the whole frame in 1 sample.
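A rough sketch of that workaround, assuming the samples are H.264 Annex B byte streams (start-code delimited, as in a transport stream). The function names are hypothetical. The second picture is detected with a common heuristic: a slice NAL unit (type 1 or 5) whose `first_mb_in_slice` is 0, which shows up as the top bit being set in the first byte after the NAL header. This ignores data-partitioned slices and arbitrary slice ordering, so treat it as an illustration rather than a complete parser.

```python
from typing import Iterator, List, Optional, Tuple

START_CODE = b"\x00\x00\x01"


def iter_nal_units(data: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (start_code_offset, nal_header_offset) for each NAL unit."""
    i = 0
    while True:
        i = data.find(START_CODE, i)
        if i < 0 or i + 3 >= len(data):
            return
        # Treat a preceding zero byte as part of a 4-byte start code.
        sc = i - 1 if i > 0 and data[i - 1] == 0 else i
        yield sc, i + 3
        i += 3


def second_picture_offset(sample: bytes) -> Optional[int]:
    """Offset where the second coded picture (field) starts, or None."""
    pictures_seen = 0
    pending = None  # first non-slice NAL seen since the last slice
    for sc, hdr in iter_nal_units(sample):
        nal_type = sample[hdr] & 0x1F
        if nal_type in (1, 5):  # coded slice (non-IDR / IDR)
            first_slice = hdr + 1 < len(sample) and bool(sample[hdr + 1] & 0x80)
            if first_slice:
                pictures_seen += 1
                if pictures_seen == 2:
                    # Keep SEI/PPS/AUD units with the picture they precede.
                    return pending if pending is not None else sc
            pending = None
        elif pending is None:
            pending = sc
    return None


def split_for_first_gop(sample: bytes, seek_in_first_gop: bool) -> List[bytes]:
    """Split a two-field sample only when the seek landed in the first GOP."""
    if not seek_in_first_gop:
        return [sample]
    offset = second_picture_offset(sample)
    if not offset:
        return [sample]
    return [sample[:offset], sample[offset:]]
```

The first part would carry the original timestamp; as noted above, the second part does not appear to need one.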

Later ... it seems that the problem has something to do with the first leading B frame. If I split just that B frame then seeking works OK - you don't need to split any other frame.

rjappleton avatar May 21 '18 16:05 rjappleton

LAV does have the ability to split the fields, no question about that. I do not, however, know what causes a few timestamps to get screwed up in that scenario, and unfortunately I also do not have a source filter at hand that sends them like that to figure out what might be going on.

And yeah, dropping B-frames can cause problems, since it is possible to use B-frames as references in H.264, so if that feature is used, you might need them.

Nevcairiel avatar Jun 04 '18 00:06 Nevcairiel

Ok. I'll use the workaround if the LAV decoder is used.

Actually, you don't even need to split at a field boundary.

At the first B frame, I can just send a small time-stamped sample containing only the Access Unit Delimiter (at the beginning of the frame), and then send a sample with a NULL time-stamp that contains the rest of the frame (both fields). LAV Decoder works OK with that.
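Sketched in code, assuming an Annex B byte stream in which the frame begins with an access unit delimiter (NAL unit type 9). The function name and the (data, timestamp) tuple layout are hypothetical; `None` stands in for a sample with no timestamp set.

```python
from typing import List, Optional, Tuple

START_CODE = b"\x00\x00\x01"
NAL_TYPE_AUD = 9  # access unit delimiter

Sample = Tuple[bytes, Optional[int]]


def split_after_aud(sample: bytes, start_ts: int) -> List[Sample]:
    """Return the AUD as its own time-stamped sample, followed by the rest
    of the frame without a timestamp. If the sample does not begin with an
    AUD, return it unchanged as a single time-stamped sample."""
    first = sample.find(START_CODE)
    if first < 0 or first + 3 >= len(sample):
        return [(sample, start_ts)]
    if sample[first + 3] & 0x1F != NAL_TYPE_AUD:
        return [(sample, start_ts)]
    nxt = sample.find(START_CODE, first + 3)
    if nxt < 0:
        return [(sample, start_ts)]
    # Keep a 4-byte start code (00 00 00 01) together with the NAL it introduces.
    if sample[nxt - 1] == 0:
        nxt -= 1
    return [(sample[:nxt], start_ts), (sample[nxt:], None)]
```

Per the observation above, this would only need to be applied to the first leading B frame of the first GOP; all other frames can go out as single samples.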

Thanks again.

rjappleton avatar Jun 04 '18 16:06 rjappleton

I have a similar problem. I think it is in LAV Splitter. When LAV Splitter is installed, DirectShowSource("RX100source.MTS") from an avs script skips the first 30 frames of a 1080/60p sequence. When LAV Splitter is uninstalled (but the LAV decoder is still active) and Haali Matroska Splitter is installed, there is no problem; all frames are decoded.

DeniR avatar Apr 30 '19 10:04 DeniR