
VPL failing to decode samples from a valid sequence, invalid MFX_ERR_MORE_DATA for I frames

Open chacha21 opened this issue 1 year ago • 26 comments

VPL 2.10.1 (not a regression, it did not work with previous version either)

I want to use VPL to decode an H.264 sequence embedded in an MP4 container (link below to data and sample code). I use Microsoft Media Foundation to read the raw encoded samples from the file, then I submit the samples to a properly initialized mfxSession. For the very first sample (which is a valid I frame), MFX_ERR_MORE_DATA is returned.

By pushing more and more samples, I can finally get some decoded data, but this is not expected behaviour

I want the decoding session to provide the decoded data synchronously when all the required data for an I frame has been submitted.

TestMFTVPL.zip

Is this a VPL design concern ?

chacha21 avatar Feb 01 '24 14:02 chacha21

BTW, where do I download a recent vplswref64.dll ? I can't tell where mine comes from, but it is not built by the current libvpl git project [edit] found here : https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html#inpage-nav-8-9

does not solve the bug

chacha21 avatar Feb 01 '24 15:02 chacha21

This vplswref64.dll was the CPU runtime, and we don't support it anymore. We support only the GPU runtime, which comes with the graphics driver. For that, please check the hello-decode sample.

shepark avatar Feb 01 '24 18:02 shepark

This vplswref64.dll was the CPU runtime, and we don't support it anymore. We support only the GPU runtime, which comes with the graphics driver. For that, please check the hello-decode sample.

I don't have a development machine with Intel GFX, so I am bound to use "vplswref64.dll".

I don't think that it will be relevant to this thread, I mentioned it because the sample code I provided expects the vpl run-time dlls to be deployed on the host machine, they are not part of the project; so for any one who wants to test, this reference was needed at least to be able to run the code properly.

chacha21 avatar Feb 01 '24 18:02 chacha21

The original intention of cpu runtime was to provide "reference" functionality as you mentioned. But we discontinued it. CPU runtime might have had the issue to deal with the input you feed. So, I recommend you to have Intel device and try.

shepark avatar Feb 01 '24 18:02 shepark

The original intention of cpu runtime was to provide "reference" functionality as you mentioned. But we discontinued it. CPU runtime might have had the issue to deal with the input you feed. So, I recommend you to have Intel device and try.

I'll try to find a host machine with compatible Intel HD graphics. But I'm curious to know if you can test with HW acceleration and see the bug yourself.

chacha21 avatar Feb 01 '24 20:02 chacha21

The original intention of cpu runtime was to provide "reference" functionality as you mentioned. But we discontinued it. CPU runtime might have had the issue to deal with the input you feed. So, I recommend you to have Intel device and try.

I'll try to find a host machine with compatible Intel HD graphics. But I'm curious to know if you can test with HW acceleration and see the bug yourself.

What is your goal? Do you want to decode frame by frame? or to decode a video stream and this I frame test is for experiment VPL?

shepark avatar Feb 01 '24 20:02 shepark

What is your goal? Do you want to decode frame by frame? or to decode a video stream and this I frame test is for experiment VPL?

I am evaluating vpl as an alternative backend engine for encoding and decoding sequences. I already use Microsoft Media Foundation, CUDA NVEnc, and had a IMSDK implementation in the past.

My use case is : -encoding : either stream, or sequence of images -decoding : either stream, or sequence of images to be randomly accessed frame by frame (not a simple incremental playback).

For decoding, I am pretty familiar with NALUs, I can parse raw samples if needed, and I am already able to determine I, P and B frame in order to submit enough data for each frame, so I really focus on vpl as the last step of decoding.

Currently I am experimenting VPL, but since I do not have compatible hardware, I thought that I could rely on the SW reference implementation to get the best tests, even without full performance.

chacha21 avatar Feb 01 '24 20:02 chacha21
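
To make the NALU classification mentioned above concrete, here is a minimal C++ sketch. It is not taken from the attached project. It assumes the sample is already in Annex B form (start codes `00 00 01` or `00 00 00 01`), and the helper names `nalTypes` and `containsIdr` are hypothetical. It only reads `nal_unit_type` from each NAL header, so it can detect IDR pictures (type 5) but cannot tell I, P and B slices apart inside non-IDR slices (type 1); that would require parsing `slice_type` from the slice header.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the nal_unit_type (low 5 bits of the NAL header byte) of every
// NAL unit found in an Annex B buffer. A 4-byte start code is matched
// through its trailing 00 00 01.
std::vector<int> nalTypes(const std::vector<std::uint8_t>& buf) {
    std::vector<int> types;
    for (std::size_t i = 0; i + 3 < buf.size(); ++i) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            types.push_back(buf[i + 3] & 0x1F);
            i += 2;  // skip the rest of the start code
        }
    }
    return types;
}

// True if the buffer holds at least one IDR slice (nal_unit_type 5).
bool containsIdr(const std::vector<std::uint8_t>& buf) {
    const std::vector<int> types = nalTypes(buf);
    for (std::size_t i = 0; i < types.size(); ++i) {
        if (types[i] == 5) return true;
    }
    return false;
}
```

For a buffer holding SPS, PPS and an IDR slice, `nalTypes` would return 7, 8 and 5, and `containsIdr` would be true.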

What is your goal? Do you want to decode frame by frame? or to decode a video stream and this I frame test is for experiment VPL?

I am evaluating vpl as an alternative backend engine for encoding and decoding sequences. I already use Microsoft Media Foundation, CUDA NVEnc, and had a IMSDK implementation in the past.

My use case is : -encoding : either stream, or sequence of images -decoding : either stream, or sequence of images to be randomly accessed frame by frame (not a simple incremental playback).

For decoding, I am pretty familiar with NALUs, I can parse raw samples if needed, and I am already able to determine I, P and B frame in order to submit enough data for each frame, so I really focus on vpl as the last step of decoding.

Currently I am experimenting VPL, but since I do not have compatible hardware, I thought that I could rely on the SW reference implementation to get the best tests, even without full performance.

Got it. Thank you for the detail information. I will try your code quickly and see whether there's anything missed.

shepark avatar Feb 01 '24 21:02 shepark

Do you see "MFX_ERR_MORE_DATA" from this part?

  mfxBitstream bs = {0};
  bs.Data = rawFileContent.data();
  bs.MaxLength = static_cast<mfxU32>(rawFileContent.size());
  bs.DataLength = static_cast<mfxU32>(rawFileContent.size());
  decodeParams.mfx.CodecId = MFX_CODEC_AVC;
  decodeParams.IOPattern = MFX_IOPATTERN_IN_SYSTEM_MEMORY;
  printf("try MFXVideoDECODE_DecodeHeader...");
  status = MFXVideoDECODE_DecodeHeader(session, &bs, &decodeParams);
  printf("=>status = %d\r\n", status);

If so, it won't work, because you are feeding an MP4 stream, not a video elementary stream. As you probably know, MP4 is a container, and you need to extract the raw video data from each packet. VPL does not support any container format.

shepark avatar Feb 01 '24 22:02 shepark
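
As an illustration of what "extracting the raw video data" can involve, the sketch below converts one length-prefixed sample (the layout commonly used for H.264 inside MP4) into an Annex B elementary stream. It rests on assumptions: a 4-byte big-endian length prefix is assumed, the function name `avccToAnnexB` is hypothetical, and the thread does not state which layout the Media Foundation reader in the attached sample actually delivers. SPS and PPS handling is out of scope here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Rewrites each [4-byte big-endian length][NAL payload] pair as
// [00 00 00 01][NAL payload]. Returns an empty vector on malformed input
// (note that an empty input also yields an empty vector).
std::vector<std::uint8_t> avccToAnnexB(const std::vector<std::uint8_t>& sample) {
    static const std::uint8_t kStartCode[4] = {0x00, 0x00, 0x00, 0x01};
    std::vector<std::uint8_t> out;
    std::size_t pos = 0;
    while (pos < sample.size()) {
        if (sample.size() - pos < 4) return std::vector<std::uint8_t>();
        const std::uint32_t len =
            (static_cast<std::uint32_t>(sample[pos]) << 24) |
            (static_cast<std::uint32_t>(sample[pos + 1]) << 16) |
            (static_cast<std::uint32_t>(sample[pos + 2]) << 8) |
            static_cast<std::uint32_t>(sample[pos + 3]);
        pos += 4;
        if (len > sample.size() - pos) return std::vector<std::uint8_t>();
        out.insert(out.end(), kStartCode, kStartCode + 4);
        out.insert(out.end(), sample.data() + pos, sample.data() + pos + len);
        pos += len;
    }
    return out;
}
```

The converted buffer is what would then be handed to the decoder through the bitstream structure, in place of the container-level bytes.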

Do you see "MFX_ERR_MORE_DATA" from this part?

No. I can't send a console log right now (AFK) but the MFX_ERR_MORE_DATA that bothers me is this one :

  if (decStatus == MFX_ERR_MORE_DATA)
    printf("unexpected MFX_ERR_MORE_DATA, this is a mfx wrong behaviour\r\n");

To be comprehensive, please note that

  • only the "try by manually filling params known in advance" works to fill the decodeParams
  • MFXVideoDECODE_Reset() issues an error that I cannot explain but it does not seem to be a stopper

And finally :

  • at first sight we could say that the mfx decoder is badly configured or cannot handle that codec
  • but it is a wrong assumption : if you keep pushing samples, you will get correctly decoded images; you just don't get them when enough data is provided, but a little after, like if a Flush() was pending and not done at the right moment

chacha21 avatar Feb 01 '24 22:02 chacha21

Can you share the code you modified? It fails at where I pointed out and can't reach there. Looks like you commented out some parts.

shepark avatar Feb 01 '24 22:02 shepark

Can you share the code you modified?

I did not modify the code attached to the first post of this issue. The code shows different strategies to initialize things, and some errors are normal. I just put assert() for critical failures.

My console output is shown below : Capture

chacha21 avatar Feb 02 '24 07:02 chacha21

TestMFTVPL.zip

Please check this code. It's dirty, but I modified it to load the GPU runtime and to save the I frame output, and I added some comments. Please refer to "hello-decode" or "sample_decode" for a general implementation.

shepark avatar Feb 02 '24 12:02 shepark

Ok, I see what you did. I will test on Monday, and if it works I will perform even more tests to check extensively and compare with what the doc claims or misleadingly suggests. Then only I will come back for feedback.

chacha21 avatar Feb 02 '24 21:02 chacha21

Ok, I have tested and there are many problems :

  • indeed, I can get 1 frame thanks to the "pseudo sync" trick where you use a null bs
  • for some reason, I have a memory fault when trying to read the UV data. This is very suspicious. So I only read Y for the moment to get gray scale (Yes I know that UV data is h/2 and either w or w/2 according to a flat or 2-channels interpretation)
  • after reading 1 frame, the next calls to DecodeFrameAsync() will return MFX_ERR_ABORTED. According to the VPL doc, sending a null bitstream is supposed to be done at the end of stream, not end of frame. I think that's why
  • thus I cannot get more frames
  • calling MFXVideoDECODE_Reset() between frames would be inefficient but could help. Actually it just does not work (it always returns an unexpected and unexplained error)

chacha21 avatar Feb 06 '24 11:02 chacha21
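
The cause of the memory fault on the UV data is not established in this thread. For reference, the sketch below shows the NV12 layout being discussed: a full-resolution Y plane followed by one interleaved UV plane with half as many rows, both addressed with the row pitch rather than the width. The `Nv12View` struct and `packNv12` function are hypothetical stand-ins, not VPL types, and even width and height are assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical view over a decoded NV12 surface.
struct Nv12View {
    const std::uint8_t* y;   // start of the Y plane
    const std::uint8_t* uv;  // start of the interleaved UV plane
    std::size_t pitch;       // bytes between the starts of consecutive rows
    std::size_t width;       // visible width in pixels (assumed even)
    std::size_t height;      // visible height in pixels (assumed even)
};

// Copies the visible area into a tightly packed buffer of
// width * height * 3 / 2 bytes, dropping the per-row padding.
std::vector<std::uint8_t> packNv12(const Nv12View& v) {
    std::vector<std::uint8_t> out;
    out.reserve(v.width * v.height * 3 / 2);
    // Y plane: `height` rows of `width` bytes each.
    for (std::size_t r = 0; r < v.height; ++r) {
        const std::uint8_t* row = v.y + r * v.pitch;
        out.insert(out.end(), row, row + v.width);
    }
    // UV plane: height / 2 rows, each holding width / 2 (U, V) byte pairs.
    for (std::size_t r = 0; r < v.height / 2; ++r) {
        const std::uint8_t* row = v.uv + r * v.pitch;
        out.insert(out.end(), row, row + v.width);
    }
    return out;
}
```

Reading `height` rows from the UV plane, or stepping by `width` instead of `pitch`, are two ways such a copy can run past the end of the surface.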

That's why I asked about the goal of your final app, not your experiment. I did show you how you can decode I frame only. If you want to decode full frames, please refer hello-decode or sample_decode.

shepark avatar Feb 06 '24 11:02 shepark

I did show you how you can decode I frame only.

My code can handle non-I frames. Thanks to the Media Foundation part, I can read and accumulate samples starting from the previous I frame up to the targeted P frame, and send all the bytes at once to VPL through the bitstream. Then I expect VPL to output a frame, since it must have enough data (but unfortunately, certainly because of internal buffering, MFX_ERR_MORE_DATA is returned).

The problem is not to read non-I frames, it is to read two different frames (at random positions) : "flushing" seems impossible since the "null bs" trick is not usable.

chacha21 avatar Feb 06 '24 12:02 chacha21
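
The accumulation strategy described above (start at the previous I frame, stop at the target) can be sketched as a small index computation. This is only an illustration under assumptions: samples are taken to be in decode order, "keyframe" is taken to mean a frame from which decoding can start cleanly (such as an IDR picture), and the names `SampleRange` and `samplesNeededFor` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Inclusive range of sample indices to submit; `valid` is false when no
// keyframe precedes the target or the target is out of range.
struct SampleRange {
    bool valid;
    std::size_t first;
    std::size_t last;
};

// Walks backwards from `target` to the closest keyframe at or before it.
SampleRange samplesNeededFor(const std::vector<bool>& isKeyframe, std::size_t target) {
    SampleRange r = {false, 0, 0};
    if (target >= isKeyframe.size()) return r;
    std::size_t i = target;
    while (true) {
        if (isKeyframe[i]) {
            r.valid = true;
            r.first = i;
            r.last = target;
            return r;
        }
        if (i == 0) return r;  // reached the start without finding a keyframe
        --i;
    }
}
```

With keyframes at indices 0 and 3, a request for frame 2 yields the range 0 to 2, and a request for frame 4 yields 3 to 4.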

Have you read the hello-decode sample? "null bs" is not a trick; it's needed when you drain the remaining decoded frames. Once you're done reading the input stream, you should set bs to null and ask VPL to decode everything left in its buffer and return it. In your case, when it returns MFX_ERR_MORE_DATA, please call MFXVideoDECODE_DecodeFrameAsync() with a null bs until it returns MFX_ERR_MORE_DATA again. Let's say you feed "IPPPP" and want to get the third and last P frames. Then you call MFXVideoDECODE_DecodeFrameAsync() with the input stream (IPPPP). If VPL returns MFX_ERR_MORE_DATA, call MFXVideoDECODE_DecodeFrameAsync() with bs=NULL until it returns MFX_ERR_MORE_DATA again. I expect it to give you I, P, P, P, P.

shepark avatar Feb 06 '24 12:02 shepark
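
The drain procedure described above can be sketched as plain control flow. This is a toy model, not VPL code: the `Status` enum and the `decodeWithNullInput` callable are hypothetical stand-ins for `mfxStatus` and for a call to MFXVideoDECODE_DecodeFrameAsync() with a null bitstream. A real loop would also have to synchronize each output surface and handle other status codes.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical simplification of the decoder's return codes.
enum class Status { FrameReady, MoreData, Error };

// Calls `decodeWithNullInput` until it stops producing frames.
// Returns the number of drained frames and the status that ended the loop
// (FrameReady only if `maxFrames` was reached first).
template <typename DecodeStep>
std::pair<std::size_t, Status> drainDecoder(DecodeStep decodeWithNullInput,
                                            std::size_t maxFrames = 1024) {
    std::size_t frames = 0;
    Status last = Status::FrameReady;
    while (frames < maxFrames) {
        last = decodeWithNullInput();
        if (last != Status::FrameReady) break;  // MoreData: nothing left to drain
        ++frames;
    }
    return std::make_pair(frames, last);
}
```

In the "IPPPP" example, a decoder holding five buffered frames would make this loop return 5 with a final `MoreData`. The `maxFrames` cap is only a guard against a callable that never stops returning frames.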

Have you read hello-decode sample?

Sure, and I learnt nothing new

"null bs" is not a trick, it's needed when you drain remained decoded frames.

Apparently, you can only use it once at the end of the stream. Once it has been done to drain and get a frame (from I, or IP, or IPP, or IPPP...), subsequent calls to MFXVideoDECODE_DecodeFrameAsync() will return MFX_ERR_ABORTED and you cannot send a new bitstream to decode a new frame (I or IP or IPP...) at a totally different position.

chacha21 avatar Feb 06 '24 13:02 chacha21

That's right. Once you get MFX_ERR_MORE_DATA with bs null, it means no more data left and it will return real error in next call. So, your problem is.. you can't do this continuously but just once .. because decode process will be done once bs=null is given.

shepark avatar Feb 06 '24 13:02 shepark

So, your problem is.. you can't do this continuously but just once .. because decode process will be done once bs=null is given.

Right. With the "bs=null" drain, you fixed the initial problem of this issue thread, that was "can't get a frame at all". But now that I can get a frame, I see that the next step is "I can't get a second frame", in a scenario where I don't read a stream sequentially, but let the user choose a random position in the sequence.

chacha21 avatar Feb 06 '24 13:02 chacha21

ok.. I don't really have the optimal solution right now but.. Why don't you try giving enough buffer to VPL, which VPL can return Nth frame - avoid MFX_ERR_MORE_DATA? Meanwhile, I will check more.

shepark avatar Feb 06 '24 13:02 shepark

Please check this as well. https://intel.github.io/libvpl/latest/programming_guide/VPL_prg_decoding.html#bitstream-repositioning

shepark avatar Feb 06 '24 18:02 shepark

https://intel.github.io/libvpl/latest/programming_guide/VPL_prg_decoding.html#bitstream-repositioning

Interesting, so MFXVideoDECODE_Reset() should "officially" be the answer for stream repositioning (I suspected it would be inefficient, but that might be wrong). However, I mentioned from the beginning of this thread that in my sample code, MFXVideoDECODE_Reset() always returns an error, even with decodeParams manually filled with proper values. I guess I'll have to investigate a little more (perhaps with a debugger to step into the VPL source code) to get more clues about that.

chacha21 avatar Feb 06 '24 18:02 chacha21

@chacha21 Could you close this issue and open a new one if you have any issue with MFXVideoDECODE_Reset()?

shepark avatar Feb 12 '24 14:02 shepark

@chacha21 Could you close this issue and open a new one if you have any issue with MFXVideoDECODE_Reset()?

I might or not open a new issue, depending on the following considerations (not clear with the docs) :

  • scenario 1: bs=null (in order to flush) then MFXVideoDECODE_Reset() is the correct way to decode frames at random positions. In that case, I do have a problem with MFXVideoDECODE_Reset() and can open a new issue
  • scenario 2: bs=null (in order to flush) then MFXVideoDECODE_Reset() is expected to fail, since the bs is considered aborted and thus MFXVideoDECODE_Reset() can't help. In that case I cannot claim I observe a bug (even if I can't make it work, it wouldn't be a MFXVideoDECODE_Reset() problem)
  • scenario 3: MFXVideoDECODE_Reset() can be used to flush before repositioning (very unlikely: the doc does not really say that). In that case we are still bringing information to the current issue.

chacha21 avatar Feb 12 '24 14:02 chacha21

Closed due to no further issues or blocking feedback from submitter.

akwrobel avatar Apr 03 '24 17:04 akwrobel

Aaand, there never was a clear answer from the VPL team. See the last message above : those are pending questions.

chacha21 avatar Apr 03 '24 17:04 chacha21

@chacha21 We did not see a specific concern to address related to the original question. Can you please clarify what you are looking for specifically?

akwrobel avatar Apr 04 '24 17:04 akwrobel

I am still looking for a way to perform stream repositioning.

chacha21 avatar Apr 04 '24 18:04 chacha21