
testing dGPU - ARC DG2 - decoding errors - edge cases - 4:4:4 12bit

Open bavdevc opened this issue 3 years ago • 6 comments

Hello @rigaya

atm. I'm testing the Intel ARC dGPU (A380). Everything works brilliantly on Windows with the current Windows beta driver (31.0.101.3793), but Linux is a bit troublesome so far (Intel devs: 6.x kernel driver not ready, some errors with backport-i915, Intel media-driver not on par with Windows, etc.)

On Linux, the --check-features output differs from Windows:

  • fewer features available for encoding (AV1)
  • decoding 4:4:4 missing

btw. have you been able to test 4:4:4 decoding so far?

I tried (high bitrate):

  • HEVC 4:4:4 12bit --> that works, but there seem to be some bitrate limits: lossless HEVC produces reproducible errors when the bitrate gets too high
  • AV1 4:4:4 12bit --> I could not get that working at all - did you?

==> everything else (low bitrate) works fine, except VC1 decoding, which is painfully slow because there is no hardware support in libvpl... and all those memory copies

btw. if you need some samples/test material, I can provide them - just tell me where to send the files/links

Kind regards

edit: we need a party in qsvenc - issue #100 now ;-) https://github.com/rigaya/QSVEnc/issues/100

bavdevc • Nov 16 '22 19:11

Thank you for sharing the decode issues.

  • HEVC 4:4:4 12bit: I've tested an 86Mbps HEVC 4:4:4 12bit file encoded by x265, and it seemed fine. Could you give me an example of a bitrate that fails? I'd like to create a file close to that and test it.

  • AV1 4:4:4 12bit: Not working for me either.

    I'm not sure, but it seems like AV1 4:4:4 or 12bit is not supported yet, and the Query function (MFXVideoDECODE_Query, which --check-features uses for checking) might be returning a false result, saying it supports AV1 4:4:4 12bit decode even though it actually does not.

    However, I'd like to keep it as-is, as I want --check-features to return the raw results of the Query functions. The result might change in a future driver release.

rigaya • Nov 17 '22 14:11

ok, I was testing 4K HDR P3 PQ 4:4:4 60fps material - perhaps that was too much for the hardware decoder - avsw works fine with all input files.

  • source file ProRes 4444 XQ, works fine with avsw: plotbitrate_4k_hdr_prores_4444_xq_yuv444p12le
  • lossless x265 yuv420p10le, works fine with avhw: plotbitrate_4k_hdr_X265_yuv420p10le
  • lossless x265 yuv422p10le, works fine with avhw: plotbitrate_4k_hdr_X265_yuv422p10le
  • lossless x265 yuv444p12le, crashes the hw decoder, only avsw possible: plotbitrate_4k_hdr_X265_yuv444p12le

but I think those are only edge cases for testing the hardware features - a production workflow would not re-encode to 4:4:4 12bit lossless with libx265 or libaom-av1 before further processing

bavdevc • Nov 17 '22 17:11

  • However, I'll like to keep it as-is, as I want to have --check-features to return raw results of Query functions. The result might be changed in the future driver release.

I think so, too - the software stack is getting better and more complete with every version; it's still a work in progress

btw. I'm really surprised this little DG2 card can handle 1,493,818 kbit/s input with ease.

edit: I think the hardware limit is below 4,294,967,295 ;-) - smells like a uint32 in bit/s. The last working frame in my sample is 1126: plotbitrate_4k_hdr_X265_yuv444p12le_1126

bavdevc • Nov 17 '22 17:11

To round off the decoder test, I also tested most combinations of the other input formats (every format works with avsw; the following list is only for avhw/avqsv):

  • [x] H264 8bit yuv420p - profile main - level 4.0
  • [x] H264 8bit yuv420p - profile main - level 5.0
  • [x] H264 8bit yuv420p - profile high - level 4.0
  • [x] H264 8bit yuv420p - profile high - level 4.1
  • [ ] H264 8bit yuv420p - profile High 4:4:4 Predictive - level 5.1 - Failed to initialize decoder. : invalid video parameters.
  • [ ] H264 8bit yuv420p - profile High 4:4:4 Predictive - level 5.2 - Failed to initialize decoder. : invalid video parameters.
  • [x] HEVC 10bit yuv420p10le
  • [x] HEVC 10bit yuv422p10le
  • [x] HEVC 12bit yuv444p12le - fails only at insane bitrates/lossless: MFXDEC: DecodeFrameAsync error: device operation failure.., Break in task MFXDEC: device operation failure..
  • [x] MPEG2 8bit yuv420p
  • [ ] VP9 10bit yuv420p10le - MFXDEC: DecodeFrameAsync error: failed to allocate memory.. Break in task MFXDEC: failed to allocate memory.. - that should be VP9 profile 2; perhaps not all levels work
  • [ ] VP9 12bit yuv444p12le - MFXDEC: DecodeFrameAsync error: failed to allocate memory.. Break in task MFXDEC: failed to allocate memory.. - that should be VP9 profile 3; perhaps not all levels work
  • [x] AV1 10bit yuv420p10le
  • [ ] AV1 12bit yuv444p12le - Failed to initialize decoder. : invalid video parameters. - not implemented yet
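The thread does not say exactly which ffmpeg commands produced the test files. As a hedged sketch, a subset of the matrix above could be regenerated from ffmpeg's synthetic `testsrc2` source like this. The file names, clip length and resolution are made up for illustration, and it assumes an ffmpeg build with libx264, high-bit-depth libx265/libvpx, and libaom.

```python
# Hypothetical regeneration of part of the decoder test matrix above.
# File names, clip length and size are illustrative assumptions.

# (output file, ffmpeg encoder options)
TEST_MATRIX = [
    ("h264_main_l40.mkv", ["-c:v", "libx264", "-pix_fmt", "yuv420p", "-profile:v", "main", "-level", "4.0"]),
    ("h264_high_l41.mkv", ["-c:v", "libx264", "-pix_fmt", "yuv420p", "-profile:v", "high", "-level", "4.1"]),
    # -qp 0 (lossless) makes x264 signal the High 4:4:4 Predictive profile
    ("h264_hi444pp_l51.mkv", ["-c:v", "libx264", "-pix_fmt", "yuv420p", "-qp", "0", "-level", "5.1"]),
    ("hevc_yuv420p10le.mkv", ["-c:v", "libx265", "-pix_fmt", "yuv420p10le"]),
    ("hevc_yuv422p10le.mkv", ["-c:v", "libx265", "-pix_fmt", "yuv422p10le"]),
    ("hevc_yuv444p12le_lossless.mkv",
     ["-c:v", "libx265", "-pix_fmt", "yuv444p12le", "-x265-params", "lossless=1"]),
    ("mpeg2_yuv420p.mkv", ["-c:v", "mpeg2video", "-pix_fmt", "yuv420p"]),
    # libvpx-vp9 picks profile 2 / profile 3 from the pixel format
    ("vp9_p2_yuv420p10le.mkv", ["-c:v", "libvpx-vp9", "-pix_fmt", "yuv420p10le"]),
    ("vp9_p3_yuv444p12le.mkv", ["-c:v", "libvpx-vp9", "-pix_fmt", "yuv444p12le"]),
    ("av1_yuv420p10le.mkv", ["-c:v", "libaom-av1", "-pix_fmt", "yuv420p10le"]),
    ("av1_yuv444p12le.mkv", ["-c:v", "libaom-av1", "-pix_fmt", "yuv444p12le"]),
]


def ffmpeg_argv(out_name, codec_opts, size="1920x1080", rate=60, seconds=2):
    """Build an ffmpeg command that encodes a short synthetic testsrc2 clip."""
    source = ["-f", "lavfi", "-i", f"testsrc2=size={size}:rate={rate}"]
    return ["ffmpeg", "-y", *source, "-t", str(seconds), *codec_opts, out_name]


if __name__ == "__main__":
    # Print the commands; pipe them to a shell (or subprocess.run) to execute.
    for name, opts in TEST_MATRIX:
        print(" ".join(ffmpeg_argv(name, opts)))
```

Printing the commands instead of running them keeps the sketch side-effect free; `size` can be raised to `3840x2160` to get closer to the 4K material tested earlier in the thread.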

btw. I think I'm done with decoder testing for now - I'll keep those ffmpeg-generated test files to test them with all future driver/QSVEncC releases; perhaps I'll automate that step with a little script for Windows/Linux
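A minimal sketch of such a script, assuming that a plain QSVEncC transcode with `--avhw` is enough to exercise the hardware decoder. The error fragments are the ones quoted in the list above; the output path, the executable name you pass in, and the helper names are hypothetical.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Error fragments taken from the QSVEncC messages quoted in this thread.
KNOWN_ERRORS = (
    "invalid video parameters",
    "device operation failure",
    "failed to allocate memory",
)

# A runner takes an argv list and returns (exit code, combined stdout+stderr).
Runner = Callable[[List[str]], Tuple[int, str]]


@dataclass
class DecodeResult:
    path: str
    ok: bool
    reason: str


def subprocess_runner(argv: List[str]) -> Tuple[int, str]:
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def build_argv(exe: str, path: str, out_path: str) -> List[str]:
    # Assumption: a transcode with --avhw forces hardware decoding;
    # use --avsw instead for the software-decode baseline.
    return [exe, "--avhw", "-i", path, "-o", out_path]


def classify(returncode: int, log: str) -> Tuple[bool, str]:
    """Known error strings win over the exit code when explaining a failure."""
    for fragment in KNOWN_ERRORS:
        if fragment in log:
            return False, fragment
    if returncode != 0:
        return False, f"exit code {returncode}"
    return True, "ok"


def run_decode_tests(files, exe: str, out_path: str = "decode_test.264",
                     runner: Runner = subprocess_runner) -> List[DecodeResult]:
    results = []
    for path in files:
        code, log = runner(build_argv(exe, path, out_path))
        ok, reason = classify(code, log)
        results.append(DecodeResult(path, ok, reason))
    return results


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python decode_check.py <qsvencc-exe> file1.mkv file2.mkv ...
    for r in run_decode_tests(sys.argv[2:], exe=sys.argv[1]):
        print(f"[{'x' if r.ok else ' '}] {r.path}: {r.reason}")
```

The runner is injectable so the pass/fail logic can be checked without a GPU, and the printed `[x]`/`[ ]` lines mirror the checklist format used above.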

bavdevc • Nov 18 '22 16:11

I was able to reproduce the issue with an HEVC 12bit 4:4:4 file I created myself using x265 lossless; it runs into "device operation failure".

It seems like it might be a hardware limitation (or a driver issue?), as no problem was found on the application side. The bitrate of the input file was 4317Mbps, way too high...
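The uint32 guess from earlier in the thread can at least be checked against the numbers reported here. This sketch only does the arithmetic: it assumes, without confirmation from Intel, that the decoder keeps a bitrate in bit/s in an unsigned 32-bit field. It also compares the 4317Mbps overall file bitrate, while the failure at frame 1126 suggests a peak effect, so a match is consistent with the guess but does not prove it.

```python
# Hypothesis only: the decoder may store a bitrate in bit/s as a uint32.
UINT32_MAX = 2**32 - 1  # 4,294,967,295 bit/s, about 4295 Mbit/s


def exceeds_uint32_bps(mbps: float) -> bool:
    """True if a bitrate in Mbit/s would overflow a uint32 counted in bit/s."""
    return mbps * 1_000_000 > UINT32_MAX


print(exceeds_uint32_bps(1493.818))  # False: the 1,493,818 kbit/s input decoded fine
print(exceeds_uint32_bps(4317))      # True: the file that hit "device operation failure"
```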

rigaya • Dec 02 '22 09:12

thank you @rigaya for the confirmation - as you can see in my previous post, I could get everything working with hardware decoding except VP9 decode (tested profiles 2+3) - either my test files just go too far or there is still an error somewhere in the software stack. (btw. VP9 encoding works - slow, but it works - but decoding, no chance so far.)

btw. I would close issue #100 in its current state and open a new one if something noteworthy changes for better or worse in the future, if that is ok with you.

btw. one last technical question; perhaps you know the answer or can tell me where to find more info: using the Windows driver and D3D11VA, I notice there are several engines for GPU tasks (HWiNFO64: hwinfo_gpu_engines, Task Manager: taskmanager_gpu_engines)

--> crop/resize and vpp-deinterlace use the 1st or the 2nd "Video processing" engine --> vpp-yadif uses the "GPU compute" engine

==> but why do some movies use only the "Video decode 1" engine while others use both "Video Decode" engines, even when the first one is not saturated at all?

bavdevc • Dec 03 '22 20:12