testing dGPU - ARC DG2 - decoding errors - edge cases - 4:4:4 12bit
Hello @rigaya
atm. I'm testing the Intel ARC dGPU (A380). Everything works brilliantly on Windows with the current Windows beta driver (31.0.101.3793), but Linux is a bit troublesome so far (Intel devs: 6.x kernel driver not ready, backport-i915 has some errors, intel media-driver not on par with Windows, etc.)
linux --check-features output differs from windows:
- less features available for encoding (AV1)
- decoding 4:4:4 missing
btw. have you been able to test 4:4:4 decode so far?
I tried (high bitrate):
- HEVC 4:4:4 12 bit --> that works, but it looks like there are some bitrate limits: lossless HEVC produces reproducible errors when the bitrate gets too high
- AV1 4:4:4 12 bit --> I could not get that working at all - did you?
==> everything else (low bitrate) is working fine, except VC1 decoding, that is painfully slow because of no hardware support in libvpl...and all those mem copy things
btw. if you need some samples/test material, I can provide you those - just tell me where to send those files/links
Kind regards
edit: we need a party in qsvenc - issue #100 now ;-) https://github.com/rigaya/QSVEnc/issues/100
Thank you for sharing the decode issues.
- HEVC 4:4:4 12bit: I tested a file with 86Mbps HEVC 4:4:4 12bit encoded by x265, and it seemed fine. Could you give me an example of a bitrate which fails? I'd like to create a file close to that and test.
- AV1 4:4:4 12bit: Not working for me either.
I'm not sure, but it seems that AV1 4:4:4 or 12bit is not supported yet, and the Query function (MFXVideoDECODE_Query, which --check-features uses for checking) might be returning a false result, saying it supports AV1 4:4:4 12bit decode even though it actually does not.
However, I'd like to keep it as-is, as I want --check-features to return the raw results of the Query functions. The result might change in a future driver release.
ok, I was testing 4K HDR P3 PQ 444 60fps material - perhaps that was too much for the hardware decoder - avsw working fine with all input files.
source file is Prores 4444 xq working fine with avsw:
lossless x265 yuv420p10le working fine with avhw:
lossless x265 yuv422p10le working fine with avhw:
lossless x265 yuv444p12le crashes hw decoder, only avsw possible:

but I think those are only edge cases for testing the hardware features - production workflow would not re-encode with libx265 or libaom-av1 444 12bit lossless before further processing
> However, I'd like to keep it as-is, as I want --check-features to return the raw results of the Query functions. The result might change in a future driver release.
I think so, too - software stack is getting better and more complete with every version, it's still development in progress
btw. I'm really surprised this little dg2 card can handle 1,493,818 kbit/s input with ease
edit: I think the hardware limitation is below 4,294,967,295 ;-) smells like uint32 in bit/s, last working frame is 1126 in my sample:

just to really complete the decoder test, I also tested all (most combinations) of the other input formats (every format works with avsw, the following list is only for avhw/avqsv):
- [x] H264 8bit yuv420p - profile main - level 4.0
- [x] H264 8bit yuv420p - profile main - level 5.0
- [x] H264 8bit yuv420p - profile high - level 4.0
- [x] H264 8bit yuv420p - profile high - level 4.1
- [ ] H264 8bit yuv420p - profile predictive 4:4:4 - level 5.1 --> Failed to initialize decoder. : invalid video parameters.
- [ ] H264 8bit yuv420p - profile predictive 4:4:4 - level 5.2 --> Failed to initialize decoder. : invalid video parameters.
- [x] HEVC 10bit yuv420p10le
- [x] HEVC 10bit yuv422p10le
- [x] HEVC 12bit yuv444p12le --> works, but when going to insane bitrate/lossless: MFXDEC: DecodeFrameAsync error: device operation failure.., Break in task MFXDEC: device operation failure..
- [x] MPEG2 8bit yuv420p
- [ ] VP9 10bit yuv420p10le --> MFXDEC: DecodeFrameAsync error: failed to allocate memory.. Break in task MFXDEC: failed to allocate memory.. (that should be VP9 profile 2 - perhaps not all levels work)
- [ ] VP9 12bit yuv444p12le --> MFXDEC: DecodeFrameAsync error: failed to allocate memory.. Break in task MFXDEC: failed to allocate memory.. (that should be VP9 profile 3 - perhaps not all levels work)
- [x] AV1 10bit yuv420p10le
- [ ] AV1 12bit yuv444p12le --> Failed to initialize decoder. : invalid video parameters. (not implemented yet)
btw. I think I'm done decoder testing atm. - I'll keep those ffmpeg/generated test files to test them with all the future driver/qsvencc releases - perhaps I'll automate that step with a little script for windows/linux
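For the automation idea, here is a minimal sketch in Python of the part that sorts results by failure type. It only matches the error fragments quoted in the list above; the labels (`init_failed`, `alloc_failed`, `device_failure`) and the function name `classify_log` are made up for illustration and are not part of QSVEncC. How the log text is captured (running qsvencc, redirecting output) is left out on purpose.

```python
# Error fragments copied from the decoder messages quoted in this thread.
# The labels on the left are hypothetical names chosen for this sketch.
KNOWN_FAILURES = [
    ("init_failed", "Failed to initialize decoder"),
    ("alloc_failed", "failed to allocate memory"),
    ("device_failure", "device operation failure"),
]


def classify_log(log_text: str) -> str:
    """Return a label for the first known failure fragment found in log_text.

    Returns "no_known_error" when none of the fragments appears, which only
    means none of the messages seen so far showed up, not that decoding is
    guaranteed to be correct.
    """
    lowered = log_text.lower()
    for label, fragment in KNOWN_FAILURES:
        if fragment.lower() in lowered:
            return label
    return "no_known_error"


if __name__ == "__main__":
    sample = "MFXDEC: DecodeFrameAsync error: failed to allocate memory.."
    print(classify_log(sample))
```

Running the same classifier over the saved logs of each driver/qsvencc release would make it easy to see whether a test file moved from one failure type to another.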
I was able to reproduce the problem with an HEVC 12bit 4:4:4 file I created myself using x265 lossless; it runs into "device operation failure".
It seems like it might be a hardware limitation (or driver issue?), as no problem was found on the application side. The bitrate of the input file was 4317Mbps, way too high...
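To put the numbers from this thread next to the uint32 guess, here is a small Python check. It assumes decimal units (1 kbit/s = 1,000 bit/s, 1 Mbps = 1,000,000 bit/s) and it assumes, purely as a hypothesis, that some bitrate field in bit/s is stored as an unsigned 32-bit integer; nothing in the thread confirms that this is what the hardware or driver actually does.

```python
UINT32_MAX = 2**32 - 1  # 4,294,967,295


def fits_in_uint32_bps(bitrate_bps: int) -> bool:
    """True if a bitrate in bit/s is representable as an unsigned 32-bit value."""
    return 0 <= bitrate_bps <= UINT32_MAX


# Bitrates mentioned in this thread, converted to bit/s (decimal units assumed).
samples = {
    "1,493,818 kbit/s input (decodes fine)": 1_493_818 * 1_000,
    "4317 Mbps lossless HEVC 4:4:4 12bit (fails)": 4_317 * 1_000_000,
}

if __name__ == "__main__":
    for label, bps in samples.items():
        status = "fits in" if fits_in_uint32_bps(bps) else "exceeds"
        print(f"{label}: {bps:,} bit/s {status} uint32")
```

The working sample sits well below the limit and the failing one sits above it, which is consistent with the guess but does not prove it; a memory or throughput limit would produce the same picture.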
thank you @rigaya for the confirmation - as you can see in my previous post I could make everything to work with hardware decoding except VP9 decode (tested profile 2+3) - either it is just my test files that go too far or there is still an error somewhere in the complete software stack. (btw. VP9 encoding works, slow but it works - but decoding no chance so far).
btw. I would close issue #100 in its current state and create a new one if something noteworthy changes for the better or worse in the future, if that is ok with you.
btw. one last technical question, perhaps you know the answer or can tell me where I can find some more info:
-> using windows driver and Dx11va I notice there are several threads for GPU tasks:
HWINFO64:
Taskmanager:
--> crop/resize and vpp-deinterlace use the 1st or the 2nd "Video processing" engine
--> vpp-yadif uses the "GPU compute" engine
==> but why do some movies use "Video decode 1" engine and some others use both "Video Decode" engines? even if the first one is not saturated at all?