gopro-dashboard-overlay

Support AMD GPU encoding

Open tve opened this issue 3 years ago • 19 comments

I just got a new box with an integrated AMD GPU. Of course getting hardware encode/decode to work cost a bunch of hair off my head... I'm using Linux; it's possible that the 'AMF' drivers on Windows make things easier, dunno.

The simplest profile settings AFAIK are:

  "vaapi": {
    "input": ["-hwaccel", "vaapi"],
    "output": ["-vcodec", "h264_vaapi"]
  },

The result is:

Impossible to convert between the formats supported by the filter 'Parsed_overlay_0' and the filter 'auto_scale_1'

which is ffmpeg's way to say that the output of the overlay filter can't be piped like that into the h264_vaapi encoder 'cause the latter expects the frame to be in the hardware/gpu. What's needed is a 'hwupload' filter that does the upload to the gpu memory. E.g., in FFMPEGOverlay instead of

            "-filter_complex", f"[0:v][1:v]overlay{filter_extra}",

it needs

            "-filter_complex", f"[0:v][1:v]overlay{filter_extra},hwupload",

nice, eh?
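A minimal sketch of how that change could be made optional rather than hard-coded. The function name `overlay_filter` and the `hw_upload` switch are hypothetical and are not part of gopro-dashboard-overlay; only the two filter strings come from the snippets above.

```python
from __future__ import annotations


def overlay_filter(filter_extra: str = "", hw_upload: bool = False) -> list[str]:
    """Build the -filter_complex arguments that overlay input 1 on input 0.

    hw_upload is a hypothetical switch: when set, the overlaid frames are
    uploaded to GPU memory so that a hardware encoder such as h264_vaapi
    can consume them.
    """
    graph = f"[0:v][1:v]overlay{filter_extra}"
    if hw_upload:
        graph += ",hwupload"
    return ["-filter_complex", graph]


print(overlay_filter())                # software encoder: no upload step
print(overlay_filter(hw_upload=True))  # hardware encoder: append hwupload
```

Depending on the ffmpeg build and driver, a `format=nv12` step before `hwupload` may also be required; that has not been verified in this thread.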

BTW, I noticed that ffmpeg has an overlay_vaapi filter, so this would mean the decoded video frames would stay in gpu, get overlaid, and then encoded. Sadly the AMD vaapi driver doesn't support that... I believe the Intel one might.

tve avatar Jul 02 '22 06:07 tve

This is great info, thanks for the detective work. I'll see if I can find a way to introduce this to the profile concept. Ffmpeg is super powerful but it doesn't seem to abstract the complexity away sometimes...

time4tea avatar Jul 02 '22 08:07 time4tea

Every time I need to do something different with ffmpeg I have to spend time looking up docs, blogs, and stackexchange...

After digging into it, I'm not sure whether it's worth pursuing the vaapi encoding. The quality I'm getting is crap. The only really useful way to use it (for me) is with constant-quality mode (-qp flag). The default quality (-qp 20) is good, but the file size is ~2.5x the original. I find -qp 23 at the limit of what I'd accept (stuff just starts to get soft) and the file size is still ~2x the original. -qp 24 is noticeably soft and the file size is still 2x.

Compared to libx264... the veryfast preset you use by default is 20% slower than vaapi encoding (vaapi takes the same time regardless of setting), produces a file a bit over half the size of the original, and the quality is decent, perhaps similar to -qp 23 above. The superfast preset is faster than vaapi for me and produces a file sized between the veryfast result and the original. Then there's also ultrafast, which is good quality, very fast, but produces a file a tad larger than the original.

The one big caveat is that I'm using an AMD Ryzen 9 5900HX with integrated GPU. I don't know and have not found any info on how the video encoding block on that iGPU compares to those on higher-end discrete AMD GPUs. I also don't know what limitations the Linux VAAPI driver has vs. the actual HW capabilities that may be accessible on Windows. I do find complaints that the AMD HW doesn't produce B-frames, which I can verify looking at the files produced.

Anyway, I'm planning to use the VAAPI decoding (no quality harm there) and then the libx264 ultrafast preset to get an initial rendering and then redo using the medium preset. (Medium produces great quality, files ~60% the size of the original, but takes 2x as long as the veryfast preset.)
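As a rough sketch, that plan could be written as two profiles in the same input/output shape as the "vaapi" profile earlier in this thread. The profile names `draft` and `final` and the helper `x264_profile` are made up for illustration; the flags are standard ffmpeg options.

```python
import json


def x264_profile(preset: str) -> dict:
    """Profile that decodes with VAAPI and encodes in software with libx264."""
    return {
        # Hardware decode only; frames come back to system memory,
        # so the regular software overlay filter still applies.
        "input": ["-hwaccel", "vaapi"],
        "output": ["-vcodec", "libx264", "-preset", preset],
    }


# Hypothetical profile names: a quick first render, then a slower final one.
profiles = {
    "draft": x264_profile("ultrafast"),
    "final": x264_profile("medium"),
}

print(json.dumps(profiles, indent=2))
```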

tve avatar Jul 03 '22 19:07 tve

Based on your comment, "I'm not sure whether it's worth pursuing the vaapi encoding" - I wasn't planning to do anything with AMD GPU support. If that's not what you meant, please let me know! In any case, I don't have an AMD GPU, so I'd be relying on you completely for implementation information... :-)

time4tea avatar Jul 22 '22 15:07 time4tea

@tve what quality can one get with libx264 with the same file size as -qp 20?

DemiMarie avatar Dec 31 '22 18:12 DemiMarie

I don't know if anyone has experimented much with AMD GPU settings, but if there are recommendations, I'd be happy to include them in the documentation. I don't have an AMD GPU so can't offer much, I'm afraid...

time4tea avatar Dec 31 '22 18:12 time4tea

@tve I revisited the vaapi config a little bit, and I think that I can make the config possible, by adding an optional "filter" parameter to the profile. Did you have any success with getting vaapi to work? What parameters did you use? Thanks!!

time4tea avatar Jan 08 '23 13:01 time4tea

I did not pursue vaapi further after my last comment above. I'm using libx264 and the veryfast setting.

tve avatar Jan 09 '23 07:01 tve

Since 0.93.0, which added support for the input/filter/output settings in the "profiles" configuration, this should be possible.

I don't know the settings, but check the PERFORMANCE_GUIDE doc, and the same sort of thing should work for vaapi...

time4tea avatar Jun 04 '23 20:06 time4tea

Just wanted to confirm that the "filter" property in profiles does indeed let you use vaapi. I'm currently using:

{
  "vaapi": {
    "input": [
      "-hwaccel", "vaapi",
      "-hwaccel_device", "/dev/dri/renderD128",
      "-hwaccel_output_format", "vaapi"
    ],
    "filter": "[1:v]format=rgba,hwupload[overlay];[0:v][overlay]overlay_vaapi",
    "output": [
      "-vcodec", "h264_vaapi",
      "-movflags", "faststart"
    ]
  }
}
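For anyone wondering where the three keys end up, here is a rough sketch of how such a profile could map onto an ffmpeg command line. It is an illustration only and not the tool's actual code: the function `ffmpeg_args`, the file names, and the use of a second file as the overlay source are all assumptions.

```python
from __future__ import annotations

import json

PROFILE_JSON = """
{
  "vaapi": {
    "input": [
      "-hwaccel", "vaapi",
      "-hwaccel_device", "/dev/dri/renderD128",
      "-hwaccel_output_format", "vaapi"
    ],
    "filter": "[1:v]format=rgba,hwupload[overlay];[0:v][overlay]overlay_vaapi",
    "output": [
      "-vcodec", "h264_vaapi",
      "-movflags", "faststart"
    ]
  }
}
"""


def ffmpeg_args(profile: dict, video: str, overlay: str, output: str) -> list[str]:
    """Assemble an ffmpeg argument list from a profile (hypothetical helper)."""
    default_filter = "[0:v][1:v]overlay"  # software overlay when no filter is given
    return [
        "ffmpeg",
        *profile.get("input", []),   # input options apply to the main video
        "-i", video,                 # input 0: the GoPro footage
        "-i", overlay,               # input 1: the overlay frames (assumed to be a file)
        "-filter_complex", profile.get("filter", default_filter),
        *profile.get("output", []),  # encoder and container options
        output,
    ]


profile = json.loads(PROFILE_JSON)["vaapi"]
print(" ".join(ffmpeg_args(profile, "input.mp4", "overlay.mov", "output.mp4")))
```

The point of the sketch is the ordering: the "input" options sit in front of the main video so decoding happens on the GPU, the "filter" uploads the overlay frames and composites them there with overlay_vaapi, and the "output" options select the hardware encoder.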

paxunix avatar Jan 28 '24 18:01 paxunix

This is fantastic info. Thank you for sharing it.

time4tea avatar Jan 28 '24 18:01 time4tea