obs-ffmpeg: New default settings for AMD encoders

Open rhutsAMD opened this issue 1 year ago • 27 comments

Description

Introducing new default AMD encoder settings for improved perceptual quality for AVC/HEVC/AV1. These new default settings have been tuned to target stable recording on RX 5700 XT and up on the recent driver.

| Setting | AVC, up to 1080p60 | AVC, above 1080p60 | HEVC, up to 1080p60 | HEVC, above 1080p60 | AV1, up to 1080p60 | AV1, above 1080p60 |
|---|---|---|---|---|---|---|
| RATE_CONTROL_METHOD | CBR | CBR | CBR | CBR | CBR | CBR |
| PEAK_BITRATE | Same as TARGET_BITRATE | Same as TARGET_BITRATE | 1.5 × TARGET_BITRATE | 1.5 × TARGET_BITRATE | 1.5 × TARGET_BITRATE | 1.5 × TARGET_BITRATE |
| VBV_BUFFER_SIZE | Same as TARGET_BITRATE | Same as TARGET_BITRATE | Same as PEAK_BITRATE | Same as PEAK_BITRATE | Same as PEAK_BITRATE | Same as PEAK_BITRATE |
| FILLER_DATA_ENABLE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE |
| ENFORCE_HRD | TRUE | TRUE | Not set | Not set | Not set | Not set |
| MAX_B_FRAMES | 2 | 0 | - | - | - | - |
| B_PIC_PATTERN | 2 | 0 | - | - | - | - |
| SCREEN_CONTENT_TOOLS | - | - | - | - | TRUE | TRUE |
| PALETTE_MODE | - | - | - | - | TRUE | TRUE |
| PRESET | Quality | Quality | Quality | Quality | High Quality | Balanced |
| PROFILE | high | high | main/main10 | main/main10 | main | main |
| ENABLE_VBAQ/AQ_MODE | true for RCMethod != (CQP \| HQVBR \| HQCBR) | true for RCMethod != (CQP \| HQVBR \| HQCBR) | true for RCMethod != (CQP \| HQVBR \| HQCBR) | true for RCMethod != (CQP \| HQVBR \| HQCBR) | Not set | Not set |

Motivation and Context

This PR provides new recommended defaults for the AMD encoders that have been optimized for perceptual quality. The settings are automatically applied depending on the resolution and framerate. The default settings can be overridden as expected and only provide an improved base foundation for encoder settings targeted at improving perceptual quality.
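As a rough illustration of how the defaults in the description vary with codec and output format, here is a minimal Python sketch. It is not the plugin's code: the function name, the dictionary layout, and the reading of "up to 1080p60" as a pixel-throughput limit (width × height × fps) are assumptions made for this example, and the ENABLE_VBAQ/AQ_MODE row is left out for brevity.

```python
# Illustrative only: mirrors the defaults table in the description,
# not the actual obs-ffmpeg implementation.
REF_THROUGHPUT = 1920 * 1080 * 60  # assumed meaning of "1080p60"


def amd_defaults(codec, width, height, fps, target_bitrate):
    """Return the table's defaults for one codec and output format."""
    codec = codec.upper()
    if codec not in ("AVC", "HEVC", "AV1"):
        raise ValueError(f"unsupported codec: {codec}")

    # Assumption: "above 1080p60" means more pixels per second than 1080p60.
    above_1080p60 = width * height * fps > REF_THROUGHPUT

    settings = {"RATE_CONTROL_METHOD": "CBR"}
    if codec == "AVC":
        settings.update(
            PEAK_BITRATE=target_bitrate,
            VBV_BUFFER_SIZE=target_bitrate,
            FILLER_DATA_ENABLE=True,
            ENFORCE_HRD=True,
            MAX_B_FRAMES=0 if above_1080p60 else 2,
            B_PIC_PATTERN=0 if above_1080p60 else 2,
            PRESET="Quality",
            PROFILE="high",
        )
    else:
        # HEVC and AV1: peak is 1.5x target, VBV buffer follows the peak.
        peak = target_bitrate * 3 // 2
        settings.update(
            PEAK_BITRATE=peak,
            VBV_BUFFER_SIZE=peak,
            FILLER_DATA_ENABLE=False,
        )
        if codec == "HEVC":
            settings.update(PRESET="Quality", PROFILE="main/main10")
        else:  # AV1
            settings.update(
                SCREEN_CONTENT_TOOLS=True,
                PALETTE_MODE=True,
                PRESET="Balanced" if above_1080p60 else "High Quality",
                PROFILE="main",
            )
    return settings
```

For example, `amd_defaults("AVC", 1920, 1080, 60, 6000)` yields two B-frames, while the same call at 3840×2160 yields none.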

How Has This Been Tested?

Tested on RX 5000, 6000, and 7000 series cards using public driver version 24.2.1.

Types of changes

  • Tweak (non-breaking change to improve existing functionality)

Checklist:

  • [x] My code has been run through clang-format.
  • [x] I have read the contributing document.
  • [x] My code is not on the master branch.
  • [x] The code has been tested.
  • [x] All commit messages are properly formatted and commits squashed where appropriate.
  • [x] I have included updates to all appropriate documentation.

rhutsAMD avatar Aug 02 '23 20:08 rhutsAMD

Testing this PR, and unfortunately I ran into a crash that seems to occur in the AMD drivers. Crash: https://obsproject.com/logs/A1hmycbXo3FBhlJJ

Log: https://obsproject.com/logs/JZHZe6rt8vkBVQyV

Occurred when stopping the last recording (with HEVC).

I've tried reproducing it, but it seems to be somewhat intermittent, so I have not managed to find specific reproduction steps so far. Will update if I find a reliable repro.

flaeri avatar Aug 03 '23 11:08 flaeri

I am a little bit worried about these defaults. This is primarily because it enables Pre-Analysis (PA). PA eats a decent chunk of traditional GPU compute, and this can in turn lead to encoder lag if there is contention for GPU resources.

In my opinion, for people with older/weaker hardware, like the RX 400/500 series, or APUs, this is likely to cause some problems which are not currently easy to remedy. As it stands, the only way to disable PA is to type params into the text box, and I worry about how many people would actually manage to disable PA on their own without assistance from OBS support or someone knowledgeable, or without consulting a wiki/guide.

I would personally prefer one or a combination of the following:

  • Keep PA off by default, change the default rate control for streaming to HQCBR
    • Rate controls requiring PA (HQCBR, QVBR etc.) would automatically enable it, as they do today
  • Settings checkbox to enable/disable PA
  • Keep PA off unless RC = CBR (other RCs that require PA, such as HQCBR and QVBR, keep working as is and enable it themselves)

This would allow rate controls where PA is less impactful (VBR, CQP etc.) to be used easily (and/or automatically) without the additional GPU load.

I understand the reasoning and desire to have AMF look better by default for streaming, and at the very least be easier/simpler to achieve, and agree with that sentiment.

Just my two cents. Thank you for the PR, it seems to work well for me outside of that one crash, which seems to have been a one off (cannot reproduce it).

flaeri avatar Aug 03 '23 13:08 flaeri

FILLER_DATA_ENABLE=true has no effect with ENFORCE_HRD=false (which is the default) - no filler data will be inserted.

Also ENFORCE_HRD=false will allow the bitrate to peak very high even with CBR, which doesn't seem to be a good thing for streaming.
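The dependency described above can be written down as a small check. This is only a sketch of the stated behaviour (filler data is inserted only when HRD enforcement is on); the function name is hypothetical and it does not call AMF.

```python
def filler_data_is_effective(filler_data_enable, enforce_hrd=False):
    """Return True only if filler data would actually be inserted.

    Per the comment above, FILLER_DATA_ENABLE has no effect unless
    ENFORCE_HRD is also true; ENFORCE_HRD defaults to false.
    """
    return bool(filler_data_enable) and bool(enforce_hrd)
```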

nowrep avatar Aug 05 '23 14:08 nowrep

Thank you for providing feedback on the PR and testing it.

Older/weaker hardware such as RX 4/500 series do not support PA and will not be allowed to enable it. To clarify, PA is only supported on Radeon RX 5000 Series or newer GPUs as well as Ryzen 2000 U/H series or newer APUs. GPUOpen-LibrariesAndSDKs/AMF/blob/master/amf/doc/AMF_Video_PreAnalysis_API.md#13-supported-hardware

The recommended quality settings are to improve the quality at 1080p60 or lower resolutions/throughput.

PA is kept off/not set for higher than 1080p60. The same idea applies for the preset where balanced mode is applied for all codecs for higher than 1080p60.

Since you mention having the RC method set to HQCBR for streaming, is there a way to identify the recording or streaming use case at the encoder plugin level? My understanding is that the AMF encoder plugin in OBS just receives textures and encodes them like the other encoders in OBS. It implements the encoder interface (obs_encoder_info) and is not aware of what OBS does with the textures after it is done encoding. Is there some way to tell if the encoded textures will be used for recording/streaming? If I remember correctly, OBS currently enforces CBR only for streaming (see rtmp-common.c#L665).

We are also interested in having a PA enable/disable checkbox but from past discussions it seems that UI/interface changes need to be done in a vendor agnostic way by OBS developers.

rhutsAMD avatar Aug 09 '23 12:08 rhutsAMD

PA support: I'm particularly worried about the APUs. They don't have a lot of GPU to play with, and I believe basically any game would lead to GPU contention, and ultimately encoder lag.

HQCBR: I'm not certain, but I think you are correct. What I was envisioning was the defaults in the UI being different in streaming vs recording, but now that my memory has been jogged, that is not something that can be done in a simple manner.

PA bool/checkbox: Which leaves us back at the issue of the UI/UX, which is tricky. I know they don't want more checkboxes, but I am failing to see any other way to neatly handle it.


If it is just kept as is (PA enabled by default, no easy way to disable), then I truly believe that every APU user that tries streaming/recording gameplay will have a very poor experience, and be unlikely to figure it out on their own. There are a lot of APU users, and having the defaults be unusable (for gameplay), where the only solution is very arcane, just really vexes me, personally. I think it would lead to substantially more support requests as well.

At this point, I am just retreading the same ground, so I don't really have anything more to add. Thank you for your time and effort :)

flaeri avatar Aug 09 '23 19:08 flaeri

OBS log: log.txt

I figured it would be pertinent to add some data on my claims instead of people just having to take my word for it. Here is an example of the APUs I'm talking about.

It's a simple older game (Assault Android Cactus), running at 1366x768, targeting 60fps. All else is equal between the two OBS sessions. Running stock, it hovers around 50fps, and is pretty pleasant to play. With the PR it hovers around 40fps, but the lows are very noticeable and quite jarring/stuttery (big fluctuations), a poor experience.

Regarding OBS, as you can see in the log, current master/stock managed to stream/record without any render or encoder lag. With the PR it is unfortunately unusable: skipped frames due to encoding lag: 3895/4201 (92.7%)
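For reference, the percentage in that log line is simply skipped frames divided by total frames. A tiny sketch (the function name is made up for this example):

```python
def skipped_percentage(skipped, total):
    """Share of frames skipped due to encoding lag, in percent (one decimal)."""
    if total <= 0:
        raise ValueError("total frame count must be positive")
    return round(100.0 * skipped / total, 1)
```

With the figures from the log, `skipped_percentage(3895, 4201)` gives 92.7.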


Side thought regarding PA handling. If a bool/checkbox is out of the question, then perhaps we could at least exclude APUs somehow? MAX_THROUGHPUT is not supported on this chip, and returns 0. Maybe that would be a simple way to exclude APUs (assuming similar chips behave the same). As far as I know there is currently no way for AMF to identify low power/APU hardware.
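A minimal sketch of that heuristic, assuming the reported MAX_THROUGHPUT value is already available as an integer. The function name and the way the value is obtained are hypothetical, whether other APUs also report 0 is untested, and the 1080p60 limit is read as pixel throughput, which is also an assumption.

```python
REF_THROUGHPUT = 1920 * 1080 * 60  # assumed meaning of "1080p60"


def should_enable_pre_analysis(max_throughput, width, height, fps):
    """Decide whether PA should be on by default.

    max_throughput: value reported for MAX_THROUGHPUT; 0 is treated as
    "capability not supported", used here as a stand-in for "APU".
    """
    if max_throughput <= 0:
        # Heuristic from the comment: chips that report 0 get PA off.
        return False
    # PA stays off above 1080p60 (throughput interpretation assumed).
    return width * height * fps <= REF_THROUGHPUT
```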

flaeri avatar Aug 10 '23 10:08 flaeri

I don't think having AMF_PA_TAQ_MODE set to 2 by default is a good idea considering how using AMF_PA_LOOKAHEAD_BUFFER_DEPTH with it will overload the encoder in OBS at resolutions > 864p. The only way to use it without overloading is to set the preset to Balanced or Speed but even then there will be constant frame skips present. Not to mention that TAQ Mode 2 will constantly crash OBS when you stop Recording/Streaming regardless if it overloads or not.

In ffmpeg, if AMF_PA_TAQ_MODE set to 2 is used in combination with AMF_PA_LOOKAHEAD_BUFFER_DEPTH at 1080p60, you'll frequently find the encoding speed going below 1x, which would explain the overloads in OBS. Here too, it's necessary to dial down the preset back to Balanced or Speed to achieve real-time encoding speeds again.

OBS Log file with the suggested defaults above: https://obsproject.com/logs/ID4dOFS2O1OrJxSL

hxzael avatar Aug 14 '23 20:08 hxzael

I'm aware the log file I provided has a lot of clutter from other existing plugins, here is another log file from a clean Scene Collection where I reproduced the results I explained earlier.

https://obsproject.com/logs/bX4QINZDD6lxtcen

I have also attached a crash log that happened after stopping the recording. crash log AMF.txt

hxzael avatar Aug 16 '23 21:08 hxzael

HQCBR and HQVBR are almost unusable, even on the 7900XTX. For 1080p60 recording, they are all overloaded.

akiirui avatar Nov 30 '23 18:11 akiirui

That is because rhuts forgot to tell you to use "Preset: Balanced" and "PASceneChangeDetectionEnable=false". Otherwise the encoder overloads.

lextra2 avatar Jan 13 '24 15:01 lextra2

@rhutsAMD My suggestions: set scenecut of PreAnalysis to false, e.g. PASceneChangeDetectionEnable=false (AMF_PA_SCENE_CHANGE_DETECTION_ENABLE).

Reason: x264 & NVENC both have their scenecut for cbr/vbr disabled because a fixed keyint is preferred for streaming and having it disabled gives the encoder more headroom.

Also, use Preset: Balanced for all "HQ" presets.

Reason: The image quality difference of Balanced & Quality is minimal, but Quality is really pushing the encoder and so to maximize compatibility, Balanced would be the better option.

lextra2 avatar Jan 19 '24 19:01 lextra2

Yep, scenecut (scene change) for real-time livestreaming only hurts quality. I also agree on the Balanced preset: for higher-end models the Quality preset is probably no big deal, but on my card, for example (6650 XT), if I use PA and B-frames >2, the Quality preset can become quite a burden for the encoder, depending on the PA params in use.

Johl7 avatar Jan 20 '24 13:01 Johl7

Hello, any updates? What's the hold-up? As previously discussed, using AMF_VIDEO_ENCODER_QUALITY_PRESET_BALANCED & AMF_PA_SCENE_CHANGE_DETECTION_ENABLE=false should make the new settings work for most people.

QVBR, HQVBR & HQCBR are already using PreAnalysis anyways.

lextra2 avatar Feb 07 '24 01:02 lextra2

Please refrain from comments asking why something hasn't been merged yet.

We know this is here, we'll get to it as soon as we have time, in appropriate priority.

Fenrirthviti avatar Feb 07 '24 01:02 Fenrirthviti

To add some things I've noticed regarding some of the settings in this post: for the PA engine I would suggest using either Vulkan or DX12, ~~from some testing I've done, OpenCL doesn't seem to work properly with PA~~. DX12 and Vulkan had better performance compared to the current default (DX11), at least in the few tests I've done: PAEngineType=11 (DX12) and PAEngineType=10 (Vulkan). Correction: OpenCL does seem to work.
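To make the option strings in this thread concrete, here is a small sketch that splits a custom options string into name/value pairs. It assumes the text box takes space-separated `name=value` tokens; the actual parsing in OBS may differ, and the function name is hypothetical. The engine-type labels are only the two values mentioned above.

```python
# Only the two engine values named in this thread; others are not listed here.
PA_ENGINE_NAMES = {"10": "Vulkan", "11": "DX12"}


def parse_amf_options(text):
    """Split 'Name=value Name2=value2' into a dict, skipping malformed tokens."""
    options = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        if not sep or not name or not value:
            continue  # ignore tokens that are not of the form name=value
        options[name] = value
    return options


opts = parse_amf_options("PAEngineType=11 PASceneChangeDetectionEnable=false")
engine = PA_ENGINE_NAMES.get(opts.get("PAEngineType"), "unknown")
```

Here `opts` holds both settings as strings and `engine` resolves to "DX12".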

Johl7 avatar Feb 07 '24 01:02 Johl7

PAEngineType=11(DX12) and PAEngineType=10(Vulkan).

Where are you getting those values from? The docs don't say anything about DX12 support

Never mind, found it. @rhutsAMD, could you please update the docs of PreAnalysis.h#L95 with the relevant enumerations? It's very useful information which otherwise gets lost.

lextra2 avatar Feb 07 '24 02:02 lextra2

Thank you for noticing this. The header and the PreAnalysis API doc on GPUOpen have now been updated to reflect the additional PAEngineType options.

Regarding the new recommended defaults for AMF in OBS, there is still some internal testing going on and we expect to have an updated set of recommended defaults. I will update this PR with the new set once those are ready.

rhutsAMD avatar Feb 07 '24 16:02 rhutsAMD

The PR has been updated and the settings have been retested on the recent public driver version 24.2.1.

rhutsAMD avatar Feb 29 '24 15:02 rhutsAMD

I have a question regarding the recommended lookahead buffer-depth setting of 20 in these recommendations.

The Preanalysis API (https://github.com/GPUOpen-LibrariesAndSDKs/AMF/blob/master/amf/doc/AMF_Video_PreAnalysis_API.pdf) specifies that AMF_PA_TAQ_MODE only works when AMF_PA_LOOKAHEAD_BUFFER_DEPTH is set to 11, 21 or 41. Do values other than these 3 work considering you're selecting 20 instead here? Would love some clarification on this, thank you.

hxzael avatar Feb 29 '24 23:02 hxzael

The three LAB depths ( short lookahead(11), medium lookahead(21) and long lookahead(41) ) are tested recommendations, rather than requirements as the original wording suggests.
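Put differently, a depth such as 20 is simply outside the tested set rather than invalid. A small sketch of that distinction (the names are hypothetical, and no claim is made here about which other values AMF accepts):

```python
# Tested recommendations named above: short, medium and long lookahead.
TESTED_LOOKAHEAD_DEPTHS = {11: "short", 21: "medium", 41: "long"}


def describe_lookahead_depth(depth):
    """Say whether a lookahead depth is one of the tested recommendations."""
    if depth < 0:
        raise ValueError("lookahead depth cannot be negative")
    label = TESTED_LOOKAHEAD_DEPTHS.get(depth)
    if label is not None:
        return f"{depth}: tested {label} lookahead"
    # Not tested; report the closest tested value for comparison.
    nearest = min(TESTED_LOOKAHEAD_DEPTHS, key=lambda d: abs(d - depth))
    return f"{depth}: not a tested value (nearest tested: {nearest})"
```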

I have updated the doc to clarify the wording of the example values as being suggestions.

rhutsAMD avatar Mar 01 '24 15:03 rhutsAMD

Other than two nits, do not use merge commits to fast-forward history. Use git rebase.

Rebased to bring in latest changes.

rhutsAMD avatar Aug 13 '24 18:08 rhutsAMD

Hi @RytoEX, I have addressed your requested changes. Could you please revisit this PR?

rhutsAMD avatar Aug 13 '24 18:08 rhutsAMD