[Question]: RDNA4 target bitrate deviation // Should increase in set number of used bframes lead to significant deviation from target bitrate?

Open DimkaTsv opened this issue 8 months ago • 12 comments

Not sure if this is a bug, hence a question first. It may move into the bug category, depending on the answer:

Question: Should it behave like this?

Behaviour in question: As the number of bframes increases, the encoded bitrate starts to deviate from the target or expected bitrate (VBR sometimes deviates from target on its own), sometimes by quite a significant margin (around 15-20%).

  • 0 bframes - slightly over target.
  • 1 bframe - slightly under target.
  • 2 bframes - close to, or somewhat over, the bitrate with 0 bframes.
  • 3 bframes - significantly over target.

This behaviour seems to happen with 9070XT, but i can only make this conclusion based on AVC bframes (As 7800XT did not support AV1 bframes). [Granted at this point i will be unable to test on anything outside of 9070XT]

Moreover, on the 7800XT even encodes without bframes followed the target bitrate much more closely, and the difference in bitrate between 0 bframes and 3 bframes was minimal. Meanwhile the 9070XT can encode at 6000 kbps on a 5000 kbps target, which is too large a deviation.

Both AVC and AV1 are affected.

[Yes, i use VCEEnc, so there is slight difference due to GOP length mostly, but general behaviour is same no matter what i use to encode]

Examples (all from 9070XT):

  1. 720p ducks flying from water reference video.
  2. 4k60 transcode of short screen capture (less than 10 seconds).
  3. 1080p60 transcode of screen capture (2 minutes)
Results below are listed as 1 / 2 / 3. [References are numbers i have from 7800XT, where i have them on hand.]

AVC, target 5000 kbps:
0B - 5271 kbps / 6449 kbps / 5506 kbps [reference - 5050 kbps / 5479 kbps / 5331 kbps]
1B - 4879 kbps / 5305 kbps / 5006 kbps
2B - 5133 kbps / 6130 kbps / 5411 kbps
3B - 6068 kbps / 6633 kbps / 5861 kbps [reference - 5100 kbps / 5333 kbps / 5335 kbps]

AVC, target 15000 kbps:
0B - 15888 kbps / 17907 kbps / 15984 kbps [reference - 15097 kbps / 15426 kbps / 15340 kbps]
1B - 14709 kbps / 14967 kbps / 15013 kbps
2B - 15780 kbps / 16887 kbps / 16378 kbps
3B - 19369 kbps / 18336 kbps / 17494 kbps [reference - 15410 kbps / 15404 kbps / 15348 kbps]

[Why for 4k60 there is such high deviation with just normal VBR? It is 20% off by default for some reason!]


AV1, target 5000 kbps:
0B - 5231 kbps / 6193 kbps / 5502 kbps [reference - 5035 kbps / 5310 kbps / 5349 kbps]
1B - 4928 kbps / 5260 kbps / 5096 kbps
2B - 5296 kbps / 6141 kbps / 5821 kbps
3B - 6299 kbps / 6456 kbps / 6136 kbps

AV1, target 15000 kbps:
0B - 15237 kbps / 16835 kbps / 16018 kbps [reference - 15045 kbps / 15428 kbps / 15362 kbps]
1B - 14821 kbps / 15280 kbps / 15830 kbps
2B - 17127 kbps / 17552 kbps / 17209 kbps
3B - 18409 kbps / 18420 kbps / 17786 kbps

Based on the data i still have from 7800XT, it followed target bitrate MUCH more closely, with or without bframes (on the exact same videos, with the exact same transcoding command).
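To put the figures above on one scale, here is a minimal sketch that converts measured bitrates into signed percent deviation from target. It uses only the 9070XT AVC numbers for the 5000 kbps target quoted above; the function and variable names are illustrative and not part of AMF or any tool mentioned here.

```python
# Measured AVC bitrates (kbps) on the 9070XT for a 5000 kbps target,
# copied from the figures above. Tuple order: 720p / 4k60 / 1080p60.
TARGET_KBPS = 5000
AVC_9070XT = {
    0: (5271, 6449, 5506),
    1: (4879, 5305, 5006),
    2: (5133, 6130, 5411),
    3: (6068, 6633, 5861),
}


def deviation_pct(measured_kbps, target_kbps):
    """Signed deviation from the target bitrate, in percent."""
    return (measured_kbps - target_kbps) / target_kbps * 100.0


def deviation_table(results, target_kbps):
    """Map each B-frame count to per-sample deviations, rounded to 0.1%."""
    return {
        bframes: tuple(round(deviation_pct(v, target_kbps), 1) for v in values)
        for bframes, values in results.items()
    }


if __name__ == "__main__":
    for bframes, devs in deviation_table(AVC_9070XT, TARGET_KBPS).items():
        print(f"{bframes}B: " + " / ".join(f"{d:+.1f}%" for d in devs))
```

The same helper can be pointed at the 15000 kbps or AV1 figures by swapping the dictionary and the target.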

DimkaTsv avatar Apr 08 '25 01:04 DimkaTsv

Could you please share your bitstreams which demonstrate this issue as well as the list of other encoding parameters you are setting?

rhutsAMD avatar Apr 09 '25 22:04 rhutsAMD

Sure! Here is archive with bitstream data for one of the samples + reencodes with different amount of bitrates (1080p60 transcode of 2 min) https://drive.google.com/file/d/1dTekvGA6ciKz1vATwUguV4OaACEE1nno/view?usp=sharing

And list of encoding commands to get a comparison (in form of drag & drop .bat script) (Used TranscodeHW instead of VCEEnc as reference tool. VCEEnc got slightly higher bitrates compared to these, but these still are on point)

@echo on
::AVC TEST
%~dp0\TranscodeHW.exe -input %1 -output "%~n1_TranscodeHW_AVC_B0%~x1" -codec avc -RateControlMethod vbr -TargetBitrate 5000000 -PeakBitrate 15000000 -BPicturesPattern 0 -MaxConsecutiveBPictures 0
%~dp0\TranscodeHW.exe -input %1 -output "%~n1_TranscodeHW_AVC_B1%~x1" -codec avc -RateControlMethod vbr -TargetBitrate 5000000 -PeakBitrate 15000000 -BPicturesPattern 1 -MaxConsecutiveBPictures 1
%~dp0\TranscodeHW.exe -input %1 -output "%~n1_TranscodeHW_AVC_B2%~x1" -codec avc -RateControlMethod vbr -TargetBitrate 5000000 -PeakBitrate 15000000 -BPicturesPattern 2 -MaxConsecutiveBPictures 2
%~dp0\TranscodeHW.exe -input %1 -output "%~n1_TranscodeHW_AVC_B3%~x1" -codec avc -RateControlMethod vbr -TargetBitrate 5000000 -PeakBitrate 15000000 -BPicturesPattern 3 -MaxConsecutiveBPictures 3

::AV1 TEST
%~dp0\TranscodeHW.exe -input %1 -output "%~n1_TranscodeHW_AV1_B0%~x1" -codec av1 -Av1RateControlMethod vbr -Av1TargetBitrate 5000000 -Av1PeakBitrate 15000000 -Av1BPicturesPattern 0 -Av1MaxConsecutiveBPictures 0
%~dp0\TranscodeHW.exe -input %1 -output "%~n1_TranscodeHW_AV1_B1%~x1" -codec av1 -Av1RateControlMethod vbr -Av1TargetBitrate 5000000 -Av1PeakBitrate 15000000 -Av1BPicturesPattern 1 -Av1MaxConsecutiveBPictures 1
%~dp0\TranscodeHW.exe -input %1 -output "%~n1_TranscodeHW_AV1_B2%~x1" -codec av1 -Av1RateControlMethod vbr -Av1TargetBitrate 5000000 -Av1PeakBitrate 15000000 -Av1BPicturesPattern 2 -Av1MaxConsecutiveBPictures 2
%~dp0\TranscodeHW.exe -input %1 -output "%~n1_TranscodeHW_AV1_B3%~x1" -codec av1 -Av1RateControlMethod vbr -Av1TargetBitrate 5000000 -Av1PeakBitrate 15000000 -Av1BPicturesPattern 3 -Av1MaxConsecutiveBPictures 3
pause

0-1-2-3 bframes samples are encoded in order for both AVC and AV1. For 15 Mbps it is just matter of changing bitrate parameters.
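For anyone who wants to repeat the runs at other bitrates without editing the .bat by hand, here is a rough sketch that builds the same TranscodeHW argument lists in Python. The flag names are copied from the script above; the helper name, the `exe` default and the output naming logic are my own assumptions, and only AVC and AV1 are covered because those are the only B-frame flags shown here.

```python
from pathlib import PureWindowsPath

# Flag prefixes exactly as they appear in the batch script above.
FLAG_PREFIX = {"avc": "", "av1": "Av1"}


def transcode_args(src, codec, bframes, target_bps, peak_bps,
                   exe="TranscodeHW.exe"):
    """Build one TranscodeHW argument list (hypothetical helper).

    Mirrors the .bat naming: <name>_TranscodeHW_<CODEC>_B<n><ext>.
    """
    prefix = FLAG_PREFIX[codec]
    path = PureWindowsPath(src)
    out = f"{path.stem}_TranscodeHW_{codec.upper()}_B{bframes}{path.suffix}"
    return [
        exe, "-input", str(path), "-output", out, "-codec", codec,
        f"-{prefix}RateControlMethod", "vbr",
        f"-{prefix}TargetBitrate", str(target_bps),
        f"-{prefix}PeakBitrate", str(peak_bps),
        f"-{prefix}BPicturesPattern", str(bframes),
        f"-{prefix}MaxConsecutiveBPictures", str(bframes),
    ]


def all_runs(src, target_bps, peak_bps):
    """0-3 B-frames for AVC, then AV1, in the same order as the script."""
    return [
        transcode_args(src, codec, b, target_bps, peak_bps)
        for codec in ("avc", "av1")
        for b in range(4)
    ]


if __name__ == "__main__":
    for args in all_runs(r"C:\clips\sample.mp4", 5_000_000, 15_000_000):
        print(" ".join(args))
```

Each list can be passed to `subprocess.run` on a machine that has TranscodeHW.exe. The peak value for a 15 Mbps run is left to the caller, since it is not stated above.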

AVC comparison: Image

AV1 comparison: Image

DimkaTsv avatar Apr 09 '25 22:04 DimkaTsv

There is also another weird issue with RDNA4 and VCEEnc: --avhw (aka hardware decode) is SIGNIFICANTLY less performant than --avsw (aka CPU decode). Up to 5 times slower, for context (https://github.com/rigaya/VCEEnc/issues/121). This is not expected behaviour.

Issue moved into its own thread

When trying to do ffmpeg decode benchmark while encoding with VCEEnc, disproportionate decode performance loss can be noticed. Playing videos or YouTube does not impact it that much, but encode via VCEEnc does.

But on rigaya's RDNA3 (7900 XT) GPU, --avhw performs as expected. So, i believe this issue is RDNA4 specific, even though currently i have no idea how to approach it. Simply because i cannot test it directly with TranscodeHW (i believe it uses CPU decode via ffmpeg), so i cannot determine whether it is an AMF bug or something else.

It is hard for me to check on other instruments, simply because every single one of those uses CPU noticeably. Can encode and decode use same pipeline on RDNA4, somehow? Even though it sounds weird and should not be like this (i thought that encode / decode components are independent). Or can there be some weird issue with load distribution?

Hence... Another question that i am asking here.

DimkaTsv avatar Apr 09 '25 22:04 DimkaTsv

Regarding decoder performance;

  • Let's make it a separate issue.
  • RX 9xxx has a single VCN but it is more performant than RX 7xxx. So there is no reason to see such numbers.
  • You can try FFmpeg with latest AMF decoder, though AMF decoder should show the same performance as D3D11VA in FFmpeg.
  • You can also try AMF sample TranscodeHW.

MikhailAMD avatar Apr 10 '25 13:04 MikhailAMD

>Let's make it a separate issue.

Done. New issue was created for this particular problem.

Issue moved into its own thread

>RX 9xxx has a single VCN but it is more performant than RX 7xxx. So there is no reason to see such numbers.

I also think in this direction, but right now... I cannot get it to performance level of RDNA3, because almost everything tries to do CPU decode. I believe only way i actually managed to get ffmpeg to have HW decode was to use benchmark command [%~dp0\ffmpeg -benchmark -hwaccel d3d11va -hwaccel_output_format d3d11 -i %1 -f null -] Thus, VCEEnc is probably only tool where HW decode was actually enforced, and as consequence it sees unexpectedly severe performance drop.

DimkaTsv avatar Apr 10 '25 14:04 DimkaTsv

Hmm... I guess i will also add on top of that B-frame bitrate deviation that it seems like HEVC is also not very fond of following bitrate strictly.

Command used for all encodes below. %~dp0\TranscodeHW.exe -input %1 -output "%~n1_TranscodeHW_1%~x1" -codec HEVC -HevcRateControlMethod vbr -HevcTargetBitrate 5000000 -HevcPeakBitrate 15000000

Image

As you can see all of these encodes are noticeably above target 5000 kbps bitrate. And one of the encodes is over peak bitrate limit (20430 kbps vs 15000 kbps target peak)

DimkaTsv avatar May 09 '25 19:05 DimkaTsv

I can confirm this. This is happening on OBS. Let's say the bit rate is 5000 kbps, but it generally stays around 5500-6000 kbps. Here are my settings;

1920x1080 60fps bitrate = 5000 kbps, keyframe interval 2s, preset = quality, PA checked, max B frames = 2

MaxNumRefFrames=4 MaxConsecutiveBPictures=1 BPicturesPattern=2 AdaptiveMiniGOP=true LowLatencyInternal=true RateControlPreanalysisEnable=1 EnableVBAQ=false HighMotionQualityBoostEnable=true DeBlockingFilter=true BReferenceEnable=true HalfPixel=true QuarterPixel=true PAFrameSadEnable=true PALookAheadBufferDepth=31 PAPerceptualAQMode=1 PACAQStrength=2 PATemporalAQMode=2 PASceneChangeDetectionEnable=true

lazye53 avatar Oct 30 '25 06:10 lazye53

@lazye53 What driver version did you use?

StevenXiao-AMD avatar Nov 03 '25 19:11 StevenXiao-AMD

@lazye53 Could you please test again by following the suggestion:

  1. Use the latest AMD GPU driver.
  2. Disable CAQ/TAQ in your command.

1920x1080 60fps bitrate = 5000 kbps, keyframe interval 2s, preset = quality, PA checked, max B frames = 2

MaxNumRefFrames=4 MaxConsecutiveBPictures=1 BPicturesPattern=2 AdaptiveMiniGOP=true LowLatencyInternal=true RateControlPreanalysisEnable=1 EnableVBAQ=false HighMotionQualityBoostEnable=true DeBlockingFilter=true BReferenceEnable=true HalfPixel=true QuarterPixel=true PAFrameSadEnable=true PALookAheadBufferDepth=31 PAPerceptualAQMode=1 PASceneChangeDetectionEnable=true

Please check whether the problem still occurs.
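As a quick way to see what changed between the two settings lists in this thread, here is a small sketch that parses both key=value strings and reports the keys that were dropped. The strings are copied verbatim from the comments above; the function names are illustrative only.

```python
# Settings as originally reported, copied verbatim from the earlier comment.
ORIGINAL = (
    "MaxNumRefFrames=4 MaxConsecutiveBPictures=1 BPicturesPattern=2 "
    "AdaptiveMiniGOP=true LowLatencyInternal=true RateControlPreanalysisEnable=1 "
    "EnableVBAQ=false HighMotionQualityBoostEnable=true DeBlockingFilter=true "
    "BReferenceEnable=true HalfPixel=true QuarterPixel=true PAFrameSadEnable=true "
    "PALookAheadBufferDepth=31 PAPerceptualAQMode=1 PACAQStrength=2 "
    "PATemporalAQMode=2 PASceneChangeDetectionEnable=true"
)

# Settings as suggested for the retest, copied verbatim from this comment.
SUGGESTED = (
    "MaxNumRefFrames=4 MaxConsecutiveBPictures=1 BPicturesPattern=2 "
    "AdaptiveMiniGOP=true LowLatencyInternal=true RateControlPreanalysisEnable=1 "
    "EnableVBAQ=false HighMotionQualityBoostEnable=true DeBlockingFilter=true "
    "BReferenceEnable=true HalfPixel=true QuarterPixel=true PAFrameSadEnable=true "
    "PALookAheadBufferDepth=31 PAPerceptualAQMode=1 PASceneChangeDetectionEnable=true"
)


def parse_settings(text):
    """Split a space-separated key=value string into a dict of strings."""
    return dict(item.split("=", 1) for item in text.split())


def removed_keys(before, after):
    """Keys present in `before` but absent from `after`, sorted by name."""
    old, new = parse_settings(before), parse_settings(after)
    return sorted(key for key in old if key not in new)


def changed_values(before, after):
    """Keys present in both strings whose values differ."""
    old, new = parse_settings(before), parse_settings(after)
    return {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}


if __name__ == "__main__":
    print("removed:", removed_keys(ORIGINAL, SUGGESTED))
    print("changed:", changed_values(ORIGINAL, SUGGESTED))
```

This only compares the text of the two lists; it makes no claim about how the encoder treats a key that is left unset.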

StevenXiao-AMD avatar Nov 03 '25 19:11 StevenXiao-AMD

Tested (at least to an extent) on minimal setup for AVC/HEVC/AV1

%~dp0\TranscodeHW.exe -input %1 -output "%~n1_TranscodeHW_AVC%~x1" -codec avc -RateControlMethod vbr -TargetBitrate 5000000 -PeakBitrate 15000000
%~dp0\TranscodeHW.exe -input %1 -output "%~n1_TranscodeHW_HEVC%~x1" -codec HEVC -HevcRateControlMethod vbr -HevcTargetBitrate 5000000 -HevcPeakBitrate 15000000
%~dp0\TranscodeHW.exe -input %1 -output "%~n1_TranscodeHW_AV1%~x1" -codec av1 -Av1RateControlMethod vbr -Av1TargetBitrate 5000000 -Av1PeakBitrate 15000000

I can definitely see changes (improvements) made in AMF Runtime 1.5.0 (driver 25.10.2). Now it seems like in general bitrate won't exceed target bitrate +10% mark? Prior to that most of them went above 5500 kbps and several went above 5700 and even 6000 kbps.

Ah, no, found a single exception. HEVC on one of my anime samples produced 6183 kbps with 5000 kbps target (AVC and AV1 were both below 5000 kbps). But this was the only such output. Everything else was confined to the aforementioned target+10% bitrate (with worst samples besides the outlier being 5501 and 5502 kbps respectively).


Here is current data (also for my personal reference in the future): [everything below was encoded with TranscodeHW compiled from AMF 1.4.36 sources, using 5000/15000 kbps avg/peak target with defaults on AVC/HEVC/AV1 codecs]

[Warframe]heavy_particles
AVC - 5201 kbps
HEVC - 5212 kbps
AV1 - 5211 kbps

Anime_easy
AVC - 4751 kbps
HEVC - 6183 kbps
AV1 - 4758 kbps

[heavy]Returnal_1080p
AVC - 5454 kbps
HEVC - 5496 kbps
AV1 - 5331 kbps

[heavy]chaotic
AVC - 5501 kbps
HEVC - 5502 kbps
AV1 - 5437 kbps

[light]HtTYD
AVC - 4754 kbps
HEVC - 4700 kbps
AV1 - 4767 kbps

[movie]Hobbit_fragment
AVC - 5018 kbps
HEVC - 5212 kbps
AV1 - 5169 kbps

[1280x720]ReLive_HEVC_sample
AVC - 5502 kbps
HEVC - 5507 kbps
AV1 - 5448 kbps

[3840x2160]ReLive_HEVC_sample
AVC - 5498 kbps
HEVC - 5499 kbps
AV1 - 5379 kbps

TL;DR - it definitely behaves better now than before. There are still some weird outliers, though. Especially taking into account that the worst offender is one of the easiest, if not the easiest, samples to compress in my collection.
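For reference, a minimal sketch that summarises the table above: the worst deviation per codec, plus every result that lands above a chosen margin over target. The numbers are copied from the list above; the function names and the 10% default margin are just illustrative choices.

```python
TARGET_KBPS = 5000

# Average bitrates (kbps) per sample and codec, copied from the list above.
RESULTS = {
    "[Warframe]heavy_particles": {"AVC": 5201, "HEVC": 5212, "AV1": 5211},
    "Anime_easy": {"AVC": 4751, "HEVC": 6183, "AV1": 4758},
    "[heavy]Returnal_1080p": {"AVC": 5454, "HEVC": 5496, "AV1": 5331},
    "[heavy]chaotic": {"AVC": 5501, "HEVC": 5502, "AV1": 5437},
    "[light]HtTYD": {"AVC": 4754, "HEVC": 4700, "AV1": 4767},
    "[movie]Hobbit_fragment": {"AVC": 5018, "HEVC": 5212, "AV1": 5169},
    "[1280x720]ReLive_HEVC_sample": {"AVC": 5502, "HEVC": 5507, "AV1": 5448},
    "[3840x2160]ReLive_HEVC_sample": {"AVC": 5498, "HEVC": 5499, "AV1": 5379},
}


def worst_per_codec(results, target_kbps):
    """For each codec, return (sample, kbps, deviation %) of the highest bitrate."""
    worst = {}
    for sample, per_codec in results.items():
        for codec, kbps in per_codec.items():
            if codec not in worst or kbps > worst[codec][1]:
                worst[codec] = (sample, kbps)
    return {
        codec: (sample, kbps, round((kbps - target_kbps) / target_kbps * 100, 2))
        for codec, (sample, kbps) in worst.items()
    }


def over_margin(results, target_kbps, margin_pct=10):
    """List (sample, codec, kbps) entries more than margin_pct above target."""
    return [
        (sample, codec, kbps)
        for sample, per_codec in results.items()
        for codec, kbps in per_codec.items()
        # Integer comparison avoids floating-point edge cases at the boundary.
        if (kbps - target_kbps) * 100 > margin_pct * target_kbps
    ]


if __name__ == "__main__":
    for codec, (sample, kbps, dev) in worst_per_codec(RESULTS, TARGET_KBPS).items():
        print(f"{codec}: {kbps} kbps ({dev:+.2f}%) on {sample}")
    for sample, codec, kbps in over_margin(RESULTS, TARGET_KBPS):
        print(f"over +10%: {sample} {codec} {kbps} kbps")
```

With a strict 10% margin, a few results that sit only a handful of kbps above 5500 are listed alongside the HEVC outlier, so the margin may need a little slack depending on how strictly "target+10%" is read.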

DimkaTsv avatar Nov 03 '25 21:11 DimkaTsv

@DimkaTsv Could you please share this source? We can check what happens in detail.

"HEVC and one of my anime samples produced 6183 kbps with 5000 kbps target (AVC and AV1 were both below 5000 kbps)."

StevenXiao-AMD avatar Nov 04 '25 15:11 StevenXiao-AMD

Sure. There is one nuance with it, though. As it was cropped via ffmpeg, TranscodeHW does not work nicely with first frames when transcoded directly.

For some reason it is how basically every video cropped by ffmpeg behaves with TranscodeHW. To keep proper beginning, raw video track must be encoded (aka extract video track as .h264/.avc). But damaged beginning does not really change overall behaviour in this case, it was more of a note if strict frame to frame comparison will be required.

Source: https://drive.google.com/file/d/1ufQ-rKe-t_NPDP8HmIlufNS_n5o7PaUd/view?usp=drive_link

DimkaTsv avatar Nov 04 '25 21:11 DimkaTsv