Fork fixes issue where RP would not launch

Open fischer-felix opened this issue 2 years ago • 25 comments

I was having this issue which prevented the application from starting altogether. Since this seems to have been fixed in this fork, I think it would make sense to merge it upstream.

fischer-felix avatar Jan 15 '22 13:01 fischer-felix

Are you sure this merge would help? I tried it myself, and it doesn't work, failing with the same error message

eitch avatar Jan 16 '22 21:01 eitch

It does the trick for me, but could you provide more details, such as driver version, distro, kernel parameters, etc.?

https://user-images.githubusercontent.com/65448408/149681851-00068a76-94cd-464d-aee7-825e7021c2af.mp4

fischer-felix avatar Jan 16 '22 23:01 fischer-felix

True, the "No file name specified" message might not be the actual problem. I also get a segmentation fault. My details are below (I also tested earlier with Mesa 21.2.x, with the same result):

$ uname -a
Linux eitchtower 5.15.11-76051511-generic #202112220937~1640185481~21.10~b3a2c21 SMP Wed Dec 22 15:41:49 U x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Pop
Description:	Pop!_OS 21.10
Release:	21.10
Codename:	impish

$ glxinfo
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: AMD (0x1002)
    Device: AMD Radeon RX 6800 XT (SIENNA_CICHLID, DRM 3.42.0, 5.15.11-76051511-generic, LLVM 13.0.0) (0x73bf)
    Version: 21.3.4
    Accelerated: yes
    Video memory: 16384MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.6
    Max compat profile version: 4.6
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
Memory info (GL_ATI_meminfo):
    VBO free memory - total: 14582 MB, largest block: 14582 MB
    VBO free aux. memory - total: 15986 MB, largest block: 15986 MB
    Texture free memory - total: 14582 MB, largest block: 14582 MB
    Texture free aux. memory - total: 15986 MB, largest block: 15986 MB
    Renderbuffer free memory - total: 14582 MB, largest block: 14582 MB
    Renderbuffer free aux. memory - total: 15986 MB, largest block: 15986 MB
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 16384 MB
    Total available memory: 32752 MB
    Currently available dedicated video memory: 14582 MB
OpenGL vendor string: AMD
OpenGL renderer string: AMD Radeon RX 6800 XT (SIENNA_CICHLID, DRM 3.42.0, 5.15.11-76051511-generic, LLVM 13.0.0)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 21.3.4 - kisak-mesa PPA

eitch avatar Jan 17 '22 12:01 eitch

If my fork fixed your startup issue then it was probably bug #279, which is fixed in my branch. I made a pull request (#280) for my fixes as well, but marazmista seems to be inactive at the moment. Feel free to use my branch though, I committed a number of bug fixes in there.

The "QFSFileEngine::open" error message from #278 seems unrelated to any of my fixes. It is probably not the cause of the segfault either; a call to "QString::toUInt()" is the more likely culprit, as Natim commented in #278. I'll see if I can reproduce that and find the bug.

emerge-e-world avatar Feb 17 '22 14:02 emerge-e-world

@emerge-e-world sadly your fork does not fix the issue... Without sudo your version starts, but with sudo I get the segfault...

eitch avatar Feb 17 '22 20:02 eitch

Could you compile it with debug messages enabled and post the output leading up to the segfault? You can do so by running:

qmake-qt5 CONFIG+=debug
make

Then run the resulting ./target/radeon-profile binary from a terminal and copy/paste all the output here.

emerge-e-world avatar Feb 22 '22 16:02 emerge-e-world

$ qmake CONFIG+=debug
$ make clean
$ make -j32
$ sudo ./target/radeon-profile
Creating application object
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
Creating radeon_profile
Loading configuration
Creating ui elements
Analyzing screen  0
  Analyzing output  0
    Analyzing active mode
      Property is EDID, parsing it
Using QCharRef with an index pointing outside the valid range of a QString. The corresponding behavior is deprecated, and will be changed in a future version of Qt.
Using QCharRef with an index pointing outside the valid range of a QString. The corresponding behavior is deprecated, and will be changed in a future version of Qt.
Using QCharRef with an index pointing outside the valid range of a QString. The corresponding behavior is deprecated, and will be changed in a future version of Qt.
Searching PnP ID:  "GSM"
Found PnP ID:  "GSM" -> "LG Electronics"
  Analyzing output  1
  Analyzing output  2
  Analyzing output  3
Initializing device
Card detected:
 module:  "amdgpu" 
 sysName(path):  "card0" 
 name:  "Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev c1)"
hwmon path:  "/sys/class/drm/card0/device/hwmon/hwmon2/"
Opened /dev/dri/renderD128 , fd number: 48
QFSFileEngine::open: No file name specified
Detected power method based on sysfs power_method file: 3
Power method: Power Profile Modes. Creating profiles list
ioctlHandler: everything ok
detected 3 temperature sensors: edge,junction,mem
ASSERT failure in QList<T>::operator[]: "index out of range", file /usr/include/x86_64-linux-gnu/qt5/QtCore/qlist.h, line 579
Aborted

eitch avatar Feb 22 '22 20:02 eitch

Thanks. It looks like the error happens when parsing one of the pp tables (containing the power level / clock descriptions for your card). I suspect that the amdgpu driver changed the output for Navi 2x cards somewhat, and this may happen with other Navi 2x based cards as well. Could you post the contents of all the files called /sys/class/drm/card0/device/pp*? Just run the following command in a terminal: more /sys/class/drm/card0/device/pp* | cat
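
For illustration, here is a minimal sketch of parsing such a table defensively, so that an empty file or an unexpected line is skipped instead of triggering an out-of-range index. It is plain C++ using only the standard library, not the actual radeon-profile code. The names DpmLevel, parseDpmLine and parseDpmTable are hypothetical, and the line format is assumed to look like "1: 960Mhz *", with "*" marking the active level.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One row of a pp_dpm_* table, e.g. "1: 960Mhz *" (assumed format).
struct DpmLevel {
    int index;
    int mhz;
    bool active;
};

// Read a leading run of decimal digits from s. Returns false if there is none.
static bool leadingNumber(const std::string &s, int &value, std::size_t &used) {
    used = 0;
    long v = 0;
    while (used < s.size() && std::isdigit(static_cast<unsigned char>(s[used]))) {
        v = v * 10 + (s[used] - '0');
        if (v > 1000000) return false;  // implausible value, treat as malformed
        ++used;
    }
    if (used == 0) return false;
    value = static_cast<int>(v);
    return true;
}

// Parse one line. Returns false for anything that does not match the
// assumed "<index>: <clock>Mhz [*]" shape instead of indexing blindly.
bool parseDpmLine(const std::string &line, DpmLevel &out) {
    std::istringstream in(line);
    std::string indexTok, clockTok, marker;
    if (!(in >> indexTok >> clockTok)) return false;

    std::size_t used = 0;
    int index = 0;
    int mhz = 0;
    if (!leadingNumber(indexTok, index, used) || indexTok.substr(used) != ":") return false;
    if (!leadingNumber(clockTok, mhz, used)) return false;
    const std::string unit = clockTok.substr(used);
    if (unit != "Mhz" && unit != "MHz") return false;

    in >> marker;  // optional; stays empty when the level is not active
    out.index = index;
    out.mhz = mhz;
    out.active = (marker == "*");
    return true;
}

// Parse a whole table. Malformed lines are skipped, and an empty file
// simply yields an empty vector.
std::vector<DpmLevel> parseDpmTable(const std::string &text) {
    std::vector<DpmLevel> levels;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        DpmLevel level = {0, 0, false};
        if (parseDpmLine(line, level)) levels.push_back(level);
    }
    return levels;
}
```

A caller would then check levels.empty() before touching levels[0]. Lines in other formats (for example PCIe link entries) are deliberately rejected by this sketch rather than guessed at.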

emerge-e-world avatar Feb 23 '22 17:02 emerge-e-world

$ more /sys/class/drm/card0/device/pp* | cat
::::::::::::::
/sys/class/drm/card0/device/pp_cur_state
::::::::::::::
0
::::::::::::::
/sys/class/drm/card0/device/pp_dpm_dcefclk
::::::::::::::
0: 417Mhz 
1: 960Mhz *
2: 1200Mhz 
::::::::::::::
/sys/class/drm/card0/device/pp_dpm_dclk
::::::::::::::
::::::::::::::
/sys/class/drm/card0/device/pp_dpm_fclk
::::::::::::::
0: 500Mhz 
1: 1276Mhz *
2: 1941Mhz 
::::::::::::::
/sys/class/drm/card0/device/pp_dpm_mclk
::::::::::::::
0: 96Mhz 
1: 456Mhz 
2: 673Mhz 
3: 1000Mhz *
::::::::::::::
/sys/class/drm/card0/device/pp_dpm_pcie
::::::::::::::
0: 2.5GT/s, x1 310Mhz 
1: 8.0GT/s, x8 619Mhz *
::::::::::::::
/sys/class/drm/card0/device/pp_dpm_sclk
::::::::::::::
0: 500Mhz *
1: 2575Mhz 
::::::::::::::
/sys/class/drm/card0/device/pp_dpm_socclk
::::::::::::::
0: 480Mhz 
1: 800Mhz *
2: 1200Mhz 
::::::::::::::
/sys/class/drm/card0/device/pp_dpm_vclk
::::::::::::::
::::::::::::::
/sys/class/drm/card0/device/pp_features
::::::::::::::
features high: 0x00003763 low: 0xa37ffdff
No. Feature               Bit : State
00. DPM_PREFETCHER       ( 0) : enabled
01. DPM_GFXCLK           ( 1) : enabled
02. DPM_GFX_GPO          ( 2) : enabled
03. DPM_UCLK             ( 3) : enabled
04. DPM_FCLK             ( 4) : enabled
05. DPM_SOCCLK           ( 5) : enabled
06. DPM_MP0CLK           ( 6) : enabled
07. DPM_LINK             ( 7) : enabled
08. DPM_DCEFCLK          ( 8) : enabled
09. DPM_XGMI             ( 9) : disabled
10. MEM_VDDCI_SCALING    (10) : enabled
11. MEM_MVDD_SCALING     (11) : enabled
12. DS_GFXCLK            (12) : enabled
13. DS_SOCCLK            (13) : enabled
14. DS_FCLK              (14) : enabled
15. DS_LCLK              (15) : enabled
16. DS_DCEFCLK           (16) : enabled
17. DS_UCLK              (17) : enabled
18. GFX_ULV              (18) : enabled
19. FW_DSTATE            (19) : enabled
20. GFXOFF               (20) : enabled
21. BACO                 (21) : enabled
22. MM_DPM_PG            (22) : enabled
23. PPT                  (24) : enabled
24. TDC                  (25) : enabled
25. APCC_PLUS            (26) : disabled
26. GTHR                 (27) : disabled
27. ACDC                 (28) : disabled
28. VR0HOT               (29) : enabled
29. VR1HOT               (30) : disabled
30. FW_CTF               (31) : enabled
31. FAN_CONTROL          (32) : enabled
32. THERMAL              (33) : enabled
33. GFX_DCS              (34) : disabled
34. RM                   (35) : disabled
35. LED_DISPLAY          (36) : disabled
36. GFX_SS               (37) : enabled
37. OUT_OF_BAND_MONITOR  (38) : enabled
38. TEMP_DEPENDENT_VMIN  (39) : disabled
39. MMHUB_PG             (40) : enabled
40. ATHUB_PG             (41) : enabled
41. APCC_DFLL            (42) : enabled
42. RSMU_SMN_CG          (44) : enabled
::::::::::::::
/sys/class/drm/card0/device/pp_force_state
::::::::::::::

::::::::::::::
/sys/class/drm/card0/device/pp_mclk_od
::::::::::::::
0
::::::::::::::
/sys/class/drm/card0/device/pp_num_states
::::::::::::::
states: 1
0 default
::::::::::::::
/sys/class/drm/card0/device/pp_od_clk_voltage
::::::::::::::
OD_SCLK:
0: 500Mhz
1: 2384Mhz
OD_MCLK:
0: 97Mhz
1: 1000MHz
OD_VDDGFX_OFFSET:
0mV
OD_RANGE:
SCLK:     500Mhz       2800Mhz
MCLK:     674Mhz       1075Mhz
::::::::::::::
/sys/class/drm/card0/device/pp_power_profile_mode
::::::::::::::
PROFILE_INDEX(NAME) CLOCK_TYPE(NAME) FPS MinFreqType MinActiveFreqType MinActiveFreq BoosterFreqType BoosterFreq PD_Data_limit_c PD_Data_error_coeff PD_Data_error_rate_coeff
 0 BOOTUP_DEFAULT*:
                    0(       GFXCLK)       0       5       1       0       4     800 4587520  -65536       0
                    1(       SOCCLK)       0       5       1       0       1       0 3276800  -65536   -6553
                    2(        MEMLK)       0       5       1       0       4     800  327680  -65536       0
 1 3D_FULL_SCREEN :
                    0(       GFXCLK)       0       5       1       0       4     650 5242880   -3276       0
                    1(       SOCCLK)       0       5       1       0       1       0  655360  -65536   -6553
                    2(        MEMLK)       0       5       4     850       4     800  327680  -65536       0
 2   POWER_SAVING :
                    0(       GFXCLK)       0       5       1       0       3       0 5898240  -65536       0
                    1(       SOCCLK)       0       5       1       0       1       0 3407872  -65536   -6553
                    2(        MEMLK)       0       5       1       0       3       0 1966080  -65536       0
 3          VIDEO :
                    0(       GFXCLK)       0       5       1       0       4     500 4587520  -65536       0
                    1(       SOCCLK)       0       5       1       0       1       0 3473408  -65536   -6553
                    2(        MEMLK)       0       5       1       0       4     500 1966080  -65536       0
 4             VR :
                    0(       GFXCLK)       0       5       4    1000       1       0 3276800       0       0
                    1(       SOCCLK)       0       5       1       0       1       0  655360  -65536   -6553
                    2(        MEMLK)       0       5       1       0       4     800  327680  -65536       0
 5        COMPUTE :
                    0(       GFXCLK)       0       5       4    1000       1       0 3932160       0       0
                    1(       SOCCLK)       0       5       1       0       1       0  655360  -65536   -6553
                    2(        MEMLK)       0       5       4     850       3       0  327680  -65536  -32768
 6         CUSTOM :
                    0(       GFXCLK)       0       5       1       0       4     800 4587520  -65536       0
                    1(       SOCCLK)       0       5       1       0       1       0 3276800  -65536   -6553
                    2(        MEMLK)       0       5       1       0       4     800  327680  -65536       0
::::::::::::::
/sys/class/drm/card0/device/pp_sclk_od
::::::::::::::
0
::::::::::::::
/sys/class/drm/card0/device/pp_table
::::::::::::::
[binary pp_table contents omitted; not printable as text]

eitch avatar Feb 24 '22 08:02 eitch

Thanks for the help! Really happy about getting some feedback, and hoping for the application to soon work again =).

eitch avatar Feb 24 '22 08:02 eitch

You're welcome! I hope I'll be able to make it work for your card, and by extension for Navi cards in general. It is a bit trickier without the hardware to test. Originally I started my fork to collect some general bug fixes and fix things that broke with my Radeon VII in particular, but I think I have a good idea of what is going wrong.

Anyhow, I've got a preliminary version that should fix the segfault, and work around some things missing that may need some more work. Assuming that particular segfault is the only major problem with your card, it should now start and then be mostly functional.

The part that may need more work is manual voltage/frequency settings in the overclocking tab. It may show nothing or not apply correctly. The slider for automatic percent overclock should likely work though. Anything else I expect to work.

You can find those fixes in this branch: https://github.com/emerge-e-world/radeon-profile/tree/navi_fixes

You can clone it with: git clone https://github.com/emerge-e-world/radeon-profile -b navi_fixes

and then compile and run as usual:

qmake CONFIG+=debug
make -j
./target/radeon-profile

In case it still crashes elsewhere, I added some more debug output to pin it down better. So if that happens, posting the output again would help.

emerge-e-world avatar Feb 24 '22 13:02 emerge-e-world

Thanks, so this actually works. I can now set the power cap as well. This is the log during that:

$ sudo ./target/radeon-profile
Please touch the device.
Creating application object
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
Creating radeon_profile
Loading configuration
Creating ui elements
Analyzing screen  0
  Analyzing output  0
    Analyzing active mode
      Property is EDID, parsing it
Using QCharRef with an index pointing outside the valid range of a QString. The corresponding behavior is deprecated, and will be changed in a future version of Qt.
Using QCharRef with an index pointing outside the valid range of a QString. The corresponding behavior is deprecated, and will be changed in a future version of Qt.
Using QCharRef with an index pointing outside the valid range of a QString. The corresponding behavior is deprecated, and will be changed in a future version of Qt.
Searching PnP ID:  "GSM"
Found PnP ID:  "GSM" -> "LG Electronics"
  Analyzing output  1
  Analyzing output  2
  Analyzing output  3
Initializing device
Card detected:
 module:  "amdgpu" 
 sysName(path):  "card0" 
 name:  "Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev c1)"
hwmon path:  "/sys/class/drm/card0/device/hwmon/hwmon2/"
Opened /dev/dri/renderD128 , fd number: 48
QFSFileEngine::open: No file name specified
Detected power method based on sysfs power_method file: 3
Power method: Power Profile Modes. Creating profiles list
ioctlHandler: everything ok
detected 3 temperature sensors: edge,junction,mem
parsing OD_SCLK table
parsing OD_MCLK table
parsing OD_RANGE table
parsing OC tables done.
ioctlHandler: everything ok
GPU max core clk:  2575 
 max mem clk:  1000 
 vram size:  16304
Handling found device features
Creating power profiles control buttons
Fan control available
setting up OC UI elements
creating OC profile list/graph for:  "default"
UI init complete.
parsing OD_SCLK table
parsing OD_MCLK table
parsing OD_RANGE table
parsing OC tables done.
parsing OD_SCLK table
parsing OD_MCLK table
parsing OD_RANGE table
parsing OC tables done.
parsing OD_SCLK table
parsing OD_MCLK table
parsing OD_RANGE table
parsing OC tables done.
creating OC profile list/graph for:  "default"
parsing OD_SCLK table
parsing OD_MCLK table
parsing OD_RANGE table
parsing OC tables done.
parsing OD_SCLK table
parsing OD_MCLK table
parsing OD_RANGE table
parsing OC tables done.

My card is water cooled, so I don't have any fans on it. I'll need to play around with the settings and see if the overclock sliders work. Thanks anyhow for this start.

eitch avatar Feb 24 '22 15:02 eitch

How can I now enable the daemon, so I don't have to run as root?

eitch avatar Feb 24 '22 15:02 eitch

Awesome, that looks very good. Fan control should most likely be no different with Navi, and the debug output suggests it is detected correctly. So I'd assume this would work if you had an air cooler attached. We'll have to see if someone with another card can test this.

The daemon is a separate package. I did not make any changes to the daemon for my fork, so you can just use the latest binary package from your distro if it has one (the package should be called "radeon-profile-daemon"), or compile it yourself from marazmista's repo here: https://github.com/marazmista/radeon-profile-daemon

To build, follow the instructions there (it's the same qmake && make steps), then run sudo make install to install it. After that you can manually start the daemon with: sudo systemctl start radeon-profile-daemon.service and/or enable it at boot: sudo systemctl enable radeon-profile-daemon.service

emerge-e-world avatar Feb 24 '22 17:02 emerge-e-world

The changes to fix the startup segfault for navi2x (RX 6000 series cards) are now merged into the master branch in my repo. So anybody who had the same issue as @eitch, please use the master branch from: https://github.com/emerge-e-world/radeon-profile Any feedback on further issues with Navi based cards would be appreciated, since I don't have a card to test myself. Please open bug reports in my repo if you find things not working properly.

emerge-e-world avatar Feb 25 '22 17:02 emerge-e-world

@emerge-e-world right, that worked. I installed it from the repo and am now able to start the client without sudo.

Overclocking doesn't seem to do anything. Looking at the screen:

image

I can't seem to change the clocks manually, and the percent overclock does nothing; my clock always stays at 2475MHz. Any ideas?

eitch avatar Feb 28 '22 07:02 eitch

I think there may be a rather confusing UI bug in the General OC tab. Could you try this: enable both percent overclock and manual frequency control. Then move the core and/or mem clock percent slider up, then push Apply at least twice. If the frequency values shown in the tables change, it may have worked. If so, you can run some workload and see if the clocks actually go higher. (They may only spike for short bursts if your temps are getting too high. You can plot and monitor both in the Graphs tab.) If this works then this is just a bug in the UI. That'll have to be fixed in either case :)

Another idea: if you click on the states table tab, is there a "Set Ranges" button shown at the bottom? I expect not, but if there is, can you click on it and increase the "maximum core clock" slider there, then hit Save and Apply, and check if your clocks increase?

emerge-e-world avatar Feb 28 '22 12:02 emerge-e-world

Clicking Apply twice does change the second entry in the left list from 0 MHz to 2575 MHz, but nothing happens while running vkmark. And this is not a thermal issue, as you can see from the temps.

image

image

eitch avatar Mar 01 '22 07:03 eitch

I had a look at the amdgpu kernel driver documentation to find any changes in the clock/voltage control for navi2x and figure out what needs changing. It seems that, except for a new voltage offset feature, it should work the same as with navi1x and vega20 cards, and therefore in principle work with the existing code.

However, your pp_od_clk_voltage table is missing the section called OD_VDDC_CURVE, which describes the frequency/voltage curve. This is what originally made me think some bigger changes would be needed to make it work. But I now rather think your amdgpu driver is just not exposing it.

Maybe it is as simple as it being disabled. Did you boot with an amdgpu.ppfeaturemask kernel parameter? It is a parameter for the amdgpu driver that unlocks some driver features that are disabled by default, including some overclocking bits.
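
Whether a section such as OD_VDDC_CURVE is present can be checked before any code relies on it. Here is a minimal sketch in plain C++ (standard library only, not the actual radeon-profile code). splitOdSections is a hypothetical helper, and it assumes that every section starts with a header line that ends in ':' and contains no spaces, as in the pp_od_clk_voltage output posted earlier in this thread.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Section name -> the lines that belong to that section.
using Sections = std::map<std::string, std::vector<std::string>>;

// Split pp_od_clk_voltage text into sections keyed by header name
// (the header line without its trailing ':').
Sections splitOdSections(const std::string &text) {
    Sections sections;
    std::istringstream in(text);
    std::string line;
    std::string current;
    while (std::getline(in, line)) {
        // Drop trailing whitespace so "OD_SCLK: " is still seen as a header.
        while (!line.empty() &&
               (line.back() == ' ' || line.back() == '\t' || line.back() == '\r')) {
            line.pop_back();
        }
        if (line.empty()) continue;

        // Assumption: headers end in ':' and have no spaces, which tells
        // "OD_RANGE:" apart from rows like "SCLK:     500Mhz       2800Mhz".
        const bool isHeader =
            line.back() == ':' && line.find(' ') == std::string::npos;
        if (isHeader) {
            current = line.substr(0, line.size() - 1);
            sections[current];  // create the section even if it stays empty
        } else if (!current.empty()) {
            sections[current].push_back(line);
        }
    }
    return sections;
}

// True if the named section exists in the parsed table.
bool hasSection(const Sections &sections, const std::string &name) {
    return sections.count(name) != 0;
}
```

With the table posted above, hasSection(sections, "OD_VDDC_CURVE") would be false, so a caller could hide or disable the curve editor instead of reading entries that are not there.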

If you didn't, could you try the following?

Add the parameter amdgpu.ppfeaturemask=0xffffffff to the kernel parameter list in your bootloader. You seem to run Pop!_OS; on that distro you should be able to add kernel parameters like this:

sudo kernelstub -a amdgpu.ppfeaturemask=0xffffffff

then reboot. After you've rebooted, you can verify that it was actually applied with:

cat /proc/cmdline

The "amdgpu.ppfeaturemask=0xffffffff" string should be somewhere in there.
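
The same check can be done programmatically. Here is a minimal sketch in plain C++ (standard library only). findPpFeatureMask is a hypothetical helper, not part of radeon-profile, and it assumes the kernel command line is a whitespace-separated list of parameters as shown by /proc/cmdline.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>

// Look for "amdgpu.ppfeaturemask=<value>" in a kernel command line.
// On success, stores the parsed value in mask and returns true.
bool findPpFeatureMask(const std::string &cmdline, unsigned long &mask) {
    const std::string key = "amdgpu.ppfeaturemask=";
    std::istringstream in(cmdline);
    std::string param;
    while (in >> param) {
        if (param.compare(0, key.size(), key) != 0) continue;

        const std::string value = param.substr(key.size());
        if (value.empty()) return false;

        // Base 0 lets strtoul accept both "0xffffffff" and decimal input.
        char *end = nullptr;
        const unsigned long parsed = std::strtoul(value.c_str(), &end, 0);
        if (*end != '\0') return false;  // trailing junk, treat as malformed

        mask = parsed;
        return true;
    }
    return false;
}
```

To use it on a running system, read the single line of /proc/cmdline (for example with std::ifstream and std::getline) and pass it in; a return value of false means the parameter was not applied at boot.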

If that worked, please post the output of:

cat /sys/class/drm/card0/device/pp_od_clk_voltage

again. If there is now an OD_VDDC_CURVE section in there, you can try again applying an overclock. There should then also be the "set ranges" button now available under the states table tab, where you can increase the max allowed clock speed.

emerge-e-world avatar Mar 02 '22 03:03 emerge-e-world

Sadly this is not the case, as I have had this feature activated for quite a while already:

eitch@eitchtower:~$ cat /proc/cmdline
initrd=\EFI\Pop_OS-c20a2fe8-0d53-4c86-bfcd-b373b2b6bceb\initrd.img root=UUID=c20a2fe8-0d53-4c86-bfcd-b373b2b6bceb ro quiet loglevel=0 systemd.show_status=false splash amdgpu.ppfeaturemask=0xffffffff
eitch@eitchtower:~$ cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 500Mhz
1: 2374Mhz
OD_MCLK:
0: 97Mhz
1: 1000MHz
OD_VDDGFX_OFFSET:
0mV
OD_RANGE:
SCLK:     500Mhz       2800Mhz
MCLK:     674Mhz       1075Mhz

eitch avatar Mar 02 '22 13:03 eitch

OK, well that would have been too easy. I'll see if I can get the OD_RANGE settings to work (the Set Ranges button), since the values are there. Maybe leaving those unset is what caps the maximum allowed clocks. I'll also look at exposing the new VDDGFX_OFFSET in the UI. It may take a couple of days before I can get to it though.

In the meantime, could you check if the amdgpu driver reports any errors when you try to apply OC settings? This should get you any kernel messages from the driver:

sudo dmesg | grep amdgpu

emerge-e-world avatar Mar 06 '22 03:03 emerge-e-world

Sorry for the delay, I've been on holiday. Here is the listing:

[    0.000000] Command line: initrd=\EFI\Pop_OS-c20a2fe8\initrd.img root=UUID=c20a2fe8 ro quiet loglevel=0 systemd.show_status=false splash amdgpu.ppfeaturemask=0xffffffff
[    0.057528] Kernel command line: initrd=\EFI\Pop_OS\initrd.img root=UUID=c20a2fe8 ro quiet loglevel=0 systemd.show_status=false splash amdgpu.ppfeaturemask=0xffffffff
[    2.080809] [drm] amdgpu kernel modesetting enabled.
[    2.085476] amdgpu: Ignoring ACPI CRAT on non-APU system
[    2.085478] amdgpu: Virtual CRAT table created for CPU
[    2.085482] amdgpu: Topology: Add CPU node
[    2.085527] fb0: switching to amdgpu from EFI VGA
[    2.085560] amdgpu 0000:0e:00.0: vgaarb: deactivate vga console
[    2.085587] amdgpu 0000:0e:00.0: enabling device (0006 -> 0007)
[    2.085615] amdgpu 0000:0e:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    2.087256] amdgpu 0000:0e:00.0: amdgpu: Fetched VBIOS from VFCT
[    2.087257] amdgpu: ATOM BIOS: 113-D41201-XTC
[    2.087297] amdgpu 0000:0e:00.0: amdgpu: MEM ECC is not presented.
[    2.087298] amdgpu 0000:0e:00.0: amdgpu: SRAM ECC is not presented.
[    2.087305] amdgpu 0000:0e:00.0: amdgpu: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[    2.087306] amdgpu 0000:0e:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[    2.087307] amdgpu 0000:0e:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[    2.087340] [drm] amdgpu: 16368M of VRAM memory ready
[    2.087340] [drm] amdgpu: 16368M of GTT memory ready.
[    2.087605] amdgpu 0000:0e:00.0: amdgpu: PSP runtime database doesn't exist
[    4.341650] amdgpu 0000:0e:00.0: amdgpu: Will use PSP to load VCN firmware
[    4.556253] amdgpu 0000:0e:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[    4.556282] amdgpu 0000:0e:00.0: amdgpu: use vbios provided pptable
[    4.626347] amdgpu 0000:0e:00.0: amdgpu: SMU is initialized successfully!
[    5.097568] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[    5.138720] amdgpu: HMM registered 16368MB device memory
[    5.138777] amdgpu: SRAT table not found
[    5.138777] amdgpu: Virtual CRAT table created for GPU
[    5.139198] amdgpu: Topology: Add dGPU node [0x73bf:0x1002]
[    5.139201] kfd kfd: amdgpu: added device 1002:73bf
[    5.139227] amdgpu 0000:0e:00.0: amdgpu: SE 4, SH per SE 2, CU per SH 10, active_cu_number 72
[    5.143449] fbcon: amdgpudrmfb (fb0) is primary device
[    5.143451] amdgpu 0000:0e:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[    5.156789] amdgpu 0000:0e:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[    5.156791] amdgpu 0000:0e:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[    5.156792] amdgpu 0000:0e:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[    5.156792] amdgpu 0000:0e:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[    5.156793] amdgpu 0000:0e:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[    5.156794] amdgpu 0000:0e:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[    5.156795] amdgpu 0000:0e:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[    5.156795] amdgpu 0000:0e:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[    5.156796] amdgpu 0000:0e:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[    5.156797] amdgpu 0000:0e:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[    5.156798] amdgpu 0000:0e:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[    5.156798] amdgpu 0000:0e:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[    5.156799] amdgpu 0000:0e:00.0: amdgpu: ring sdma2 uses VM inv eng 14 on hub 0
[    5.156800] amdgpu 0000:0e:00.0: amdgpu: ring sdma3 uses VM inv eng 15 on hub 0
[    5.156801] amdgpu 0000:0e:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[    5.156801] amdgpu 0000:0e:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[    5.156802] amdgpu 0000:0e:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[    5.156803] amdgpu 0000:0e:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 5 on hub 1
[    5.156804] amdgpu 0000:0e:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 6 on hub 1
[    5.156805] amdgpu 0000:0e:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 7 on hub 1
[    5.156805] amdgpu 0000:0e:00.0: amdgpu: ring jpeg_dec uses VM inv eng 8 on hub 1
[    5.157958] [drm] Initialized amdgpu 3.44.0 20150101 for 0000:0e:00.0 on minor 0
[    7.123169] snd_hda_intel 0000:0e:00.1: bound 0000:0e:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])

eitch avatar Mar 28 '22 12:03 eitch

It doesn't matter which combination of buttons I press, the logs don't change. I'm on kernel 5.16.12.

eitch avatar Mar 28 '22 12:03 eitch

Does this also help you:

$ cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 500Mhz
1: 2374Mhz
OD_MCLK:
0: 97Mhz
1: 1000MHz
OD_VDDGFX_OFFSET:
0mV
OD_RANGE:
SCLK:     500Mhz       2800Mhz
MCLK:     674Mhz       1075Mhz

eitch avatar Mar 30 '22 12:03 eitch

Tried it out on an RX580 and I don't think that frequency curves are getting applied. Only the "frequency control" toggle seems to work for sure (because it activates the 1000 MHz intermediate memory state, which is disabled by default in the drivers because it causes flickering). "Percent overclock" might work, but the "GPU max core clk: / max mem clk: " startup message always shows the default values UNLESS the radeon-profile binary has suid applied to it with chmod u+s (even starting with sudo doesn't change that). BUT that only works if it is patched with this to prevent Qt's "safety suicide":

diff --git a/radeon-profile/main.cpp b/radeon-profile/main.cpp
index e3f6532..3d99571 100644
--- a/radeon-profile/main.cpp
+++ b/radeon-profile/main.cpp
@@ -6,6 +6,7 @@ int main(int argc, char *argv[])
 {
     qDebug() << "Creating application object";
 
+    QApplication::setSetuidAllowed(true);
     QApplication a(argc, argv);
     QTranslator translator;
     QLocale locale;

v-fox avatar Nov 02 '22 10:11 v-fox