Does the latest driver branch support and recognize GFX90A?
rock-dkms from the ROCm 4.3.1 release fails to recognize GFX90A GPUs. What is the status on the latest branch?
Can you provide a full dmesg? GFX90A should be recognized in 4.3.1 from the ROCK side.
$ lspci | grep Displ
0e:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 740c (rev 01)
11:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 740c (rev 01)
16:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 740c (rev 01)
19:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 740c (rev 01)
...
...
$ dmesg | grep AMD
[ 0.000000] AMD AuthenticAMD
[ 0.021179] RAMDISK: [mem 0x387df000-0x3d25efff]
[ 0.021189] ACPI: RSDP 0x00000000A6EB4014 000024 (v02 AMD )
[ 0.021193] ACPI: XSDT 0x00000000A6EB3728 0000FC (v01 AMD ETHANOLX 03242016 AMI 01000013)
[ 0.021199] ACPI: FACP 0x00000000A6A7A000 000114 (v06 AMD ETHANOLX 03242016 AMI 00010013)
[ 0.021204] ACPI: DSDT 0x00000000A6A69000 010EC6 (v02 AMD ETHANOLX 03242016 INTL 20120913)
[ 0.021210] ACPI: SSDT 0x00000000A6A7C000 00094E (v02 AMD AmdTable 00000002 MSFT 02000002)
[ 0.021213] ACPI: SPMI 0x00000000A6A7B000 000041 (v05 AMD ETHANOLX 00000000 AMI. 00000000)
[ 0.021216] ACPI: FPDT 0x00000000A6A68000 000044 (v01 AMD ETHANOLX 03242016 AMI 00010013)
[ 0.021219] ACPI: FIDT 0x00000000A6A67000 00009C (v01 AMD ETHANOLX 03242016 AMI 00010013)
[ 0.021222] ACPI: MCFG 0x00000000A6A66000 00003C (v01 AMD ETHANOLX 03242016 MSFT 00010013)
[ 0.021225] ACPI: SSDT 0x00000000A6A65000 000EAC (v02 AMD CPUSSDT 03242016 AMI 03242016)
[ 0.021228] ACPI: SSDT 0x00000000A6A64000 000110 (v01 AMD CPMRAS 00000001 INTL 20120913)
[ 0.021231] ACPI: BERT 0x00000000A6A63000 000030 (v01 AMD AMD BERT 00000001 AMD 00000001)
[ 0.021234] ACPI: EINJ 0x00000000A6A61000 000150 (v01 AMD AMD EINJ 00000001 AMD 00000001)
[ 0.021237] ACPI: HPET 0x00000000A6A60000 000038 (v01 AMD ETHANOLX 03242016 AMI 00000005)
[ 0.021240] ACPI: UEFI 0x00000000A6EA5000 000042 (v01 AMD ETHANOLX 01072009 AMI 01000013)
[ 0.021246] ACPI: TPM2 0x00000000A6A5E000 000034 (v04 AMD ETHANOLX 00000001 AMI 00000000)
[ 0.021249] ACPI: IVRS 0x00000000A6A5D000 000370 (v02 AMD AmdTable 00000001 AMD 00000000)
[ 0.021252] ACPI: PCCT 0x00000000A6A5C000 00006E (v02 AMD AmdTable 00000001 AMD 00000000)
[ 0.021254] ACPI: SSDT 0x00000000A6A42000 019DA4 (v01 AMD AmdTable 00000001 AMD 00000001)
[ 0.021257] ACPI: SRAT 0x00000000A6A41000 0008F8 (v03 AMD AmdTable 00000001 AMD 00000001)
[ 0.021260] ACPI: MSCT 0x00000000A6A40000 00004E (v01 AMD AmdTable 00000000 AMD 00000001)
[ 0.021263] ACPI: SLIT 0x00000000A6A3F000 00003C (v01 AMD AmdTable 00000001 AMD 00000001)
[ 0.021266] ACPI: CRAT 0x00000000A6A30000 00E948 (v01 AMD AmdTable 00000001 AMD 00000001)
[ 0.021269] ACPI: CDIT 0x00000000A6A2F000 000038 (v01 AMD AmdTable 00000001 AMD 00000001)
[ 0.021272] ACPI: SSDT 0x00000000A6A2D000 0017DC (v01 AMD CPMCMN 00000001 INTL 20120913)
[ 0.021275] ACPI: WSMT 0x00000000A6A2C000 000028 (v01 AMD ETHANOLX 03242016 AMI 00010013)
[ 0.021278] ACPI: APIC 0x00000000A6A2B000 0008B2 (v04 AMD ETHANOLX 03242016 AMI 00010013)
[ 0.021280] ACPI: HEST 0x00000000A69BA000 070A74 (v01 AMD AMD HEST 00000001 AMD 00000001)
[ 7.086273] Spectre V2 : Mitigation: Full AMD retpoline
[ 7.197102] smpboot: CPU0: AMD EPYC 7V12 64-Core Processor (family: 0x17, model: 0x31, stepping: 0x0)
[ 7.197373] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
[ 8.367422] pci 0000:6f:00.2: AMD-Vi: IOMMU performance counters supported
[ 8.367469] pci 0000:40:00.2: AMD-Vi: IOMMU performance counters supported
[ 8.367508] pci 0000:2f:00.2: AMD-Vi: IOMMU performance counters supported
[ 8.367534] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 8.367584] pci 0000:ef:00.2: AMD-Vi: IOMMU performance counters supported
[ 8.367619] pci 0000:c1:00.2: AMD-Vi: IOMMU performance counters supported
[ 8.367670] pci 0000:b0:00.2: AMD-Vi: IOMMU performance counters supported
[ 8.367713] pci 0000:81:00.2: AMD-Vi: IOMMU performance counters supported
[ 8.537476] pci 0000:6f:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 8.537484] pci 0000:6f:00.2: AMD-Vi: Extended features (0x58f77ef22294ade):
[ 8.537490] pci 0000:40:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 8.537491] pci 0000:40:00.2: AMD-Vi: Extended features (0x58f77ef22294ade):
[ 8.537495] pci 0000:2f:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 8.537496] pci 0000:2f:00.2: AMD-Vi: Extended features (0x58f77ef22294ade):
[ 8.537500] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 8.537501] pci 0000:00:00.2: AMD-Vi: Extended features (0x58f77ef22294ade):
[ 8.537505] pci 0000:ef:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 8.537506] pci 0000:ef:00.2: AMD-Vi: Extended features (0x58f77ef22294ade):
[ 8.537510] pci 0000:c1:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 8.537510] pci 0000:c1:00.2: AMD-Vi: Extended features (0x58f77ef22294ade):
[ 8.537514] pci 0000:b0:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 8.537515] pci 0000:b0:00.2: AMD-Vi: Extended features (0x58f77ef22294ade):
[ 8.537519] pci 0000:81:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 8.537519] pci 0000:81:00.2: AMD-Vi: Extended features (0x58f77ef22294ade):
[ 8.537523] AMD-Vi: Interrupt remapping enabled
[ 8.537523] AMD-Vi: Virtual APIC enabled
[ 8.537524] AMD-Vi: X2APIC enabled
[ 8.538325] AMD-Vi: Lazy IO/TLB flushing enabled
[ 8.544101] perf: AMD IBS detected (0x000003ff)
[ 8.544120] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[ 8.544138] perf/amd_iommu: Detected AMD IOMMU #1 (2 banks, 4 counters/bank).
[ 8.544157] perf/amd_iommu: Detected AMD IOMMU #2 (2 banks, 4 counters/bank).
[ 8.544175] perf/amd_iommu: Detected AMD IOMMU #3 (2 banks, 4 counters/bank).
[ 8.544198] perf/amd_iommu: Detected AMD IOMMU #4 (2 banks, 4 counters/bank).
[ 8.544217] perf/amd_iommu: Detected AMD IOMMU #5 (2 banks, 4 counters/bank).
[ 8.544238] perf/amd_iommu: Detected AMD IOMMU #6 (2 banks, 4 counters/bank).
[ 8.544258] perf/amd_iommu: Detected AMD IOMMU #7 (2 banks, 4 counters/bank).
[ 10.124966] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <[email protected]>
$ ls /dev/kfd
/dev/kfd
$ ls /dev/dri/
ls: cannot access '/dev/dri/': No such file or directory
$ dpkg -l | grep rock
ii rock-dkms 1:4.3-59 all amdgpu driver in DKMS format.
ii rock-dkms-firmware 1:4.3-59 all firmware blobs used by amdgpu driver in DKMS format
$ uname -a
Linux mi200ev2-linux 5.11.0-27-generic #29~20.04.1-Ubuntu SMP Wed Aug 11 15:58:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Can we get a full dmesg, without the grep? (If there's anything sensitive, feel free to cut it out. I want to see everything from drm, pci, amd, amdgpu, amdkfd and kfd, so the grep doesn't help much there.) Note that DID 0x740C is supported in 4.3.1:
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/rocm-4.3.1/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L1223
So let's try to track this down and get you up and running.
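A minimal sketch of the filtering asked for here, written as two hypothetical shell helpers: one keeps kernel-log lines that mention drm, pci, amd* or kfd, and the other checks "lspci -nn" output (which prints vendor:device IDs in square brackets) for the 1002:740c pair. The pattern is deliberately broad, so it will also pass unrelated lines that happen to contain those strings.

```shell
# Keep only kernel-log lines from the subsystems of interest.
# Reads stdin, so it works on live dmesg output or a saved log file.
# Case-insensitive: "amd" also covers amdgpu/amdkfd and AMD-Vi,
# "pci" also covers pcieport lines.
filter_gpu_log() {
    grep -iE 'drm|pci|amd|kfd'
}

# Succeed if "lspci -nn" style output on stdin contains the given
# vendor:device ID in brackets, e.g. has_pci_id 1002:740c
has_pci_id() {
    # -F: treat "[1002:740c]" as a literal string, not a bracket expression
    grep -qiF "[$1]"
}
```

For example: dmesg | filter_gpu_log > gpu-dmesg.txt, or lspci -nn | has_pci_id 1002:740c && echo found.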
Alright, so it looks like amdgpu never even tries to load. Let's try a couple of things:
- What does "dkms status" return? If it reports "installed" as the status, try "sudo modprobe amdgpu" and see if the driver comes up. If it only reports "added", then there was an installation failure, and you can usually find the log at /var/lib/dkms/amdgpu/$VER/build/make.log (where $VER is the version of rock-dkms that you installed).
Note that this path is also printed during the rock-dkms installation, in a message like "Errors occurred, consult /var/lib/..... for more information" or something to that effect. Hopefully it's just a small compilation error that we can address.
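The decision above can be scripted roughly as follows. This is a sketch that assumes the two "dkms status" line formats noted in the comments (older DKMS separates the module name and version with a comma, newer releases use "module/version"); dkms_module_state and dkms_next_step are hypothetical helper names.

```shell
# Print the DKMS state of module $1 from "dkms status" output on stdin.
# Handles "amdgpu, 4.3-59, <kernel>, x86_64: installed" (older DKMS) and
# "amdgpu/4.3-59, <kernel>, x86_64: installed" (newer DKMS).
# Prints "missing" when the module is not listed at all.
dkms_module_state() {
    awk -v mod="$1" '
        # Only lines that start with the module name followed by "," or "/"
        index($0, mod ",") == 1 || index($0, mod "/") == 1 {
            state = $0
            sub(/.*: /, "", state)    # keep the text after the last ": "
            sub(/[ (].*/, "", state)  # drop trailing notes such as "(WARNING! ...)"
            print state
            found = 1
            exit
        }
        END { if (!found) print "missing" }
    '
}

# Map a state to the next step suggested in this thread.
dkms_next_step() {
    case "$1" in
        installed) echo "try: sudo modprobe amdgpu" ;;
        added)     echo "installation likely failed: check the DKMS make.log" ;;
        missing)   echo "module is not registered with DKMS" ;;
        *)         echo "state '$1': inspect dkms status manually" ;;
    esac
}
```

For example: dkms_next_step "$(dkms status | dkms_module_state amdgpu)". An empty "dkms status" maps to "missing".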
$ dpkg -l | grep rock
ii rock-dkms 1:4.3-59 all amdgpu driver in DKMS format.
ii rock-dkms-firmware 1:4.3-59 all firmware blobs used by amdgpu driver in DKMS format
$ dkms status
$ modprobe amdgpu
$ lsmod | grep amdgpu
amdgpu 6053888 0
iommu_v2 24576 1 amdgpu
gpu_sched 40960 1 amdgpu
drm_ttm_helper 16384 2 drm_vram_helper,amdgpu
ttm 73728 3 drm_vram_helper,amdgpu,drm_ttm_helper
drm_kms_helper 237568 5 drm_vram_helper,ast,amdgpu
i2c_algo_bit 16384 2 ast,amdgpu
drm 548864 9 gpu_sched,drm_kms_helper,drm_vram_helper,ast,amdgpu,drm_ttm_helper,ttm
$ ls /var/lib/dkms/amdgpu-*
ls: cannot access '/var/lib/dkms/amdgpu-*': No such file or directory
I'm definitely baffled here, since DKMS doesn't look like it has done anything at all. Normally dkms gets pulled in, so it should have produced something. Let's see if we can get things working manually. Is the source in /usr/src/amdgpu-4.3-59? Since dpkg shows the package as installed, it should be. If so, you can try to get it building via:
sudo dkms add amdgpu/4.3-59
sudo dkms build amdgpu/4.3-59 -k $(uname -r)/x86_64
sudo dkms install amdgpu/4.3-59 -k $(uname -r)/x86_64
Let me know how it goes!
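The manual sequence above can be wrapped in a small shell sketch. It assumes the usual DKMS layout (source under /usr/src/<module>-<version> containing a dkms.conf) and adds a hypothetical DRY_RUN switch so the commands can be reviewed before they run; all function names here are illustrative only.

```shell
# Succeed if <srcroot>/<module>-<version>/dkms.conf exists.
# $1 = source root (normally /usr/src), $2 = module, $3 = version
dkms_source_present() {
    [ -f "$1/$2-$3/dkms.conf" ]
}

# Print the command when DRY_RUN=1, otherwise execute it.
run_cmd() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add -> build -> install, stopping at the first step that fails.
# $1 = module, $2 = version, $3 = kernel/arch, e.g. "$(uname -r)/x86_64"
dkms_rebuild() {
    run_cmd sudo dkms add "$1/$2" &&
        run_cmd sudo dkms build "$1/$2" -k "$3" &&
        run_cmd sudo dkms install "$1/$2" -k "$3"
}
```

For example: dkms_source_present /usr/src amdgpu 4.3-59 && DRY_RUN=1 dkms_rebuild amdgpu 4.3-59 "$(uname -r)/x86_64". If the module is already registered, "dkms add" may report an error and stop the chain; in that case run the build and install steps directly.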
$ dkms build amdgpu/4.3-59 -k $(uname -r)/x86_64
Kernel preparation unnecessary for this kernel. Skipping...
Running the pre_build script:
checking for a BSD-compatible install... /bin/install -c
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -E
checking kernel source directory... /usr/src/linux-headers-5.11.0-27-generic
checking kernel build directory... /usr/src/linux-headers-5.11.0-27-generic
checking kernel source version... 5.11.0-27-generic
checking kernel file name for module symbols... Module.symvers
checking for linux/overflow.h... yes
checking for linux/sched/mm.h... yes
checking for linux/sched/task.h... yes
checking for linux/sched/signal.h... yes
checking for linux/nospec.h... yes
checking for linux/bits.h... yes
checking for linux/io-64-nonatomic-lo-hi.h... yes
checking for asm/set_memory.h... yes
checking for asm/fpu/api.h... yes
checking for uapi/linux/sched/types.h... yes
checking for linux/compiler_attributes.h... yes
checking for linux/dma-fence.h... yes
checking for linux/dma-resv.h... yes
checking for linux/mmap_lock.h... yes
checking for linux/pci-p2pdma.h... yes
checking for linux/dma-attrs.h... no
checking for linux/mem_encrypt.h... yes
checking for linux/dma-buf-map.h... yes
checking for drm/drm_backport.h... no
checking for drm/amdgpu_pciid.h... no
checking for drm/drm_auth.h... yes
checking for drm/drm_irq.h... yes
checking for drm/drm_connector.h... yes
checking for drm/drm_encoder.h... yes
checking for drm/drm_plane.h... yes
checking for drm/drm_print.h... yes
checking for drm/drm_drv.h... yes
checking for drm/drm_file.h... yes
checking for drm/drm_debugfs.h... yes
checking for drm/drm_ioctl.h... yes
checking for drm/drm_vblank.h... yes
checking for drm/drm_device.h... yes
checking for drm/drm_gem_framebuffer_helper.h... yes
checking for drm/drm_hdcp.h... yes
checking for drm/drm_audio_component.h... yes
checking for drm/drm_util.h... yes
checking for drm/drm_atomic_uapi.h... yes
checking for drm/drm_probe_helper.h... yes
checking for drm/drmP.h... no
checking for drm/task_barrier.h... yes
checking for drm/drm_managed.h... yes
checking for drm/drm_gem_ttm_helper.h... yes
checking for module configuration... done
configure: creating ./config.status
config.status: creating config/config.h
Building module:
cleaning build area...(bad exit status: 2)
make -j128 KERNELRELEASE=5.11.0-27-generic -j128 TTM_NAME=amdttm SCHED_NAME=amd-sched -C /lib/modules/5.11.0-27-generic/build M=/var/lib/dkms/amdgpu/4.3-59/build.....
Signing module:
- /var/lib/dkms/amdgpu/4.3-59/5.11.0-27-generic/x86_64/module/amdgpu.ko
- /var/lib/dkms/amdgpu/4.3-59/5.11.0-27-generic/x86_64/module/amd-sched.ko
- /var/lib/dkms/amdgpu/4.3-59/5.11.0-27-generic/x86_64/module/amdttm.ko
- /var/lib/dkms/amdgpu/4.3-59/5.11.0-27-generic/x86_64/module/amdkcl.ko
Secure Boot not enabled on this system.
cleaning build area...(bad exit status: 2)
DKMS: build completed.
$ dkms install amdgpu/4.3-59 -k $(uname -r)/x86_64
Forcing installation of amdgpu
amdgpu.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.11.0-27-generic/updates/dkms/
amdttm.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.11.0-27-generic/updates/dkms/
amdkcl.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.11.0-27-generic/updates/dkms/
amd-sched.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.11.0-27-generic/updates/dkms/
Running the post_install script:
depmod...
Backing up initrd.img-5.11.0-27-generic to /boot/initrd.img-5.11.0-27-generic.old-dkms
Making new initrd.img-5.11.0-27-generic
(If next boot fails, revert to initrd.img-5.11.0-27-generic.old-dkms image)
update-initramfs........
DKMS: install completed.
$ dkms status
amdgpu, 4.3-59, 5.11.0-27-generic, x86_64: installed
$ modprobe amdgpu
$ dmesg
..
[25646.268125] pcieport 0000:98:00.0: bridge window [mem 0x700d0000000-0x700efffffff 64bit pref]
[25646.268130] pcieport 0000:99:00.0: PCI bridge to [bus 9a]
[25646.268135] pcieport 0000:99:00.0: bridge window [mem 0xb1000000-0xb10fffff]
[25646.268138] pcieport 0000:99:00.0: bridge window [mem 0x700d0000000-0x700efffffff 64bit pref]
[25646.268151] [drm] Not enough PCI address space for a large BAR.
[25646.268152] amdgpu 0000:9a:00.0: BAR 0: assigned [mem 0x700d0000000-0x700dfffffff 64bit pref]
[25646.268162] amdgpu 0000:9a:00.0: BAR 2: assigned [mem 0x700e0000000-0x700e01fffff 64bit pref]
[25646.268184] amdgpu 0000:9a:00.0: amdgpu: VRAM: 65520M 0x0000024000000000 - 0x0000024FFEFFFFFF (65520M used)
[25646.268186] amdgpu 0000:9a:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[25646.268187] amdgpu 0000:9a:00.0: amdgpu: AGP: 265289728M 0x0000030000000000 - 0x0000FFFFFFFFFFFF
[25646.268195] [drm] Detected VRAM RAM=65520M, BAR=256M
[25646.268196] [drm] RAM width 4096bits HBM
[25646.268216] [drm] amdgpu: 65520M of VRAM memory ready
[25646.268218] [drm] amdgpu: 2064175M of GTT memory ready.
[25646.268220] [drm] GART: num cpu pages 131072, num gpu pages 131072
[25646.268345] [drm] PCIE GART of 512M enabled.
[25646.268346] [drm] PTB located at 0x0000024000000000
[25646.269793] [drm] Found VCN firmware Version ENC: 1.1 DEC: 1 VEP: 0 Revision: 21
[25646.269799] [drm] PSP loading VCN firmware
[25646.594906] [drm:psp_hw_start [amdgpu]] *ERROR* PSP load sos failed!
[25646.596219] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
[25646.597416] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
[25646.598591] amdgpu 0000:9a:00.0: amdgpu: amdgpu_device_ip_init failed
[25646.599539] amdgpu 0000:9a:00.0: amdgpu: Fatal error during GPU init
[25646.600522] amdgpu: probe of 0000:9a:00.0 failed with error -22
$ ls /var/lib/dkms/amdgpu*
4.3-59 kernel-5.11.0-27-generic-x86_64
$ ls /var/lib/dkms/amdgpu/4.3-59/5.11.0-27-generic/x86_64/log/make.log
/var/lib/dkms/amdgpu/4.3-59/5.11.0-27-generic/x86_64/log/make.log
@kentrussell Could this be related to improper BIOS settings?
It could definitely be the BIOS. On this run, the PSP didn't load correctly, but at least the driver tried. This error is usually addressed either with newer firmware (though 4.3.1 already has the latest firmware for GFX90A, so that wouldn't be the fix here) or with SBIOS/VBIOS updates.
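To spot this kind of failure quickly in a long dmesg, a sketch like the following scans for the exact messages seen in the log above (the PSP sos failure, the large-BAR warning and the final probe error). It only knows those strings, so an empty result means "none of these specific messages", not "the GPU initialized fine"; summarize_amdgpu_init is a hypothetical name.

```shell
# Summarize amdgpu init problems from a kernel log on stdin.
# Patterns are copied from the dmesg in this thread; other failure
# modes print different messages and will not be reported.
summarize_amdgpu_init() {
    awk '
        /PSP load sos failed/ { psp = 1 }
        /Not enough PCI address space for a large BAR/ { bar = 1 }
        /amdgpu: probe of .* failed with error/ {
            # The field right after "of" is the PCI address of the GPU.
            for (i = 1; i < NF; i++)
                if ($i == "of") { print "probe failed: " $(i + 1); break }
        }
        END {
            if (psp) print "PSP (sos) firmware failed to load"
            if (bar) print "not enough PCI address space for a large BAR"
        }
    '
}
```

For example: dmesg | summarize_amdgpu_init. Against the log above this would report the failed probe of 0000:9a:00.0 along with the PSP and BAR messages.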
I'd start by updating the SBIOS, ensuring that "Above 4G decoding" is enabled, and ensuring that you've got the latest base kernel (5.8 HWE) installed, which I believe is 5.8.0-65.73, instead of the 5.8.0-43 that you have installed there. Good luck! I'll do some more digging on this side as well to see if there are more things we can try. Out of curiosity, do you have any older-generation GPUs around that you can swap in, just to see if the HW is set up correctly? Swapping in something like a Vega20 or Fiji, or anything newer than Hawaii, should just be plug-and-play: if you drop it in the system, it should just work. If it works with an older GPU, then it could be a HW issue with that card, or it could be that some support for that card is still missing.
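Checking whether the running kernel is at least a given version can be done with GNU coreutils' version sort. This is a sketch: version_at_least is a hypothetical helper, and it compares strings the way "sort -V" does, which orders "5.11" after "5.8" correctly but knows nothing about distro-specific version semantics beyond that.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Relies on GNU "sort -V" (version sort), available on Ubuntu.
version_at_least() {
    # If $2 sorts first (or the two are equal), $1 is at least $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example: version_at_least "$(uname -r)" 5.8.0-65 && echo "kernel is new enough".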
In theory the kernel should support it, even though it's not on the officially supported hardware list. That's why I want to keep working through this, even though the official ROCm documentation doesn't list support for it yet (it should be officially supported in ROCm 4.5, IIRC).
EDIT: Just to cover all of our bases, let's make sure that the FW is actually installed correctly (since PSP is the first FW block to load). Is there a folder called /lib/firmware/updates/amdgpu on your system? And if you run lsinitramfs on the initramfs image you booted with, is the firmware located in lib/firmware/updates/amdgpu?
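Both checks can be run against a file list with a sketch like this. It assumes, based on the listings in this thread, that GFX90A firmware uses the aldebaran_ prefix, and that the sos and ta images are the PSP blobs worth checking; missing_psp_fw is a hypothetical helper that prints any of those names it cannot find.

```shell
# Print each expected GFX90A PSP firmware file that is NOT in the list on
# stdin (one path or file name per line, as lsinitramfs or a piped ls prints).
# Assumption: the aldebaran_sos.bin / aldebaran_ta.bin names from this thread.
missing_psp_fw() {
    fw_list=$(cat)
    for blob in aldebaran_sos.bin aldebaran_ta.bin; do
        # Match the name as a whole path component at the end of a line.
        printf '%s\n' "$fw_list" | grep -qE "(^|/)${blob}\$" || echo "$blob"
    done
}
```

For example: lsinitramfs /boot/initrd.img-$(uname -r) | missing_psp_fw, or ls /lib/firmware/updates/amdgpu | missing_psp_fw. No output means both files were found.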
$ ls /lib/firmware/updates/amdgpu
aldebaran_mec2.bin carrizo_mec2.bin hainan_mc.bin navi10_asd.bin navi14_sos.bin polaris10_k2_smc.bin polaris12_mc.bin renoir_mec.bin tonga_sdma.bin vega12_mec2.bin
aldebaran_mec.bin carrizo_mec.bin hainan_me.bin navi10_ce.bin navi14_ta.bin polaris10_k_mc.bin polaris12_me_2.bin renoir_pfp.bin tonga_smc.bin vega12_mec.bin
aldebaran_rlc.bin carrizo_pfp.bin hainan_pfp.bin navi10_gpu_info.bin navi14_vcn.bin polaris10_k_smc.bin polaris12_me.bin renoir_rlc.bin tonga_uvd.bin vega12_pfp.bin
aldebaran_sdma.bin carrizo_rlc.bin hainan_rlc.bin navi10_me.bin navy_flounder_ce.bin polaris10_mc.bin polaris12_mec2_2.bin renoir_sdma.bin tonga_vce.bin vega12_rlc.bin
aldebaran_smc.bin carrizo_sdma1.bin hainan_smc.bin navi10_mec2.bin navy_flounder_dmcub.bin polaris10_me_2.bin polaris12_mec_2.bin renoir_ta.bin topaz_ce.bin vega12_sdma1.bin
aldebaran_sos.bin carrizo_sdma.bin hawaii_ce.bin navi10_mec.bin navy_flounder_me.bin polaris10_me.bin polaris12_mec2.bin renoir_vcn.bin topaz_k_smc.bin vega12_sdma.bin
aldebaran_ta.bin carrizo_uvd.bin hawaii_k_smc.bin navi10_pfp.bin navy_flounder_mec2.bin polaris10_mec2_2.bin polaris12_mec.bin si58_mc.bin topaz_mc.bin vega12_smc.bin
aldebaran_vcn.bin carrizo_vce.bin hawaii_mc.bin navi10_rlc.bin navy_flounder_mec.bin polaris10_mec_2.bin polaris12_pfp_2.bin sienna_cichlid_ce.bin topaz_me.bin vega12_sos.bin
arcturus_asd.bin dimgrey_cavefish_ce.bin hawaii_me.bin navi10_sdma1.bin navy_flounder_pfp.bin polaris10_mec2.bin polaris12_pfp.bin sienna_cichlid_dmcub.bin topaz_mec2.bin vega12_uvd.bin
arcturus_gpu_info.bin dimgrey_cavefish_dmcub.bin hawaii_mec.bin navi10_sdma.bin navy_flounder_rlc.bin polaris10_mec.bin polaris12_rlc.bin sienna_cichlid_me.bin topaz_mec.bin vega12_vce.bin
arcturus_mec2.bin dimgrey_cavefish_me.bin hawaii_pfp.bin navi10_smc.bin navy_flounder_sdma.bin polaris10_pfp_2.bin polaris12_sdma1.bin sienna_cichlid_mec2.bin topaz_pfp.bin vega20_asd.bin
arcturus_mec.bin dimgrey_cavefish_mec2.bin hawaii_rlc.bin navi10_sos.bin navy_flounder_smc.bin polaris10_pfp.bin polaris12_sdma.bin sienna_cichlid_mec.bin topaz_rlc.bin vega20_ce.bin
arcturus_rlc.bin dimgrey_cavefish_mec.bin hawaii_sdma1.bin navi10_ta.bin navy_flounder_sos.bin polaris10_rlc.bin polaris12_smc.bin sienna_cichlid_mes.bin topaz_sdma1.bin vega20_me.bin
arcturus_sdma.bin dimgrey_cavefish_pfp.bin hawaii_sdma.bin navi10_vcn.bin navy_flounder_ta.bin polaris10_sdma1.bin polaris12_uvd.bin sienna_cichlid_pfp.bin topaz_sdma.bin vega20_mec2.bin
arcturus_smc.bin dimgrey_cavefish_rlc.bin hawaii_smc.bin navi12_asd.bin navy_flounder_vcn.bin polaris10_sdma.bin polaris12_vce.bin sienna_cichlid_rlc.bin topaz_smc.bin vega20_mec.bin
arcturus_sos.bin dimgrey_cavefish_sdma.bin hawaii_uvd.bin navi12_ce.bin oland_ce.bin polaris10_smc.bin raven2_asd.bin sienna_cichlid_sdma.bin vangogh_asd.bin vega20_pfp.bin
arcturus_ta.bin dimgrey_cavefish_smc.bin hawaii_vce.bin navi12_dmcu.bin oland_k_smc.bin polaris10_smc_sk.bin raven2_ce.bin sienna_cichlid_smc.bin vangogh_ce.bin vega20_rlc.bin
arcturus_vcn.bin dimgrey_cavefish_sos.bin kabini_ce.bin navi12_gpu_info.bin oland_mc.bin polaris10_uvd.bin raven2_gpu_info.bin sienna_cichlid_sos.bin vangogh_dmcub.bin vega20_sdma1.bin
banks_k_2_smc.bin dimgrey_cavefish_ta.bin kabini_me.bin navi12_me.bin oland_me.bin polaris10_vce.bin raven2_me.bin sienna_cichlid_ta.bin vangogh_me.bin vega20_sdma.bin
beige_goby_ce.bin dimgrey_cavefish_vcn.bin kabini_mec.bin navi12_mec2.bin oland_pfp.bin polaris11_ce_2.bin raven2_mec2.bin sienna_cichlid_vcn.bin vangogh_mec2.bin vega20_smc.bin
beige_goby_dmcub.bin fiji_ce.bin kabini_pfp.bin navi12_mec.bin oland_rlc.bin polaris11_ce.bin raven2_mec.bin stoney_ce.bin vangogh_mec.bin vega20_sos.bin
beige_goby_me.bin fiji_mc.bin kabini_rlc.bin navi12_pfp.bin oland_smc.bin polaris11_k2_smc.bin raven2_pfp.bin stoney_me.bin vangogh_pfp.bin vega20_ta.bin
beige_goby_mec2.bin fiji_me.bin kabini_sdma1.bin navi12_rlc.bin oland_uvd.bin polaris11_k_mc.bin raven2_rlc.bin stoney_mec.bin vangogh_rlc.bin vega20_uvd.bin
beige_goby_mec.bin fiji_mec2.bin kabini_sdma.bin navi12_sdma1.bin picasso_asd.bin polaris11_k_smc.bin raven2_sdma.bin stoney_pfp.bin vangogh_sdma.bin vega20_vce.bin
beige_goby_pfp.bin fiji_mec.bin kabini_uvd.bin navi12_sdma.bin picasso_ce.bin polaris11_mc.bin raven2_ta.bin stoney_rlc.bin vangogh_toc.bin vegam_ce.bin
beige_goby_rlc.bin fiji_pfp.bin kabini_vce.bin navi12_smc.bin picasso_gpu_info.bin polaris11_me_2.bin raven2_vcn.bin stoney_sdma.bin vangogh_vcn.bin vegam_me.bin
beige_goby_sdma.bin fiji_rlc.bin kaveri_ce.bin navi12_sos.bin picasso_me.bin polaris11_me.bin raven_asd.bin stoney_uvd.bin vega10_acg_smc.bin vegam_mec2.bin
beige_goby_smc.bin fiji_sdma1.bin kaveri_me.bin navi12_ta.bin picasso_mec2.bin polaris11_mec2_2.bin raven_ce.bin stoney_vce.bin vega10_asd.bin vegam_mec.bin
beige_goby_sos.bin fiji_sdma.bin kaveri_mec2.bin navi12_vcn.bin picasso_mec.bin polaris11_mec_2.bin raven_dmcu.bin tahiti_ce.bin vega10_ce.bin vegam_pfp.bin
beige_goby_ta.bin fiji_smc.bin kaveri_mec.bin navi14_asd.bin picasso_pfp.bin polaris11_mec2.bin raven_gpu_info.bin tahiti_k_smc.bin vega10_gpu_info.bin vegam_rlc.bin
beige_goby_vcn.bin fiji_uvd.bin kaveri_pfp.bin navi14_ce.bin picasso_rlc_am4.bin polaris11_mec.bin raven_kicker_rlc.bin tahiti_mc.bin vega10_me.bin vegam_sdma1.bin
bonaire_ce.bin fiji_vce.bin kaveri_rlc.bin navi14_ce_wks.bin picasso_rlc.bin polaris11_pfp_2.bin raven_me.bin tahiti_me.bin vega10_mec2.bin vegam_sdma.bin
bonaire_k_smc.bin green_sardine_asd.bin kaveri_sdma1.bin navi14_gpu_info.bin picasso_sdma.bin polaris11_pfp.bin raven_mec2.bin tahiti_pfp.bin vega10_mec.bin vegam_smc.bin
bonaire_mc.bin green_sardine_ce.bin kaveri_sdma.bin navi14_me.bin picasso_ta.bin polaris11_rlc.bin raven_mec.bin tahiti_rlc.bin vega10_pfp.bin vegam_uvd.bin
bonaire_me.bin green_sardine_dmcub.bin kaveri_uvd.bin navi14_mec2.bin picasso_vcn.bin polaris11_sdma1.bin raven_pfp.bin tahiti_smc.bin vega10_rlc.bin vegam_vce.bin
bonaire_mec.bin green_sardine_me.bin kaveri_vce.bin navi14_mec2_wks.bin pitcairn_ce.bin polaris11_sdma.bin raven_rlc.bin tahiti_uvd.bin vega10_sdma1.bin verde_ce.bin
bonaire_pfp.bin green_sardine_mec2.bin mullins_ce.bin navi14_mec.bin pitcairn_k_smc.bin polaris11_smc.bin raven_sdma.bin tonga_ce.bin vega10_sdma.bin verde_k_smc.bin
bonaire_rlc.bin green_sardine_mec.bin mullins_me.bin navi14_mec_wks.bin pitcairn_mc.bin polaris11_smc_sk.bin raven_ta.bin tonga_k_smc.bin vega10_smc.bin verde_mc.bin
bonaire_sdma1.bin green_sardine_pfp.bin mullins_mec.bin navi14_me_wks.bin pitcairn_me.bin polaris11_uvd.bin raven_vcn.bin tonga_mc.bin vega10_sos.bin verde_me.bin
bonaire_sdma.bin green_sardine_rlc.bin mullins_pfp.bin navi14_pfp.bin pitcairn_pfp.bin polaris11_vce.bin renoir_asd.bin tonga_me.bin vega10_uvd.bin verde_pfp.bin
bonaire_smc.bin green_sardine_sdma.bin mullins_rlc.bin navi14_pfp_wks.bin pitcairn_rlc.bin polaris12_32_mc.bin renoir_ce.bin tonga_mec2.bin vega10_vce.bin verde_rlc.bin
bonaire_uvd.bin green_sardine_ta.bin mullins_sdma1.bin navi14_rlc.bin pitcairn_smc.bin polaris12_ce_2.bin renoir_dmcub.bin tonga_mec.bin vega12_asd.bin verde_smc.bin
bonaire_vce.bin green_sardine_vcn.bin mullins_sdma.bin navi14_sdma1.bin pitcairn_uvd.bin polaris12_ce.bin renoir_gpu_info.bin tonga_pfp.bin vega12_ce.bin verde_uvd.bin
carrizo_ce.bin hainan_ce.bin mullins_uvd.bin navi14_sdma.bin polaris10_ce_2.bin polaris12_k_mc.bin renoir_me.bin tonga_rlc.bin vega12_gpu_info.bin
carrizo_me.bin hainan_k_smc.bin mullins_vce.bin navi14_smc.bin polaris10_ce.bin polaris12_k_smc.bin renoir_mec2.bin tonga_sdma1.bin vega12_me.bin
$ lsinitramfs /boot/initrd.img-5.11.0-27-generic | grep amdgpu
usr/lib/firmware/updates/amdgpu
usr/lib/firmware/updates/amdgpu/aldebaran_mec.bin
usr/lib/firmware/updates/amdgpu/aldebaran_mec2.bin
usr/lib/firmware/updates/amdgpu/aldebaran_rlc.bin
usr/lib/firmware/updates/amdgpu/aldebaran_sdma.bin
usr/lib/firmware/updates/amdgpu/aldebaran_smc.bin
usr/lib/firmware/updates/amdgpu/aldebaran_sos.bin
usr/lib/firmware/updates/amdgpu/aldebaran_ta.bin
usr/lib/firmware/updates/amdgpu/aldebaran_vcn.bin
usr/lib/firmware/updates/amdgpu/arcturus_asd.bin
usr/lib/firmware/updates/amdgpu/arcturus_gpu_info.bin
usr/lib/firmware/updates/amdgpu/arcturus_mec.bin
usr/lib/firmware/updates/amdgpu/arcturus_rlc.bin
usr/lib/firmware/updates/amdgpu/arcturus_sdma.bin
usr/lib/firmware/updates/amdgpu/arcturus_smc.bin
usr/lib/firmware/updates/amdgpu/arcturus_sos.bin
usr/lib/firmware/updates/amdgpu/arcturus_ta.bin
usr/lib/firmware/updates/amdgpu/arcturus_vcn.bin
usr/lib/firmware/updates/amdgpu/banks_k_2_smc.bin
usr/lib/firmware/updates/amdgpu/bonaire_ce.bin
usr/lib/firmware/updates/amdgpu/bonaire_k_smc.bin
usr/lib/firmware/updates/amdgpu/bonaire_mc.bin
usr/lib/firmware/updates/amdgpu/bonaire_me.bin
usr/lib/firmware/updates/amdgpu/bonaire_mec.bin
usr/lib/firmware/updates/amdgpu/bonaire_pfp.bin
usr/lib/firmware/updates/amdgpu/bonaire_rlc.bin
usr/lib/firmware/updates/amdgpu/bonaire_sdma.bin
usr/lib/firmware/updates/amdgpu/bonaire_sdma1.bin
usr/lib/firmware/updates/amdgpu/bonaire_smc.bin
usr/lib/firmware/updates/amdgpu/bonaire_uvd.bin
usr/lib/firmware/updates/amdgpu/bonaire_vce.bin
usr/lib/firmware/updates/amdgpu/carrizo_ce.bin
usr/lib/firmware/updates/amdgpu/carrizo_me.bin
usr/lib/firmware/updates/amdgpu/carrizo_mec.bin
usr/lib/firmware/updates/amdgpu/carrizo_mec2.bin
usr/lib/firmware/updates/amdgpu/carrizo_pfp.bin
usr/lib/firmware/updates/amdgpu/carrizo_rlc.bin
usr/lib/firmware/updates/amdgpu/carrizo_sdma.bin
usr/lib/firmware/updates/amdgpu/carrizo_sdma1.bin
usr/lib/firmware/updates/amdgpu/carrizo_uvd.bin
usr/lib/firmware/updates/amdgpu/carrizo_vce.bin
usr/lib/firmware/updates/amdgpu/dimgrey_cavefish_ce.bin
usr/lib/firmware/updates/amdgpu/dimgrey_cavefish_dmcub.bin
usr/lib/firmware/updates/amdgpu/dimgrey_cavefish_me.bin
usr/lib/firmware/updates/amdgpu/dimgrey_cavefish_mec.bin
usr/lib/firmware/updates/amdgpu/dimgrey_cavefish_mec2.bin
usr/lib/firmware/updates/amdgpu/dimgrey_cavefish_pfp.bin
...
...
usr/lib/firmware/updates/amdgpu/verde_me.bin
usr/lib/firmware/updates/amdgpu/verde_pfp.bin
usr/lib/firmware/updates/amdgpu/verde_rlc.bin
usr/lib/firmware/updates/amdgpu/verde_smc.bin
usr/lib/firmware/updates/amdgpu/verde_uvd.bin
usr/lib/modules/5.11.0-27-generic/updates/dkms/amdgpu.ko
usr/lib/udev/rules.d/70-amdgpu.rules
Thanks for confirming that. So something is going wrong with the PSP there, and it could be the VBIOS. You should be able to get a newer one from your point of contact where you got the GPU. If you want to try the other steps first (swapping another GPU into the same slot to make sure that the SBIOS/system is configured correctly, updating to the latest HWE kernel, updating your SBIOS and enabling "Above 4G decoding"), then you can always try the VBIOS last, depending on how long it takes to get a new one. At least that way we can eliminate the remaining causes, since the required PSP FW is installed in the initramfs image and is known to work in the 4.3.1 release. Good luck!
Do you know how to check whether the current SBIOS version is OK, and which version to update to?
The SBIOS is the system BIOS, so that'll come from the motherboard manufacturer. You should be able to find it on their support page, or get it from your point of contact where you obtained the motherboard. As for the VBIOS (video BIOS), we don't distribute those through the regular AMD website, so your point of contact for the GPU should be able to help there.
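For the first half of that question, Linux exposes the board and SBIOS identification through DMI in sysfs, which can be compared against the versions listed on the motherboard vendor's support page. A minimal sketch, assuming the standard /sys/class/dmi/id attribute files; show_sbios_info is a hypothetical name, and the directory is a parameter only so it can be pointed elsewhere for testing.

```shell
# Print board and SBIOS identification from DMI sysfs attributes.
# $1 = DMI directory (defaults to /sys/class/dmi/id)
show_sbios_info() {
    dmi_dir=${1:-/sys/class/dmi/id}
    for f in board_vendor board_name bios_vendor bios_version bios_date; do
        if [ -r "$dmi_dir/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$dmi_dir/$f")"
        else
            printf '%s: unavailable\n' "$f"
        fi
    done
}
```

"sudo dmidecode -s bios-version" reports the same SBIOS version. For the VBIOS, amdgpu exposes a vbios_version attribute under /sys/class/drm/card*/device/, but that is likely only present once the driver has initialized the card, which didn't happen in the log above.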