ROCK-Kernel-Driver
Dual GPU system (Raven Ridge APU + R9 Nano Fiji)
I'm using a system with a Ryzen 5 2400G APU on a Gigabyte GA-AB350N-Gaming WIFI motherboard, plus a discrete R9 Nano (Fiji) GPU. When I disable the integrated graphics in the BIOS, the discrete GPU works flawlessly with ROCm. However, when both GPUs are enabled (monitor connected to the APU), ROCm does not see any GPU. The relevant block of dmesg output in that case is as follows:
...
[ 1.687371] kfd kfd: Initialized module
[ 1.688006] checking generic (c0000000 300000) vs hw (e0000000 10000000)
[ 1.688069] amdgpu 0000:01:00.0: enabling device (0000 -> 0003)
[ 1.688865] [drm] initializing kernel modesetting (FIJI 0x1002:0x7300 0x1002:0x0B36 0xCA).
[ 1.689679] [drm] register mmio base: 0xFE900000
[ 1.690470] [drm] register mmio size: 262144
[ 1.691258] [drm] add ip block number 0 <vi_common>
[ 1.692057] [drm] add ip block number 1 <gmc_v8_0>
[ 1.692787] [drm] add ip block number 2 <tonga_ih>
[ 1.693529] [drm] add ip block number 3 <powerplay>
[ 1.694281] [drm] add ip block number 4 <dm>
[ 1.694975] [drm] add ip block number 5 <gfx_v8_0>
[ 1.695630] [drm] add ip block number 6 <sdma_v3_0>
[ 1.696232] [drm] add ip block number 7 <uvd_v6_0>
[ 1.696720] [drm] add ip block number 8 <vce_v3_0>
[ 1.697226] [drm] UVD is enabled in physical mode
[ 1.697695] [drm] VCE enabled in physical mode
...
[ 1.917101] ATOM BIOS: 113-C8820200-107
[ 1.917768] [drm] GPU posting now...
...
[ 2.028095] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[ 2.029576] amdgpu 0000:01:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 2.030393] amdgpu 0000:01:00.0: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
[ 2.031141] [drm] Detected VRAM RAM=4096M, BAR=256M
[ 2.031887] [drm] RAM width 512bits HBM
[ 2.033182] [TTM] Zone kernel: Available graphics memory: 7700476 kiB
[ 2.033944] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
[ 2.034690] [TTM] Initializing pool allocator
[ 2.035409] [TTM] Initializing DMA pool allocator
[ 2.036096] [drm] amdgpu: 4096M of VRAM memory ready
[ 2.036706] [drm] amdgpu: 4096M of GTT memory ready.
[ 2.037291] [drm] GART: num cpu pages 262144, num gpu pages 262144
[ 2.037900] [drm] PCIE GART of 1024M enabled (table at 0x000000F400000000).
[ 2.039900] [drm] Found UVD firmware Version: 1.87 Family ID: 12
[ 2.040477] [drm] UVD ENC is disabled
[ 2.041653] [drm] Found VCE firmware Version: 53.20 Binary ID: 3
[ 2.104067] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2.105363] amdgpu: [powerplay] Failed to retrieve minimum clocks.
[ 2.106029] amdgpu: [powerplay] Error in phm_get_clock_info
[ 2.106056] ata1.00: ATA-8: WDC WD5000AVVS-63M8B0, 01.00A01, max UDMA/133
[ 2.106963] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[ 2.107595] ata1.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 32), AA
[ 2.108854] ata1.00: configured for UDMA/133
[ 2.109224] [drm] Display Core initialized with v3.1.59!
[ 2.110138] scsi 0:0:0:0: Direct-Access ATA WDC WD5000AVVS-6 0A01 PQ: 0 ANSI: 5
[ 2.111128] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 2.112026] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 2.112061] sd 0:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/466 GiB)
[ 2.112071] sd 0:0:0:0: [sda] Write Protect is off
[ 2.112074] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 2.112089] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2.112473] [drm] Driver supports precise vblank timestamp query.
[ 2.144240] [drm] UVD initialized successfully.
[ 2.233935] sda: sda1 sda2 sda3 sda4 sda5 sda6
[ 2.236079] sd 0:0:0:0: [sda] Attached SCSI disk
[ 2.245255] [drm] VCE initialized successfully.
[ 2.247630] kfd kfd: Allocated 3969056 bytes on gart
[ 2.248376] Topology: Add APU node [0x7300:0x1002]
[ 2.249120] kfd kfd: added device 1002:7300
[ 2.249924] [drm] Cannot find any crtc or sizes
[ 2.252814] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:01:00.0 on minor 0
[ 2.253425] checking generic (c0000000 300000) vs hw (c0000000 10000000)
[ 2.253426] fb: switching to amdgpudrmfb from EFI VGA
[ 2.253965] Console: switching to colour dummy device 80x25
[ 2.254411] [drm] initializing kernel modesetting (RAVEN 0x1002:0x15DD 0x1458:0xD000 0xC6).
[ 2.254420] [drm] register mmio base: 0xFE400000
[ 2.254421] [drm] register mmio size: 524288
[ 2.254436] [drm] add ip block number 0 <soc15_common>
[ 2.254437] [drm] add ip block number 1 <gmc_v9_0>
[ 2.254439] [drm] add ip block number 2 <vega10_ih>
[ 2.254440] [drm] add ip block number 3 <psp>
[ 2.254442] [drm] add ip block number 4 <powerplay>
[ 2.254443] [drm] add ip block number 5 <dm>
[ 2.254445] [drm] add ip block number 6 <gfx_v9_0>
[ 2.254446] [drm] add ip block number 7 <sdma_v4_0>
[ 2.254448] [drm] add ip block number 8 <vcn_v1_0>
[ 2.254478] [drm] VCN decode is enabled in VM mode
[ 2.254479] [drm] VCN encode is enabled in VM mode
[ 2.254480] [drm] VCN jpeg decode is enabled in VM mode
[ 2.254482] vga_switcheroo: enabled
[ 2.274957] [drm] BIOS signature incorrect 0 0
[ 2.274976] ATOM BIOS: 113-RAVEN-107
[ 2.275007] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[ 2.275024] amdgpu 0000:0a:00.0: VRAM: 1024M 0x000000F400000000 - 0x000000F43FFFFFFF (1024M used)
[ 2.275026] amdgpu 0000:0a:00.0: GART: 1024M 0x000000F500000000 - 0x000000F53FFFFFFF
[ 2.275030] [drm] Detected VRAM RAM=1024M, BAR=1024M
[ 2.275032] [drm] RAM width 128bits DDR4
[ 2.275039] [drm] amdgpu: 1024M of VRAM memory ready
[ 2.275041] [drm] amdgpu: 3072M of GTT memory ready.
[ 2.275047] [drm] GART: num cpu pages 262144, num gpu pages 262144
[ 2.275202] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[ 2.276707] [drm] use_doorbell being set to: [true]
[ 2.276789] [drm] Found VCN firmware Version: 1.73 Family ID: 18
[ 2.276792] [drm] PSP loading VCN firmware
[ 2.426836] ata2: SATA link down (SStatus 0 SControl 300)
[ 2.440246] amdgpu: [powerplay] dpm has been enabled
[ 2.440312] [drm] DM_PPLIB: values for Invalid clock
[ 2.440314] [drm] DM_PPLIB: 0 in kHz
[ 2.440315] [drm] DM_PPLIB: 0 in kHz
[ 2.440317] [drm] DM_PPLIB: 0 in kHz
[ 2.440318] [drm] DM_PPLIB: 1600000 in kHz
[ 2.440393] WARNING: CPU: 0 PID: 204 at drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1372 dcn_bw_update_from_pplib+0x196/0x2c0 [amdgpu]
[ 2.440396] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) chash gpu_sched hid_generic i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops usbhid hid drm r8169 i2c_piix4 ahci libahci wmi gpio_amdpt gpio_generic video
[ 2.440412] CPU: 0 PID: 204 Comm: systemd-udevd Not tainted 4.19.0-041900-generic #201810221809
[ 2.440415] Hardware name: Gigabyte Technology Co., Ltd. AB350N-Gaming WIFI/AB350N-Gaming WIFI-CF, BIOS F23 08/08/2018
[ 2.440474] RIP: 0010:dcn_bw_update_from_pplib+0x196/0x2c0 [amdgpu]
[ 2.440477] Code: 84 fd 44 ff ff ff 49 8b 95 78 01 00 00 48 89 85 30 ff ff ff df ad 30 ff ff ff d8 f1 db 42 78 de c9 de ca de f9 d9 5a 4c eb 02 <0f> 0b 48 89 da be 04 00 00 00 4c 89 e7 e8 58 4a fe ff 84 c0 0f 84
[ 2.440481] RSP: 0018:ffffbd1f42547680 EFLAGS: 00010246
[ 2.440483] RAX: 0000000000000001 RBX: ffffbd1f425476e0 RCX: 0000000000000000
[ 2.440485] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000246
[ 2.440487] RBP: ffffbd1f42547750 R08: 0000000000000000 R09: 00000000000003e2
[ 2.440489] R10: ffff9dc49f6e0f00 R11: 0720072007200720 R12: ffff9dc4accb2b00
[ 2.440492] R13: ffff9dc49f228000 R14: ffff9dc4a1862d40 R15: ffff9dc49f720000
[ 2.440494] FS: 00007f89d52e9680(0000) GS:ffff9dc4afa00000(0000) knlGS:0000000000000000
[ 2.440497] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.440499] CR2: 00007ffc0f7b4ce8 CR3: 000000042106c000 CR4: 00000000003406f0
[ 2.440501] Call Trace:
[ 2.440561] construct+0x841/0xa00 [amdgpu]
[ 2.440565] ? _cond_resched+0x19/0x30
[ 2.440622] dcn10_create_resource_pool+0x41/0x70 [amdgpu]
[ 2.440677] dc_create_resource_pool+0x46/0x180 [amdgpu]
[ 2.440680] ? _cond_resched+0x19/0x30
[ 2.440683] ? __kmalloc+0x1ee/0x220
[ 2.440739] ? dal_gpio_service_create+0xa1/0x130 [amdgpu]
[ 2.440793] dc_create+0x20f/0x630 [amdgpu]
[ 2.440796] ? kmem_cache_alloc_trace+0x172/0x1e0
[ 2.440852] dm_hw_init+0xc6/0x130 [amdgpu]
[ 2.440905] amdgpu_device_init.cold.28+0x113a/0x12e9 [amdgpu]
[ 2.440950] amdgpu_driver_load_kms+0x8b/0x2d0 [amdgpu]
[ 2.440962] drm_dev_register+0x11f/0x160 [drm]
[ 2.441006] amdgpu_pci_probe+0x140/0x1c0 [amdgpu]
[ 2.441009] local_pci_probe+0x46/0x90
[ 2.441012] pci_device_probe+0x18d/0x1a0
[ 2.441016] really_probe+0x243/0x3b0
[ 2.441018] driver_probe_device+0xba/0x100
[ 2.441021] __driver_attach+0xe4/0x110
[ 2.441024] ? driver_probe_device+0x100/0x100
[ 2.441026] bus_for_each_dev+0x74/0xb0
[ 2.441029] ? kmem_cache_alloc_trace+0x1c8/0x1e0
[ 2.441032] driver_attach+0x1e/0x20
[ 2.441034] bus_add_driver+0x159/0x230
[ 2.441036] ? 0xffffffffc0631000
[ 2.441039] driver_register+0x70/0xc0
[ 2.441041] ? 0xffffffffc0631000
[ 2.441044] __pci_register_driver+0x57/0x60
[ 2.441084] amdgpu_init+0x87/0x89 [amdgpu]
[ 2.441087] do_one_initcall+0x4a/0x1c4
[ 2.441090] ? _cond_resched+0x19/0x30
[ 2.441093] ? kmem_cache_alloc_trace+0x172/0x1e0
[ 2.441095] ? kfree+0x15b/0x180
[ 2.441098] do_init_module+0x60/0x220
[ 2.441101] load_module+0x16c1/0x1930
[ 2.441104] __do_sys_finit_module+0xbd/0x120
[ 2.441107] ? __do_sys_finit_module+0xbd/0x120
[ 2.441110] __x64_sys_finit_module+0x1a/0x20
[ 2.441112] do_syscall_64+0x5a/0x110
[ 2.441115] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2.441117] RIP: 0033:0x7f89d4df3839
[ 2.441119] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
[ 2.441123] RSP: 002b:00007ffc0f7bc248 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 2.441126] RAX: ffffffffffffffda RBX: 000055e725b6ae70 RCX: 00007f89d4df3839
[ 2.441128] RDX: 0000000000000000 RSI: 00007f89d4ad2145 RDI: 0000000000000015
[ 2.441130] RBP: 00007f89d4ad2145 R08: 0000000000000000 R09: 00007ffc0f7bc360
[ 2.441132] R10: 0000000000000015 R11: 0000000000000246 R12: 0000000000000000
[ 2.441134] R13: 000055e725b7a9b0 R14: 0000000000020000 R15: 000055e725b6ae70
[ 2.441137] ---[ end trace a18cfa66fa2286d4 ]---
[ 2.441139] [drm] DM_PPLIB: values for Invalid clock
[ 2.441141] [drm] DM_PPLIB: 300000 in kHz
[ 2.441143] [drm] DM_PPLIB: 600000 in kHz
[ 2.441144] [drm] DM_PPLIB: 626000 in kHz
[ 2.441145] [drm] DM_PPLIB: 654000 in kHz
[ 2.441590] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:1! type 0 expected 3
[ 2.441645] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:2! type 0 expected 3
[ 2.457975] [drm] Display Core initialized with v3.1.59!
[ 2.483331] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 2.483334] [drm] Driver supports precise vblank timestamp query.
[ 2.506034] [drm] VCN decode and encode initialized successfully.
[ 2.507327] kfd kfd: Allocated 3969056 bytes on gart
[ 2.507400] Virtual CRAT table created for GPU
[ 2.507402] Parsing CRAT table with 1 nodes
[ 2.507413] Creating topology SYSFS entries
[ 2.507477] Topology: Add dGPU node [0x15dd:0x1002]
[ 2.507668] kfd kfd: added device 1002:15dd
[ 2.509663] [drm] fb mappable at 0x81100000
[ 2.509666] [drm] vram apper at 0x80000000
[ 2.509667] [drm] size 14745600
[ 2.509669] [drm] fb depth is 24
[ 2.509670] [drm] pitch is 10240
[ 2.509765] fbcon: amdgpudrmfb (fb0) is primary device
[ 2.546405] Console: switching to colour frame buffer device 320x90
[ 2.570456] amdgpu 0000:0a:00.0: fb0: amdgpudrmfb frame buffer device
[ 2.584109] amdgpu 0000:0a:00.0: ring 0(gfx) uses VM inv eng 4 on hub 0
[ 2.584154] amdgpu 0000:0a:00.0: ring 1(comp_1.0.0) uses VM inv eng 5 on hub 0
[ 2.584200] amdgpu 0000:0a:00.0: ring 2(comp_1.1.0) uses VM inv eng 6 on hub 0
[ 2.584246] amdgpu 0000:0a:00.0: ring 3(comp_1.2.0) uses VM inv eng 7 on hub 0
[ 2.584291] amdgpu 0000:0a:00.0: ring 4(comp_1.3.0) uses VM inv eng 8 on hub 0
[ 2.584337] amdgpu 0000:0a:00.0: ring 5(comp_1.0.1) uses VM inv eng 9 on hub 0
[ 2.584383] amdgpu 0000:0a:00.0: ring 6(comp_1.1.1) uses VM inv eng 10 on hub 0
[ 2.584429] amdgpu 0000:0a:00.0: ring 7(comp_1.2.1) uses VM inv eng 11 on hub 0
[ 2.584475] amdgpu 0000:0a:00.0: ring 8(comp_1.3.1) uses VM inv eng 12 on hub 0
[ 2.584521] amdgpu 0000:0a:00.0: ring 9(kiq_2.1.0) uses VM inv eng 13 on hub 0
[ 2.584567] amdgpu 0000:0a:00.0: ring 10(sdma0) uses VM inv eng 4 on hub 1
[ 2.584611] amdgpu 0000:0a:00.0: ring 11(vcn_dec) uses VM inv eng 5 on hub 1
[ 2.584656] amdgpu 0000:0a:00.0: ring 12(vcn_enc0) uses VM inv eng 6 on hub 1
[ 2.584701] amdgpu 0000:0a:00.0: ring 13(vcn_enc1) uses VM inv eng 7 on hub 1
[ 2.584746] amdgpu 0000:0a:00.0: ring 14(vcn_jpeg) uses VM inv eng 8 on hub 1
[ 2.589390] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:0a:00.0 on minor 1
...
The output of rocminfo follows:
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 0.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (number of timestamp)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*** Done ***
Any ideas would be appreciated. I'm interested in using the integrated GPU at least for OpenCL, which reportedly works.
Thanks.
Your kernel log looks kinda scary, but I've seen such DCN errors before. They don't seem to cause any ROCm-specific problems.
Topology: Add dGPU node [0x15dd:0x1002]
kfd kfd: added device 1002:15dd
This one is funny, because 15dd is the Raven integrated GPU, which shouldn't be a dGPU. Anyway, it does get added to the topology, so it doesn't explain why rocminfo isn't listing any HSA agents. What does "cat /sys/class/kfd/kfd/topology/nodes/*/properties" say?
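To spot this kind of mislabelling without reading the whole log, the "Topology: Add ... node" lines can be pulled out of dmesg and compared against what each device is expected to be. The following is only a minimal sketch: the function names and the EXPECTED_KIND table are hypothetical, and the table simply encodes what this thread says (0x7300 is the discrete Fiji, 0x15dd is the integrated Raven). The bracket order [device:vendor] is taken from the log lines quoted in this issue.

```python
import re

# Matches lines such as: "Topology: Add APU node [0x7300:0x1002]"
# The bracket holds [device_id:vendor_id], as seen in the dmesg output above.
TOPOLOGY_RE = re.compile(
    r"Topology: Add (APU|dGPU) node \[0x([0-9a-fA-F]+):0x([0-9a-fA-F]+)\]"
)

# Assumption taken from this thread, not from any driver table:
# Fiji (0x7300) is a discrete GPU, Raven (0x15dd) is an integrated one.
EXPECTED_KIND = {0x7300: "dGPU", 0x15DD: "APU"}


def topology_nodes(dmesg_text):
    """Return (kind, device_id, vendor_id) for every KFD topology line."""
    found = []
    for line in dmesg_text.splitlines():
        match = TOPOLOGY_RE.search(line)
        if match:
            kind = match.group(1)
            device_id = int(match.group(2), 16)
            vendor_id = int(match.group(3), 16)
            found.append((kind, device_id, vendor_id))
    return found


def mislabelled(nodes):
    """Return (kind, device_id) pairs whose kind differs from EXPECTED_KIND."""
    return [
        (kind, device_id)
        for kind, device_id, _vendor in nodes
        if device_id in EXPECTED_KIND and EXPECTED_KIND[device_id] != kind
    ]


if __name__ == "__main__":
    sample = (
        "[ 2.248376] Topology: Add APU node [0x7300:0x1002]\n"
        "[ 2.507477] Topology: Add dGPU node [0x15dd:0x1002]\n"
    )
    for kind, device_id in mislabelled(topology_nodes(sample)):
        print(f"device 0x{device_id:04x} was added as {kind}")
```

On a live system the text would come from something like the output of "dmesg" saved to a file; the sketch deliberately works on a string so it does not depend on how the log is collected.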
That said, currently mixed APU+dGPU configurations are not supported in ROCm. Even if you can get past this problem, chances are that this mix of Fiji+Raven will not work with ROCm. dGPUs and APUs use very different memory management models, and integrating them into a coherent shared virtual address space in the user mode driver is not trivial. It's something we haven't tried or tested.
Here is the properties output:
cpu_cores_count 8
simd_count 44
mem_banks_count 1
caches_count 13
io_links_count 1
cpu_core_id_base 0
simd_id_base 0
max_waves_per_simd 40
lds_size_in_kb 64
gds_size_in_kb 0
wave_front_size 64
array_count 1
simd_arrays_per_engine 1
cu_per_simd_array 11
simd_per_cu 4
max_slots_scratch_cu 32
vendor_id 4098
device_id 29440
location_id 256
drm_render_minor 128
max_engine_clk_fcompute 1000
local_mem_size 0
fw_version 665
capability 4736
max_engine_clk_ccompute 3600
cpu_cores_count 0
simd_count 44
mem_banks_count 1
caches_count 17
io_links_count 1
cpu_core_id_base 0
simd_id_base 2147487744
max_waves_per_simd 10
lds_size_in_kb 64
gds_size_in_kb 0
wave_front_size 64
array_count 1
simd_arrays_per_engine 1
cu_per_simd_array 11
simd_per_cu 4
max_slots_scratch_cu 32
vendor_id 4098
device_id 5597
location_id 2560
drm_render_minor 129
max_engine_clk_fcompute 1250
local_mem_size 0
fw_version 363
capability 8834
max_engine_clk_ccompute 3600
Let me clarify that I'm using Ubuntu 18.04.1 with kernel 4.19. Using the stock Ubuntu kernel (4.15) does not improve the situation. I also tried the amdgpu-pro driver, which sees both GPUs without problems.
Is the shared virtual address space mandatory? I'd be happy even if I could use them without having a unified shared address space.
Looking at the node properties, it finds both GPUs but associates the wrong GPU with the APU core. There is a built-in assumption that the first GPU initialized by the amdgpu driver is the integrated GPU. But in your case the dGPU gets initialized first and gets associated with the APU incorrectly. Then the iGPU is initialized and treated as a dGPU. As a result, neither GPU works in ROCm.
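The mismatch can be read directly from the properties dump posted earlier once the decimal IDs are converted to hex. Below is a rough sketch, not a supported tool: the helper names are hypothetical, it assumes each node's block in the concatenated output starts with cpu_cores_count (as it does in the output in this thread), and it assumes location_id packs the PCI bus in the upper byte and devfn in the lower byte, which matches the 01:00.0 and 0a:00.0 addresses in the dmesg log here.

```python
def parse_nodes(text):
    """Split concatenated 'properties' output into one dict per topology node.

    Assumes every node block begins with a 'cpu_cores_count' line.
    Lines that are not 'key value' pairs (shell prompts, blanks) are skipped.
    """
    nodes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        key, value = parts
        if key == "cpu_cores_count":
            nodes.append({})
        if nodes:
            nodes[-1][key] = int(value)
    return nodes


def describe(node):
    """Decode the IDs of one node into the notation used by dmesg and lspci."""
    bus = (node["location_id"] >> 8) & 0xFF   # assumed layout: bus in high byte
    devfn = node["location_id"] & 0xFF        # assumed layout: devfn in low byte
    return {
        "pci_id": f"{node['vendor_id']:04x}:{node['device_id']:04x}",
        "pci_addr": f"{bus:02x}:{devfn >> 3:02x}.{devfn & 7}",
        "has_cpu_cores": node["cpu_cores_count"] > 0,
    }


if __name__ == "__main__":
    # Only the fields needed here, copied from the output in this thread.
    sample = """cpu_cores_count 8
vendor_id 4098
device_id 29440
location_id 256
cpu_cores_count 0
vendor_id 4098
device_id 5597
location_id 2560
"""
    for node in parse_nodes(sample):
        print(describe(node))
```

With the values from this thread, the node that carries the 8 CPU cores decodes to 1002:7300 at 01:00.0, which is the Fiji card, while 1002:15dd at 0a:00.0 (the Raven iGPU) has no CPU cores attached.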
If you're using 4.19, are you using DKMS or the built-in KFD in your kernel? I think DKMS won't work with 4.19 kernels. What's the output of "sudo dkms status"?
If you're comfortable with testing bleeding-edge kernels, I could try sending some patches to get KFD to recognize your GPUs properly.
Thank you for the info.
I've tried both with and without DKMS. Currently, I have removed both the rocm-dkms and rock-dkms packages. Executing "sudo dkms status" gives no output. The problem seems to be the same:
elias@Uranus:~$ sudo dkms status
elias@Uranus:~$ rocminfo
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 0.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (number of timestamp)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*** Done ***
elias@Uranus:~$ dmesg|grep kfd
[ 1.705487] kfd kfd: Initialized module
[ 2.259630] kfd kfd: Allocated 3969056 bytes on gart
[ 2.261120] kfd kfd: added device 1002:7300
[ 2.457506] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) chash gpu_sched i2c_algo_bit hid_generic ttm drm_kms_helper syscopyarea i2c_piix4 sysfillrect sysimgblt usbhid fb_sys_fops hid drm r8169 ahci libahci wmi video gpio_amdpt gpio_generic
[ 2.544372] kfd kfd: Allocated 3969056 bytes on gart
[ 2.544661] kfd kfd: added device 1002:15dd
elias@Uranus:~$ cat /sys/class/kfd/kfd/topology/nodes/*/properties
cpu_cores_count 8
simd_count 44
mem_banks_count 1
caches_count 13
io_links_count 1
cpu_core_id_base 0
simd_id_base 0
max_waves_per_simd 40
lds_size_in_kb 64
gds_size_in_kb 0
wave_front_size 64
array_count 1
simd_arrays_per_engine 1
cu_per_simd_array 11
simd_per_cu 4
max_slots_scratch_cu 32
vendor_id 4098
device_id 29440
location_id 256
drm_render_minor 128
max_engine_clk_fcompute 1000
local_mem_size 0
fw_version 665
capability 4736
max_engine_clk_ccompute 3600
cpu_cores_count 0
simd_count 44
mem_banks_count 1
caches_count 17
io_links_count 1
cpu_core_id_base 0
simd_id_base 2147487744
max_waves_per_simd 10
lds_size_in_kb 64
gds_size_in_kb 0
wave_front_size 64
array_count 1
simd_arrays_per_engine 1
cu_per_simd_array 11
simd_per_cu 4
max_slots_scratch_cu 32
vendor_id 4098
device_id 5597
location_id 2560
drm_render_minor 129
max_engine_clk_fcompute 1250
local_mem_size 0
fw_version 363
capability 8834
max_engine_clk_ccompute 3600
I'd be happy to try any KFD patches that could help solve this issue.
The empty DKMS output means you're using KFD from your 4.19 kernel.
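For anyone scripting this check, the same reasoning can be expressed as a small helper that looks at the text printed by "sudo dkms status". This is a sketch under assumptions: the function names are hypothetical, and the "amdgpu" name hint for the DKMS module is a guess that should be adjusted to whatever your packages actually register. Only the empty-output case is confirmed by this thread.

```python
def dkms_lines_for(status_output, name_hint="amdgpu"):
    """Return the `dkms status` lines that start with name_hint.

    name_hint is an assumption about how the driver module is named
    in DKMS; change it if your installation uses a different name.
    """
    hint = name_hint.lower()
    return [
        line.strip()
        for line in status_output.splitlines()
        if line.strip().lower().startswith(hint)
    ]


def driver_source(status_output, name_hint="amdgpu"):
    """Say whether the GPU driver appears to come from DKMS or the kernel."""
    if dkms_lines_for(status_output, name_hint):
        return "dkms"
    return "in-kernel"


if __name__ == "__main__":
    # The output posted above was empty, so the in-kernel KFD is in use.
    print(driver_source(""))
```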
The attached patch is against current amd-staging-drm-next, but it should apply to a vanilla kernel. That code hasn't changed much recently. I named it .patch.txt to get around GitHub's limitations on supported file types.
0001-drm-amdkfd-Assign-only-iGPUs-to-pre-existing-topolog.patch.txt
After sorting out some kernel compilation issues, I successfully patched and built the kernel. While testing I got the following outputs.
For dmesg:
...
[ 1.844068] kfd kfd: Initialized module
[ 1.844741] checking generic (c0000000 300000) vs hw (e0000000 10000000)
[ 1.844769] amdgpu 0000:01:00.0: enabling device (0000 -> 0003)
[ 1.845613] [drm] initializing kernel modesetting (FIJI 0x1002:0x7300 0x1002:0x0B36 0xCA).
[ 1.846327] [drm] register mmio base: 0xFE900000
[ 1.847014] [drm] register mmio size: 262144
[ 1.847705] [drm] add ip block number 0 <vi_common>
[ 1.848381] [drm] add ip block number 1 <gmc_v8_0>
[ 1.848960] [drm] add ip block number 2 <tonga_ih>
[ 1.849490] [drm] add ip block number 3 <powerplay>
[ 1.850023] [drm] add ip block number 4 <dm>
[ 1.850552] [drm] add ip block number 5 <gfx_v8_0>
[ 1.851084] [drm] add ip block number 6 <sdma_v3_0>
[ 1.851622] [drm] add ip block number 7 <uvd_v6_0>
[ 1.852146] [drm] add ip block number 8 <vce_v3_0>
...
[ 2.075596] [drm] GPU posting now...
[ 2.144320] ata9: SATA link down (SStatus 0 SControl 300)
[ 2.145951] ata10: SATA link down (SStatus 0 SControl 300)
[ 2.184077] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[ 2.184890] amdgpu 0000:01:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 2.185695] amdgpu 0000:01:00.0: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
[ 2.186506] [drm] Detected VRAM RAM=4096M, BAR=256M
[ 2.187286] [drm] RAM width 512bits HBM
[ 2.188192] [TTM] Zone kernel: Available graphics memory: 7183946 kiB
[ 2.188972] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
[ 2.189737] [TTM] Initializing pool allocator
[ 2.190464] [TTM] Initializing DMA pool allocator
[ 2.191180] [drm] amdgpu: 4096M of VRAM memory ready
[ 2.191802] [drm] amdgpu: 4096M of GTT memory ready.
[ 2.192379] [drm] GART: num cpu pages 262144, num gpu pages 262144
[ 2.193012] [drm] PCIE GART of 1024M enabled (table at 0x000000F400000000).
[ 2.194817] [drm] Found UVD firmware Version: 1.87 Family ID: 12
[ 2.195452] [drm] UVD ENC is disabled
[ 2.196608] [drm] Found VCE firmware Version: 53.20 Binary ID: 3
[ 2.259265] amdgpu: [powerplay] Failed to retrieve minimum clocks.
[ 2.259878] amdgpu: [powerplay] Error in phm_get_clock_info
[ 2.260760] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[ 2.261857] [drm] Display Core initialized with v3.1.59!
[ 2.263498] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 2.264157] [drm] Driver supports precise vblank timestamp query.
[ 2.284284] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2.286749] ata1.00: ATA-8: WDC WD5000AVVS-63M8B0, 01.00A01, max UDMA/133
[ 2.287592] ata1.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 32), AA
[ 2.289254] ata1.00: configured for UDMA/133
[ 2.290338] scsi 0:0:0:0: Direct-Access ATA WDC WD5000AVVS-6 0A01 PQ: 0 ANSI: 5
[ 2.291543] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 2.291554] sd 0:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/466 GiB)
[ 2.293154] [drm] UVD initialized successfully.
[ 2.293596] sd 0:0:0:0: [sda] Write Protect is off
[ 2.295238] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 2.295269] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2.394170] [drm] VCE initialized successfully.
[ 2.396241] kfd kfd: Allocated 3969056 bytes on gart
[ 2.397068] Virtual CRAT table created for GPU
[ 2.397843] Parsing CRAT table with 1 nodes
[ 2.398634] Creating topology SYSFS entries
[ 2.399593] WARNING: CPU: 3 PID: 220 at drivers/gpu/drm/amd/amdkfd/kfd_topology.c:1206 kfd_topology_add_device+0x47e/0x4e0 [amdkfd]
[ 2.401055] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) hid_generic chash gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea usbhid sysfillrect sysimgblt r8169 hid fb_sys_fops drm i2c_piix4 ahci wmi libahci video gpio_amdpt gpio_generic
[ 2.402412] CPU: 3 PID: 220 Comm: systemd-udevd Not tainted 4.19.13+ #1
[ 2.403034] Hardware name: Gigabyte Technology Co., Ltd. AB350N-Gaming WIFI/AB350N-Gaming WIFI-CF, BIOS F24 12/25/2018
[ 2.403659] RIP: 0010:kfd_topology_add_device+0x47e/0x4e0 [amdkfd]
[ 2.404269] Code: 89 85 2c ff ff ff e8 21 7d 5e d1 85 db 75 59 83 05 ca cf 02 00 01 4c 89 ef e8 4e e1 ff ff 48 85 c0 49 89 c7 0f 85 ab fc ff ff <0f> 0b c7 85 2c ff ff ff ed ff ff ff e9 ed fd ff ff e8 8c 04 59 d1
[ 2.405532] RSP: 0018:ffffa2a381fb77e8 EFLAGS: 00010246
[ 2.406171] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 2.406808] RDX: ffffffff00000001 RSI: ffff962e6187b088 RDI: ffff962e634e0800
[ 2.407444] RBP: ffffa2a381fb78c0 R08: 0000000000000044 R09: 0000000000000228
[ 2.408079] R10: ffff962e61c68700 R11: 000000000000001c R12: 00000000000061b5
[ 2.408703] R13: ffff962e634e0800 R14: 0000000000000000 R15: 0000000000000000
[ 2.409328] FS: 00007f842e91b680(0000) GS:ffff962e70cc0000(0000) knlGS:0000000000000000
[ 2.409956] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.410586] CR2: 00007f842e8f9271 CR3: 00000004224ee000 CR4: 00000000003406e0
[ 2.411220] Call Trace:
[ 2.411860] kgd2kfd_device_init+0x278/0x400 [amdkfd]
[ 2.412538] amdgpu_amdkfd_device_init+0x18b/0x1c0 [amdgpu]
[ 2.413209] amdgpu_device_init+0xcb4/0x1560 [amdgpu]
[ 2.413861] ? kmalloc_order+0x18/0x40
[ 2.414507] ? kmalloc_order_trace+0x24/0xb0
[ 2.414571] sda: sda1 sda2 sda3 sda4 sda5 sda6
[ 2.415162] amdgpu_driver_load_kms+0x8b/0x2c0 [amdgpu]
[ 2.416631] sd 0:0:0:0: [sda] Attached SCSI disk
[ 2.416732] drm_dev_register+0x128/0x1b0 [drm]
[ 2.418307] amdgpu_pci_probe+0x148/0x200 [amdgpu]
[ 2.418923] local_pci_probe+0x47/0xa0
[ 2.419527] pci_device_probe+0x145/0x1b0
[ 2.420128] really_probe+0x268/0x3d0
[ 2.420728] driver_probe_device+0x11a/0x130
[ 2.421319] __driver_attach+0xe3/0x110
[ 2.421904] ? driver_probe_device+0x130/0x130
[ 2.422488] ? driver_probe_device+0x130/0x130
[ 2.423060] bus_for_each_dev+0x74/0xb0
[ 2.423621] ? kmem_cache_alloc_trace+0x1b1/0x1d0
[ 2.424179] driver_attach+0x1e/0x20
[ 2.424737] bus_add_driver+0x167/0x260
[ 2.425281] ? 0xffffffffc0550000
[ 2.425826] driver_register+0x60/0x100
[ 2.426362] ? 0xffffffffc0550000
[ 2.426887] __pci_register_driver+0x5a/0x60
[ 2.427439] amdgpu_init+0x7a/0x89 [amdgpu]
[ 2.427968] do_one_initcall+0x4a/0x1c9
[ 2.428496] ? __vunmap+0x8e/0xc0
[ 2.429017] ? _cond_resched+0x19/0x40
[ 2.429527] ? kmem_cache_alloc_trace+0x42/0x1d0
[ 2.430022] ? vfree+0x35/0x70
[ 2.430495] do_init_module+0x5f/0x216
[ 2.430961] load_module+0x21b6/0x2aa0
[ 2.431403] __do_sys_finit_module+0xfc/0x120
[ 2.431830] ? __do_sys_finit_module+0xfc/0x120
[ 2.432251] __x64_sys_finit_module+0x1a/0x20
[ 2.432657] do_syscall_64+0x5a/0x120
[ 2.433047] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2.433433] RIP: 0033:0x7f842e425839
[ 2.433814] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
[ 2.434619] RSP: 002b:00007fff06cfa0b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 2.435039] RAX: ffffffffffffffda RBX: 000055e5a36f3d20 RCX: 00007f842e425839
[ 2.435467] RDX: 0000000000000000 RSI: 00007f842e104145 RDI: 0000000000000015
[ 2.435889] RBP: 00007f842e104145 R08: 0000000000000000 R09: 00007fff06cfa1d0
[ 2.436306] R10: 0000000000000015 R11: 0000000000000246 R12: 0000000000000000
[ 2.436720] R13: 000055e5a36fef90 R14: 0000000000020000 R15: 000055e5a36f3d20
[ 2.437122] ---[ end trace f0c9c746a21ea4cf ]---
[ 2.437541] kfd kfd: Error adding device to topology
[ 2.437986] kfd kfd: device 1002:7300 NOT added due to errors
[ 2.438517] [drm] Cannot find any crtc or sizes
[ 2.441537] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:01:00.0 on minor 0
[ 2.441993] checking generic (c0000000 300000) vs hw (c0000000 10000000)
[ 2.441993] fb: switching to amdgpudrmfb from EFI VGA
[ 2.442426] Console: switching to colour dummy device 80x25
[ 2.442771] [drm] initializing kernel modesetting (RAVEN 0x1002:0x15DD 0x1458:0xD000 0xC6).
[ 2.442780] [drm] register mmio base: 0xFE400000
[ 2.442781] [drm] register mmio size: 524288
[ 2.442793] [drm] add ip block number 0 <soc15_common>
[ 2.442795] [drm] add ip block number 1 <gmc_v9_0>
[ 2.442796] [drm] add ip block number 2 <vega10_ih>
[ 2.442797] [drm] add ip block number 3 <psp>
[ 2.442799] [drm] add ip block number 4 <powerplay>
[ 2.442800] [drm] add ip block number 5 <dm>
[ 2.442801] [drm] add ip block number 6 <gfx_v9_0>
[ 2.442803] [drm] add ip block number 7 <sdma_v4_0>
[ 2.442804] [drm] add ip block number 8 <vcn_v1_0>
[ 2.442825] [drm] VCN decode is enabled in VM mode
[ 2.442826] [drm] VCN encode is enabled in VM mode
[ 2.442827] [drm] VCN jpeg decode is enabled in VM mode
[ 2.442829] vga_switcheroo: enabled
[ 2.462911] [drm] BIOS signature incorrect 0 0
[ 2.462928] ATOM BIOS: 113-RAVEN-111
[ 2.462950] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[ 2.462969] amdgpu 0000:0a:00.0: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
[ 2.462971] amdgpu 0000:0a:00.0: GART: 1024M 0x000000F500000000 - 0x000000F53FFFFFFF
[ 2.462975] [drm] Detected VRAM RAM=2048M, BAR=2048M
[ 2.462976] [drm] RAM width 128bits DDR4
[ 2.462984] [drm] amdgpu: 2048M of VRAM memory ready
[ 2.462986] [drm] amdgpu: 3072M of GTT memory ready.
[ 2.462991] [drm] GART: num cpu pages 262144, num gpu pages 262144
[ 2.463145] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[ 2.464297] [drm] use_doorbell being set to: [true]
[ 2.464391] [drm] Found VCN firmware Version: 1.73 Family ID: 18
[ 2.464394] [drm] PSP loading VCN firmware
[ 2.606844] ata2: SATA link down (SStatus 0 SControl 300)
[ 2.628801] amdgpu: [powerplay] dpm has been enabled
[ 2.628859] [drm] DM_PPLIB: values for Invalid clock
[ 2.628861] [drm] DM_PPLIB: 0 in kHz
[ 2.628862] [drm] DM_PPLIB: 0 in kHz
[ 2.628864] [drm] DM_PPLIB: 0 in kHz
[ 2.628865] [drm] DM_PPLIB: 1600000 in kHz
[ 2.628928] WARNING: CPU: 3 PID: 220 at drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1372 dcn_bw_update_from_pplib+0x19a/0x2b0 [amdgpu]
[ 2.628932] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) hid_generic chash gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea usbhid sysfillrect sysimgblt r8169 hid fb_sys_fops drm i2c_piix4 ahci wmi libahci video gpio_amdpt gpio_generic
[ 2.628947] CPU: 3 PID: 220 Comm: systemd-udevd Tainted: G W 4.19.13+ #1
[ 2.628949] Hardware name: Gigabyte Technology Co., Ltd. AB350N-Gaming WIFI/AB350N-Gaming WIFI-CF, BIOS F24 12/25/2018
[ 2.628995] RIP: 0010:dcn_bw_update_from_pplib+0x19a/0x2b0 [amdgpu]
[ 2.628998] Code: 84 fd 44 ff ff ff 49 8b 95 78 01 00 00 48 89 85 30 ff ff ff df ad 30 ff ff ff d8 f1 db 42 78 de c9 de ca de f9 d9 5a 4c eb 02 <0f> 0b 48 89 da be 04 00 00 00 4c 89 e7 e8 c4 47 fe ff 84 c0 74 32
[ 2.629002] RSP: 0018:ffffa2a381fb76a0 EFLAGS: 00010246
[ 2.629004] RAX: 0000000000000001 RBX: ffffa2a381fb7700 RCX: 0000000000000004
[ 2.629006] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000246
[ 2.629008] RBP: ffffa2a381fb7770 R08: 0000000000000425 R09: 0720072007200720
[ 2.629010] R10: 0000000000000000 R11: 0720072007200720 R12: ffff962e60884800
[ 2.629012] R13: ffff962e6052a000 R14: 0000000000000000 R15: ffff962e6052a000
[ 2.629015] FS: 00007f842e91b680(0000) GS:ffff962e70cc0000(0000) knlGS:0000000000000000
[ 2.629018] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.629019] CR2: 00007f842e8f9271 CR3: 00000004224ee000 CR4: 00000000003406e0
[ 2.629021] Call Trace:
[ 2.629070] dcn10_create_resource_pool+0x7d4/0x9e0 [amdgpu]
[ 2.629114] dc_create_resource_pool+0x46/0x180 [amdgpu]
[ 2.629118] ? _cond_resched+0x19/0x40
[ 2.629121] ? __kmalloc+0x1d9/0x220
[ 2.629163] ? dal_gpio_service_create+0xa1/0x120 [amdgpu]
[ 2.629205] dc_create+0x221/0x620 [amdgpu]
[ 2.629208] ? kmem_cache_alloc_trace+0x42/0x1d0
[ 2.629250] dm_hw_init+0xc3/0x250 [amdgpu]
[ 2.629282] amdgpu_device_init+0xc92/0x1560 [amdgpu]
[ 2.629312] amdgpu_driver_load_kms+0x8b/0x2c0 [amdgpu]
[ 2.629322] drm_dev_register+0x128/0x1b0 [drm]
[ 2.629352] amdgpu_pci_probe+0x148/0x200 [amdgpu]
[ 2.629356] local_pci_probe+0x47/0xa0
[ 2.629359] pci_device_probe+0x145/0x1b0
[ 2.629362] really_probe+0x268/0x3d0
[ 2.629365] driver_probe_device+0x11a/0x130
[ 2.629368] __driver_attach+0xe3/0x110
[ 2.629370] ? driver_probe_device+0x130/0x130
[ 2.629373] ? driver_probe_device+0x130/0x130
[ 2.629375] bus_for_each_dev+0x74/0xb0
[ 2.629377] ? kmem_cache_alloc_trace+0x1b1/0x1d0
[ 2.629380] driver_attach+0x1e/0x20
[ 2.629383] bus_add_driver+0x167/0x260
[ 2.629385] ? 0xffffffffc0550000
[ 2.629387] driver_register+0x60/0x100
[ 2.629389] ? 0xffffffffc0550000
[ 2.629391] __pci_register_driver+0x5a/0x60
[ 2.629429] amdgpu_init+0x7a/0x89 [amdgpu]
[ 2.629432] do_one_initcall+0x4a/0x1c9
[ 2.629435] ? __vunmap+0x8e/0xc0
[ 2.629437] ? _cond_resched+0x19/0x40
[ 2.629439] ? kmem_cache_alloc_trace+0x42/0x1d0
[ 2.629441] ? vfree+0x35/0x70
[ 2.629445] do_init_module+0x5f/0x216
[ 2.629447] load_module+0x21b6/0x2aa0
[ 2.629451] __do_sys_finit_module+0xfc/0x120
[ 2.629453] ? __do_sys_finit_module+0xfc/0x120
[ 2.629457] __x64_sys_finit_module+0x1a/0x20
[ 2.629459] do_syscall_64+0x5a/0x120
[ 2.629462] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2.629464] RIP: 0033:0x7f842e425839
[ 2.629466] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
[ 2.629470] RSP: 002b:00007fff06cfa0b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 2.629473] RAX: ffffffffffffffda RBX: 000055e5a36f3d20 RCX: 00007f842e425839
[ 2.629475] RDX: 0000000000000000 RSI: 00007f842e104145 RDI: 0000000000000015
[ 2.629477] RBP: 00007f842e104145 R08: 0000000000000000 R09: 00007fff06cfa1d0
[ 2.629479] R10: 0000000000000015 R11: 0000000000000246 R12: 0000000000000000
[ 2.629481] R13: 000055e5a36fef90 R14: 0000000000020000 R15: 000055e5a36f3d20
[ 2.629484] ---[ end trace f0c9c746a21ea4d0 ]---
[ 2.629486] [drm] DM_PPLIB: values for Invalid clock
[ 2.629488] [drm] DM_PPLIB: 300000 in kHz
[ 2.629489] [drm] DM_PPLIB: 600000 in kHz
[ 2.629491] [drm] DM_PPLIB: 626000 in kHz
[ 2.629492] [drm] DM_PPLIB: 654000 in kHz
[ 2.630306] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:1! type 0 expected 3
[ 2.630347] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:2! type 0 expected 3
[ 2.675322] [drm] Display Core initialized with v3.1.59!
[ 2.700732] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 2.700735] [drm] Driver supports precise vblank timestamp query.
[ 2.723504] [drm] VCN decode and encode initialized successfully.
[ 2.723650] kfd kfd: Allocated 3969056 bytes on gart
[ 2.723676] Topology: Add dGPU node [0x0:0x0]
[ 2.723970] kfd kfd: added device 1002:15dd
[ 2.725667] [drm] fb mappable at 0x41100000
[ 2.725669] [drm] vram apper at 0x40000000
[ 2.725671] [drm] size 14745600
[ 2.725672] [drm] fb depth is 24
[ 2.725673] [drm] pitch is 10240
[ 2.725738] fbcon: amdgpudrmfb (fb0) is primary device
[ 2.763657] Console: switching to colour frame buffer device 320x90
[ 2.787097] amdgpu 0000:0a:00.0: fb0: amdgpudrmfb frame buffer device
[ 2.800088] amdgpu 0000:0a:00.0: ring 0(gfx) uses VM inv eng 4 on hub 0
[ 2.800110] amdgpu 0000:0a:00.0: ring 1(comp_1.0.0) uses VM inv eng 5 on hub 0
[ 2.800133] amdgpu 0000:0a:00.0: ring 2(comp_1.1.0) uses VM inv eng 6 on hub 0
[ 2.800155] amdgpu 0000:0a:00.0: ring 3(comp_1.2.0) uses VM inv eng 7 on hub 0
[ 2.800178] amdgpu 0000:0a:00.0: ring 4(comp_1.3.0) uses VM inv eng 8 on hub 0
[ 2.800200] amdgpu 0000:0a:00.0: ring 5(comp_1.0.1) uses VM inv eng 9 on hub 0
[ 2.800223] amdgpu 0000:0a:00.0: ring 6(comp_1.1.1) uses VM inv eng 10 on hub 0
[ 2.800245] amdgpu 0000:0a:00.0: ring 7(comp_1.2.1) uses VM inv eng 11 on hub 0
[ 2.800268] amdgpu 0000:0a:00.0: ring 8(comp_1.3.1) uses VM inv eng 12 on hub 0
[ 2.800290] amdgpu 0000:0a:00.0: ring 9(kiq_2.1.0) uses VM inv eng 13 on hub 0
[ 2.800313] amdgpu 0000:0a:00.0: ring 10(sdma0) uses VM inv eng 4 on hub 1
[ 2.800334] amdgpu 0000:0a:00.0: ring 11(vcn_dec) uses VM inv eng 5 on hub 1
[ 2.800356] amdgpu 0000:0a:00.0: ring 12(vcn_enc0) uses VM inv eng 6 on hub 1
[ 2.800378] amdgpu 0000:0a:00.0: ring 13(vcn_enc1) uses VM inv eng 7 on hub 1
[ 2.800400] amdgpu 0000:0a:00.0: ring 14(vcn_jpeg) uses VM inv eng 8 on hub 1
[ 2.804121] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:0a:00.0 on minor 1
and for the topologies:
cpu_cores_count 8
simd_count 44
mem_banks_count 1
caches_count 13
io_links_count 1
cpu_core_id_base 0
simd_id_base 0
max_waves_per_simd 40
lds_size_in_kb 64
gds_size_in_kb 0
wave_front_size 64
array_count 1
simd_arrays_per_engine 1
cu_per_simd_array 11
simd_per_cu 4
max_slots_scratch_cu 32
vendor_id 4098
device_id 5597
location_id 2560
drm_render_minor 129
max_engine_clk_fcompute 1250
local_mem_size 0
fw_version 363
capability 8834
max_engine_clk_ccompute 3600
cpu_cores_count 0
simd_count 256
mem_banks_count 1
caches_count 96
io_links_count 1
cpu_core_id_base 0
simd_id_base 2147487744
max_waves_per_simd 10
lds_size_in_kb 64
gds_size_in_kb 0
wave_front_size 64
array_count 4
simd_arrays_per_engine 0
cu_per_simd_array 16
simd_per_cu 4
max_slots_scratch_cu 32
vendor_id 0
device_id 0
location_id 0
drm_render_minor 0
max_engine_clk_ccompute 3600
Was Fiji not successfully initialized?
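For readers comparing the two topology nodes above, here is a minimal Python sketch that splits such a concatenated property dump into one dictionary per node and flags a node that has SIMDs but no PCI identity (vendor_id and device_id both 0), which is what the second node looks like. The helper names (`split_nodes`, `looks_unassigned`) and the rule "a repeat of `cpu_cores_count` starts a new node" are assumptions made for illustration, not part of any ROCm tool. The sample is abbreviated from the dump above.

```python
from typing import Dict, List

# Abbreviated copy of the two nodes shown in the topology dump above.
SAMPLE = """\
cpu_cores_count 8
simd_count 44
vendor_id 4098
device_id 5597
cpu_cores_count 0
simd_count 256
vendor_id 0
device_id 0
"""


def split_nodes(text: str, first_key: str = "cpu_cores_count") -> List[Dict[str, int]]:
    """Split a concatenated 'key value' dump into one dict per topology node.

    Assumption: every node's property list starts with `first_key`.
    """
    nodes: List[Dict[str, int]] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, value = parts
        if key == first_key or not nodes:
            nodes.append({})  # a repeat of the first key marks the next node
        nodes[-1][key] = int(value)
    return nodes


def looks_unassigned(node: Dict[str, int]) -> bool:
    """True if the node reports SIMDs but carries no PCI vendor/device ID."""
    return (
        node.get("simd_count", 0) > 0
        and node.get("vendor_id", 0) == 0
        and node.get("device_id", 0) == 0
    )


nodes = split_nodes(SAMPLE)
for index, node in enumerate(nodes):
    ids = f"{node['vendor_id']:#06x}:{node['device_id']:#06x}"
    print(index, ids, "unassigned" if looks_unassigned(node) else "ok")
```

In the dump above, node 0 carries 0x1002:0x15dd (the Raven APU), while the 256-SIMD node has no IDs at all, which is consistent with the Fiji not being attached to its own topology node.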
I messed up the patch. I'll need to send you another revision.
OK, no problem. I'll be waiting for the updated patch.
Please try the attached patch instead of the last one. Sorry about that.
0001-drm-amdkfd-Don-t-assign-dGPUs-to-APU-topology-device.patch.txt
Wow! I tried the new patch and I can see the light! rocminfo now reports both GPU devices:
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (number of timestamp)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 5 2400G with Radeon Vega Graphics
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0
Queue Min Size: 0
Queue Max Size: 0
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32KB
Chip ID: 5597
Cacheline Size: 64
Max Clock Frequency (MHz):3600
BDFID: 2560
Compute Unit: 8
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 16776832KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: TRUE
ISA Info:
N/A
*******
Agent 2
*******
Name: gfx902
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128
Queue Min Size: 4096
Queue Max Size: 131072
Queue Type: MULTI
Node: 0
Device Type: GPU
Cache Info:
L1: 16KB
Chip ID: 5597
Cacheline Size: 64
Max Clock Frequency (MHz):1250
BDFID: 2560
Compute Unit: 11
Features: KERNEL_DISPATCH
Fast F16 Operation: FALSE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size Per Dimension:
Dim[0]: 67109888
Dim[1]: 167773184
Dim[2]: 0
Grid Max Size: 4294967295
Waves Per CU: 160
Max Work-item Per CU: 10240
Grid Max Size per Dimension:
Dim[0]: 4294967295
Dim[1]: 4294967295
Dim[2]: 4294967295
Max number Of fbarriers Per Workgroup:32
Pool Info:
Pool 1
Segment: GROUP
Size: 64KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Acessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx902+xnack
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Dimension:
Dim[0]: 67109888
Dim[1]: 1024
Dim[2]: 16777217
Workgroup Max Size: 1024
Grid Max Dimension:
x 4294967295
y 4294967295
z 4294967295
Grid Max Size: 4294967295
FBarrier Max Size: 32
*******
Agent 3
*******
Name: gfx803
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128
Queue Min Size: 4096
Queue Max Size: 131072
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16KB
Chip ID: 29440
Cacheline Size: 64
Max Clock Frequency (MHz):1000
BDFID: 256
Compute Unit: 64
Features: KERNEL_DISPATCH
Fast F16 Operation: FALSE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size Per Dimension:
Dim[0]: 67109888
Dim[1]: 16778240
Dim[2]: 0
Grid Max Size: 4294967295
Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size per Dimension:
Dim[0]: 4294967295
Dim[1]: 4294967295
Dim[2]: 4294967295
Max number Of fbarriers Per Workgroup:32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 4194304KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Acessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx803
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Dimension:
Dim[0]: 67109888
Dim[1]: 1024
Dim[2]: 16777217
Workgroup Max Size: 1024
Grid Max Dimension:
x 4294967295
y 4294967295
z 4294967295
Grid Max Size: 4294967295
FBarrier Max Size: 32
*** Done ***
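A note for anyone matching the rocminfo agents above to the devices in dmesg: the Chip ID and BDFID fields are printed in decimal. The sketch below converts them, assuming BDFID uses the usual PCI packing (bus in bits 15..8, device in bits 7..3, function in bits 2..0). That packing is an assumption on my part, but it reproduces the addresses seen in this thread (0000:0a:00.0 for Raven, 0000:01:00.0 for Fiji). The function names are illustrative only.

```python
def decode_bdf(bdfid: int) -> str:
    """Decode a packed PCI bus/device/function value into 'bb:dd.f' form."""
    bus = (bdfid >> 8) & 0xFF       # bits 15..8
    device = (bdfid >> 3) & 0x1F    # bits 7..3
    function = bdfid & 0x7          # bits 2..0
    return f"{bus:02x}:{device:02x}.{function}"


def pci_id(value: int) -> str:
    """Format a decimal Chip ID / vendor ID as a 4-digit hex PCI ID."""
    return f"0x{value:04x}"


# (agent name, Chip ID, BDFID) taken from the rocminfo output above
agents = [("gfx902", 5597, 2560), ("gfx803", 29440, 256)]
for name, chip_id, bdfid in agents:
    print(name, pci_id(chip_id), decode_bdf(bdfid))
```

With these conversions, gfx902 maps to device 0x15dd at 0a:00.0 and gfx803 to device 0x7300 at 01:00.0, matching the RAVEN and FIJI lines in the kernel log.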
However, clinfo does not recognize any device:
ERROR: clGetPlatformIDs(-1001)
I have installed both rocm-opencl & rocm-opencl-dev packages.
Whenever I run clinfo the following two lines are appended to dmesg:
[ 497.669135] Alloc host visible vram on small bar is not allowed
[ 498.036467] amdgpu: [powerplay] pp_dpm_switch_power_profile was not implemented.
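The "small bar" message may be related to the `Detected VRAM RAM=...M, BAR=...M` lines that amdgpu prints during init: for the Fiji the BAR (256M) is much smaller than the VRAM (4096M), while for the Raven both are 2048M. The following Python sketch only extracts those two numbers from a dmesg line and compares them. Treating "BAR smaller than VRAM" as the definition of a small BAR is my assumption, not something confirmed in this thread.

```python
import re
from typing import Optional, Tuple

# Matches the amdgpu init line as it appears in the dmesg output in this thread.
VRAM_RE = re.compile(r"Detected VRAM RAM=(\d+)M, BAR=(\d+)M")


def parse_vram_line(line: str) -> Optional[Tuple[int, int]]:
    """Return (vram_mb, bar_mb) from a 'Detected VRAM' line, or None."""
    match = VRAM_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_small_bar(vram_mb: int, bar_mb: int) -> bool:
    """Assumed definition: the CPU-visible BAR does not cover all of VRAM."""
    return bar_mb < vram_mb


lines = [
    "[ 2.178406] [drm] Detected VRAM RAM=4096M, BAR=256M",   # Fiji
    "[ 2.418846] [drm] Detected VRAM RAM=2048M, BAR=2048M",  # Raven
]
for line in lines:
    sizes = parse_vram_line(line)
    if sizes is not None:
        print(sizes, "small BAR" if is_small_bar(*sizes) else "full BAR")
```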
And here is an updated block of the dmesg output:
...
[ 1.829691] [drm] amdgpu kernel modesetting enabled.
[ 1.830262] vga_switcheroo: detected switching method \_SB_.PCI0.GP17.VGA_.ATPX handle
[ 1.830920] ATPX version 1, functions 0x00000000
[ 1.833331] AMD IOMMUv2 driver by Joerg Roedel <[email protected]>
[ 1.837847] Parsing CRAT table with 1 nodes
[ 1.838536] Creating topology SYSFS entries
[ 1.839145] Topology: Add APU node [0x0:0x0]
[ 1.839678] Finished initializing topology
[ 1.840227] kfd kfd: Initialized module
[ 1.840813] checking generic (c0000000 300000) vs hw (e0000000 10000000)
[ 1.840841] amdgpu 0000:01:00.0: enabling device (0000 -> 0003)
[ 1.841644] [drm] initializing kernel modesetting (FIJI 0x1002:0x7300 0x1002:0x0B36 0xCA).
[ 1.842318] [drm] register mmio base: 0xFE900000
[ 1.842963] [drm] register mmio size: 262144
[ 1.843596] [drm] add ip block number 0 <vi_common>
[ 1.844212] [drm] add ip block number 1 <gmc_v8_0>
[ 1.844759] [drm] add ip block number 2 <tonga_ih>
[ 1.845217] [drm] add ip block number 3 <powerplay>
[ 1.845671] [drm] add ip block number 4 <dm>
[ 1.846127] [drm] add ip block number 5 <gfx_v8_0>
[ 1.846582] [drm] add ip block number 6 <sdma_v3_0>
[ 1.847041] [drm] add ip block number 7 <uvd_v6_0>
[ 1.847502] [drm] add ip block number 8 <vce_v3_0>
[ 1.847972] [drm] UVD is enabled in physical mode
[ 1.848432] [drm] VCE enabled in physical mode
...
[ 2.069835] [drm] GPU posting now...
[ 2.118648] ata10: SATA link down (SStatus 0 SControl 300)
[ 2.120130] ata9: SATA link down (SStatus 0 SControl 300)
[ 2.176069] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[ 2.176857] amdgpu 0000:01:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 2.177627] amdgpu 0000:01:00.0: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
[ 2.178406] [drm] Detected VRAM RAM=4096M, BAR=256M
[ 2.179176] [drm] RAM width 512bits HBM
[ 2.180338] [TTM] Zone kernel: Available graphics memory: 7183946 kiB
[ 2.181104] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
[ 2.181871] [TTM] Initializing pool allocator
[ 2.182636] [TTM] Initializing DMA pool allocator
[ 2.183366] [drm] amdgpu: 4096M of VRAM memory ready
[ 2.184059] [drm] amdgpu: 4096M of GTT memory ready.
[ 2.184679] [drm] GART: num cpu pages 262144, num gpu pages 262144
[ 2.185316] [drm] PCIE GART of 1024M enabled (table at 0x000000F400000000).
[ 2.187346] [drm] Found UVD firmware Version: 1.87 Family ID: 12
[ 2.187926] [drm] UVD ENC is disabled
[ 2.189037] [drm] Found VCE firmware Version: 53.20 Binary ID: 3
[ 2.251725] amdgpu: [powerplay] Failed to retrieve minimum clocks.
[ 2.252328] amdgpu: [powerplay] Error in phm_get_clock_info
[ 2.253210] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[ 2.254319] [drm] Display Core initialized with v3.1.59!
[ 2.255963] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 2.256630] [drm] Driver supports precise vblank timestamp query.
...
[ 2.286611] [drm] UVD initialized successfully.
[ 2.387627] [drm] VCE initialized successfully.
[ 2.389394] kfd kfd: Allocated 3969056 bytes on gart
[ 2.390194] Virtual CRAT table created for GPU
[ 2.390961] Parsing CRAT table with 1 nodes
[ 2.391749] Creating topology SYSFS entries
[ 2.392702] Topology: Add dGPU node [0x7300:0x1002]
[ 2.393521] kfd kfd: added device 1002:7300
[ 2.394288] [drm] Cannot find any crtc or sizes
[ 2.397053] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:01:00.0 on minor 0
[ 2.397649] checking generic (c0000000 300000) vs hw (c0000000 10000000)
[ 2.397649] fb: switching to amdgpudrmfb from EFI VGA
[ 2.398188] Console: switching to colour dummy device 80x25
[ 2.398349] [drm] initializing kernel modesetting (RAVEN 0x1002:0x15DD 0x1458:0xD000 0xC6).
[ 2.398358] [drm] register mmio base: 0xFE400000
[ 2.398359] [drm] register mmio size: 524288
[ 2.398366] [drm] add ip block number 0 <soc15_common>
[ 2.398368] [drm] add ip block number 1 <gmc_v9_0>
[ 2.398369] [drm] add ip block number 2 <vega10_ih>
[ 2.398370] [drm] add ip block number 3 <psp>
[ 2.398371] [drm] add ip block number 4 <powerplay>
[ 2.398373] [drm] add ip block number 5 <dm>
[ 2.398374] [drm] add ip block number 6 <gfx_v9_0>
[ 2.398375] [drm] add ip block number 7 <sdma_v4_0>
[ 2.398377] [drm] add ip block number 8 <vcn_v1_0>
[ 2.398398] [drm] VCN decode is enabled in VM mode
[ 2.398399] [drm] VCN encode is enabled in VM mode
[ 2.398400] [drm] VCN jpeg decode is enabled in VM mode
[ 2.398402] vga_switcheroo: enabled
[ 2.403441] sda: sda1 sda2 sda3 sda4 sda5 sda6
[ 2.404308] sd 0:0:0:0: [sda] Attached SCSI disk
[ 2.418793] [drm] BIOS signature incorrect 0 0
[ 2.418811] ATOM BIOS: 113-RAVEN-111
[ 2.418834] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[ 2.418840] amdgpu 0000:0a:00.0: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
[ 2.418842] amdgpu 0000:0a:00.0: GART: 1024M 0x000000F500000000 - 0x000000F53FFFFFFF
[ 2.418846] [drm] Detected VRAM RAM=2048M, BAR=2048M
[ 2.418848] [drm] RAM width 128bits DDR4
[ 2.418856] [drm] amdgpu: 2048M of VRAM memory ready
[ 2.418858] [drm] amdgpu: 3072M of GTT memory ready.
[ 2.418873] [drm] GART: num cpu pages 262144, num gpu pages 262144
[ 2.419033] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[ 2.420327] [drm] use_doorbell being set to: [true]
[ 2.420414] [drm] Found VCN firmware Version: 1.73 Family ID: 18
[ 2.420417] [drm] PSP loading VCN firmware
[ 2.584438] amdgpu: [powerplay] dpm has been enabled
[ 2.584540] [drm] DM_PPLIB: values for Invalid clock
[ 2.584544] [drm] DM_PPLIB: 0 in kHz
[ 2.584547] [drm] DM_PPLIB: 0 in kHz
[ 2.584550] [drm] DM_PPLIB: 0 in kHz
[ 2.584553] [drm] DM_PPLIB: 1600000 in kHz
[ 2.584681] WARNING: CPU: 1 PID: 203 at drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1372 dcn_bw_update_from_pplib+0x19a/0x2b0 [amdgpu]
[ 2.584687] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) hid_generic chash gpu_sched i2c_algo_bit ttm usbhid drm_kms_helper syscopyarea hid r8169 sysfillrect ahci sysimgblt fb_sys_fops libahci drm i2c_piix4 wmi gpio_amdpt gpio_generic video
[ 2.584718] CPU: 1 PID: 203 Comm: systemd-udevd Not tainted 4.19.13+ #1
[ 2.584722] Hardware name: Gigabyte Technology Co., Ltd. AB350N-Gaming WIFI/AB350N-Gaming WIFI-CF, BIOS F24 12/25/2018
[ 2.584822] RIP: 0010:dcn_bw_update_from_pplib+0x19a/0x2b0 [amdgpu]
[ 2.584827] Code: 84 fd 44 ff ff ff 49 8b 95 78 01 00 00 48 89 85 30 ff ff ff df ad 30 ff ff ff d8 f1 db 42 78 de c9 de ca de f9 d9 5a 4c eb 02 <0f> 0b 48 89 da be 04 00 00 00 4c 89 e7 e8 c4 47 fe ff 84 c0 74 32
[ 2.584835] RSP: 0018:ffffae66420676a0 EFLAGS: 00010246
[ 2.584841] RAX: 0000000000000001 RBX: ffffae6642067700 RCX: 0000000000000004
[ 2.584845] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000246
[ 2.584849] RBP: ffffae6642067770 R08: 00000000000003e8 R09: 0720072007200720
[ 2.584854] R10: 0000000000000000 R11: 0720072007200720 R12: ffff96a42e517d80
[ 2.584858] R13: ffff96a420478000 R14: 0000000000000000 R15: ffff96a420478000
[ 2.584863] FS: 00007f87b29d0680(0000) GS:ffff96a430c40000(0000) knlGS:0000000000000000
[ 2.584868] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.584872] CR2: 00007f87b23cf8ac CR3: 0000000422f26000 CR4: 00000000003406e0
[ 2.584877] Call Trace:
[ 2.584978] dcn10_create_resource_pool+0x7d4/0x9e0 [amdgpu]
[ 2.585073] dc_create_resource_pool+0x46/0x180 [amdgpu]
[ 2.585080] ? _cond_resched+0x19/0x40
[ 2.585086] ? __kmalloc+0x1d9/0x220
[ 2.585170] ? dal_gpio_service_create+0xa1/0x120 [amdgpu]
[ 2.585249] dc_create+0x221/0x620 [amdgpu]
[ 2.585254] ? kmem_cache_alloc_trace+0x42/0x1d0
[ 2.585333] dm_hw_init+0xc3/0x250 [amdgpu]
[ 2.585393] amdgpu_device_init+0xc92/0x1560 [amdgpu]
[ 2.585451] amdgpu_driver_load_kms+0x8b/0x2c0 [amdgpu]
[ 2.585468] drm_dev_register+0x128/0x1b0 [drm]
[ 2.585524] amdgpu_pci_probe+0x148/0x200 [amdgpu]
[ 2.585532] local_pci_probe+0x47/0xa0
[ 2.585537] pci_device_probe+0x145/0x1b0
[ 2.585544] really_probe+0x268/0x3d0
[ 2.585550] driver_probe_device+0x11a/0x130
[ 2.585554] __driver_attach+0xe3/0x110
[ 2.585558] ? driver_probe_device+0x130/0x130
[ 2.585562] ? driver_probe_device+0x130/0x130
[ 2.585565] bus_for_each_dev+0x74/0xb0
[ 2.585568] ? kmem_cache_alloc_trace+0x1b1/0x1d0
[ 2.585571] driver_attach+0x1e/0x20
[ 2.585574] bus_add_driver+0x167/0x260
[ 2.585577] ? 0xffffffffc078c000
[ 2.585579] driver_register+0x60/0x100
[ 2.585582] ? 0xffffffffc078c000
[ 2.585585] __pci_register_driver+0x5a/0x60
[ 2.585630] amdgpu_init+0x7a/0x89 [amdgpu]
[ 2.585634] do_one_initcall+0x4a/0x1c9
[ 2.585637] ? __vunmap+0x8e/0xc0
[ 2.585640] ? _cond_resched+0x19/0x40
[ 2.585642] ? kmem_cache_alloc_trace+0x42/0x1d0
[ 2.585645] ? vfree+0x35/0x70
[ 2.585648] do_init_module+0x5f/0x216
[ 2.585652] load_module+0x21b6/0x2aa0
[ 2.585656] __do_sys_finit_module+0xfc/0x120
[ 2.585659] ? __do_sys_finit_module+0xfc/0x120
[ 2.585663] __x64_sys_finit_module+0x1a/0x20
[ 2.585666] do_syscall_64+0x5a/0x120
[ 2.585670] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2.585672] RIP: 0033:0x7f87b24da839
[ 2.585675] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
[ 2.585680] RSP: 002b:00007ffec49fcaa8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 2.585683] RAX: ffffffffffffffda RBX: 000056526f6ead10 RCX: 00007f87b24da839
[ 2.585685] RDX: 0000000000000000 RSI: 00007f87b21b9145 RDI: 0000000000000015
[ 2.585688] RBP: 00007f87b21b9145 R08: 0000000000000000 R09: 00007ffec49fcbc0
[ 2.585690] R10: 0000000000000015 R11: 0000000000000246 R12: 0000000000000000
[ 2.585693] R13: 000056526f6f8630 R14: 0000000000020000 R15: 000056526f6ead10
[ 2.585696] ---[ end trace e7fd274823174d5d ]---
[ 2.585698] [drm] DM_PPLIB: values for Invalid clock
[ 2.585700] [drm] DM_PPLIB: 300000 in kHz
[ 2.585702] [drm] DM_PPLIB: 600000 in kHz
[ 2.585704] [drm] DM_PPLIB: 626000 in kHz
[ 2.585706] [drm] DM_PPLIB: 654000 in kHz
[ 2.585917] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:1! type 0 expected 3
[ 2.585965] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:2! type 0 expected 3
[ 2.586859] ata2: SATA link down (SStatus 0 SControl 300)
[ 2.625728] [drm] Display Core initialized with v3.1.59!
[ 2.651167] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 2.651169] [drm] Driver supports precise vblank timestamp query.
[ 2.674382] [drm] VCN decode and encode initialized successfully.
[ 2.675671] kfd kfd: Allocated 3969056 bytes on gart
[ 2.675695] Topology: Add dGPU node [0x7300:0x1002]
[ 2.675929] kfd kfd: added device 1002:15dd
[ 2.677911] [drm] fb mappable at 0x41100000
[ 2.677913] [drm] vram apper at 0x40000000
[ 2.677915] [drm] size 14745600
[ 2.677916] [drm] fb depth is 24
[ 2.677917] [drm] pitch is 10240
[ 2.677989] fbcon: amdgpudrmfb (fb0) is primary device
[ 2.713278] Console: switching to colour frame buffer device 320x90
[ 2.737430] amdgpu 0000:0a:00.0: fb0: amdgpudrmfb frame buffer device
[ 2.752115] amdgpu 0000:0a:00.0: ring 0(gfx) uses VM inv eng 4 on hub 0
[ 2.752155] amdgpu 0000:0a:00.0: ring 1(comp_1.0.0) uses VM inv eng 5 on hub 0
[ 2.752197] amdgpu 0000:0a:00.0: ring 2(comp_1.1.0) uses VM inv eng 6 on hub 0
[ 2.752238] amdgpu 0000:0a:00.0: ring 3(comp_1.2.0) uses VM inv eng 7 on hub 0
[ 2.752280] amdgpu 0000:0a:00.0: ring 4(comp_1.3.0) uses VM inv eng 8 on hub 0
[ 2.752321] amdgpu 0000:0a:00.0: ring 5(comp_1.0.1) uses VM inv eng 9 on hub 0
[ 2.752363] amdgpu 0000:0a:00.0: ring 6(comp_1.1.1) uses VM inv eng 10 on hub 0
[ 2.752396] amdgpu 0000:0a:00.0: ring 7(comp_1.2.1) uses VM inv eng 11 on hub 0
[ 2.752423] amdgpu 0000:0a:00.0: ring 8(comp_1.3.1) uses VM inv eng 12 on hub 0
[ 2.752450] amdgpu 0000:0a:00.0: ring 9(kiq_2.1.0) uses VM inv eng 13 on hub 0
[ 2.752477] amdgpu 0000:0a:00.0: ring 10(sdma0) uses VM inv eng 4 on hub 1
[ 2.752503] amdgpu 0000:0a:00.0: ring 11(vcn_dec) uses VM inv eng 5 on hub 1
[ 2.752529] amdgpu 0000:0a:00.0: ring 12(vcn_enc0) uses VM inv eng 6 on hub 1
[ 2.752555] amdgpu 0000:0a:00.0: ring 13(vcn_enc1) uses VM inv eng 7 on hub 1
[ 2.752582] amdgpu 0000:0a:00.0: ring 14(vcn_jpeg) uses VM inv eng 8 on hub 1
[ 2.756280] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:0a:00.0 on minor 1
...
Thanks for testing. This is one step closer. I'll apply this patch to our staging branch. Can I add a "Tested-by: " tag with your email address to the patch?
I suspect clinfo is tripping over the issues regarding different memory models I mentioned before. In theory, and unlike older APUs, Raven should be able to work together with a dGPU by making it manage its memory like a dGPU if another dGPU is present.
I have a Raven box at home. Time to drop in a dGPU for some experiments. It'll probably have to wait till February, though, as I'm about to go on vacation.
Thank you for your efforts. Yes, you may add my email in the tag. I'd be happy to have the credit. Let me know if you cannot find it through GitHub.
Let me know in case you need anything else.
GitHub shows me a link to your website, which gives me your di.uoa.gr email address. Let me know if you want me to use a different address.
Better use the alternative email address that I also use on github. Sent directly via email.
I also tried out your patch and ended up with the following dmesg output. rocminfo segfaults. Any ideas on that?
[ 8.719254] [drm] amdgpu kernel modesetting enabled.
[ 8.719275] vga_switcheroo: detected switching method \_SB_.PCI0.GP17.VGA_.ATPX handle
[ 8.719340] ATPX version 1, functions 0x00000000
[ 8.720271] Parsing CRAT table with 1 nodes
[ 8.720274] Creating topology SYSFS entries
[ 8.720295] Topology: Add APU node [0x0:0x0]
[ 8.720295] Finished initializing topology
[ 8.720369] checking generic (e0000000 300000) vs hw (e0000000 10000000)
[ 8.720369] fb0: switching to amdgpudrmfb from EFI VGA
[ 8.720406] Console: switching to colour dummy device 80x25
[ 8.720683] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1DA2:0xE387 0xE7).
[ 8.720699] [drm] register mmio base: 0xFE900000
[ 8.720700] [drm] register mmio size: 262144
[ 8.720715] [drm] add ip block number 0 <vi_common>
[ 8.720716] [drm] add ip block number 1 <gmc_v8_0>
[ 8.720717] [drm] add ip block number 2 <tonga_ih>
[ 8.720717] [drm] add ip block number 3 <gfx_v8_0>
[ 8.720718] [drm] add ip block number 4 <sdma_v3_0>
[ 8.720719] [drm] add ip block number 5 <powerplay>
[ 8.720720] [drm] add ip block number 6 <dm>
[ 8.720721] [drm] add ip block number 7 <uvd_v6_0>
[ 8.720721] [drm] add ip block number 8 <vce_v3_0>
[ 8.720745] [drm] UVD is enabled in VM mode
[ 8.720745] [drm] UVD ENC is enabled in VM mode
[ 8.720748] [drm] VCE enabled in VM mode
[ 8.720934] amdgpu 0000:10:00.0: No more image in the PCI ROM
[ 8.720955] ATOM BIOS: 113-1E3870U-O49
[ 8.720982] [drm] vm size is 128 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[ 8.721466] amdgpu 0000:10:00.0: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[ 8.721468] amdgpu 0000:10:00.0: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 8.721476] [drm] Detected VRAM RAM=8192M, BAR=256M
[ 8.721477] [drm] RAM width 256bits GDDR5
[ 8.721536] [TTM] Zone kernel: Available graphics memory: 16343920 kiB
[ 8.721537] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
[ 8.721538] [TTM] Initializing pool allocator
[ 8.721542] [TTM] Initializing DMA pool allocator
[ 8.721709] [drm] amdgpu: 8192M of VRAM memory ready
[ 8.721711] [drm] amdgpu: 8192M of GTT memory ready.
[ 8.721729] [drm] GART: num cpu pages 65536, num gpu pages 65536
[ 8.725431] [drm] PCIE GART of 256M enabled (table at 0x000000F400300000).
[ 8.726075] [drm] Chained IB support enabled!
[ 8.729279] [drm] Found UVD firmware Version: 1.130 Family ID: 16
[ 8.730753] [drm] Found VCE firmware Version: 53.26 Binary ID: 3
[ 8.802917] [drm] DM_PPLIB: values for Engine clock
[ 8.802918] [drm] DM_PPLIB: 300000
[ 8.802918] [drm] DM_PPLIB: 600000
[ 8.802919] [drm] DM_PPLIB: 900000
[ 8.802919] [drm] DM_PPLIB: 1145000
[ 8.802919] [drm] DM_PPLIB: 1215000
[ 8.802920] [drm] DM_PPLIB: 1257000
[ 8.802920] [drm] DM_PPLIB: 1300000
[ 8.802920] [drm] DM_PPLIB: 1366000
[ 8.802921] [drm] DM_PPLIB: Validation clocks:
[ 8.802922] [drm] DM_PPLIB: engine_max_clock: 136600
[ 8.802922] [drm] DM_PPLIB: memory_max_clock: 200000
[ 8.802923] [drm] DM_PPLIB: level : 8
[ 8.802924] [drm] DM_PPLIB: values for Memory clock
[ 8.802924] [drm] DM_PPLIB: 300000
[ 8.802924] [drm] DM_PPLIB: 1000000
[ 8.802925] [drm] DM_PPLIB: 2000000
[ 8.802925] [drm] DM_PPLIB: Validation clocks:
[ 8.802926] [drm] DM_PPLIB: engine_max_clock: 136600
[ 8.802926] [drm] DM_PPLIB: memory_max_clock: 200000
[ 8.802926] [drm] DM_PPLIB: level : 8
[ 8.828097] [drm] Display Core initialized with v3.2.11!
[ 8.851442] [drm] SADs count is: -2, don't need to read it
[ 8.911710] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 8.911711] [drm] Driver supports precise vblank timestamp query.
[ 8.938350] [drm] UVD and UVD ENC initialized successfully.
[ 9.038304] [drm] VCE initialized successfully.
[ 9.039355] kfd kfd: Allocated 3969056 bytes on gart
[ 9.039375] Virtual CRAT table created for GPU
[ 9.039376] Parsing CRAT table with 1 nodes
[ 9.039384] Creating topology SYSFS entries
[ 9.039493] Topology: Add dGPU node [0x67df:0x1002]
[ 9.039571] kfd kfd: added device 1002:67df
[ 9.042687] [drm] fb mappable at 0xE0830000
[ 9.042689] [drm] vram apper at 0xE0000000
[ 9.042689] [drm] size 14745600
[ 9.042690] [drm] fb depth is 24
[ 9.042690] [drm] pitch is 10240
[ 9.042771] fbcon: amdgpudrmfb (fb0) is primary device
[ 9.105048] Console: switching to colour frame buffer device 240x75
[ 9.126058] amdgpu 0000:10:00.0: fb0: amdgpudrmfb frame buffer device
[ 9.149999] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:10:00.0 on minor 0
[ 9.150075] amdgpu 0000:38:00.0: enabling device (0000 -> 0003)
[ 9.150180] [drm] initializing kernel modesetting (RAVEN 0x1002:0x15DD 0x1462:0x7B79 0xC6).
[ 9.150189] [drm] register mmio base: 0xFE300000
[ 9.150189] [drm] register mmio size: 524288
[ 9.150198] [drm] add ip block number 0 <soc15_common>
[ 9.150198] [drm] add ip block number 1 <gmc_v9_0>
[ 9.150199] [drm] add ip block number 2 <vega10_ih>
[ 9.150199] [drm] add ip block number 3 <psp>
[ 9.150200] [drm] add ip block number 4 <gfx_v9_0>
[ 9.150200] [drm] add ip block number 5 <sdma_v4_0>
[ 9.150201] [drm] add ip block number 6 <powerplay>
[ 9.150201] [drm] add ip block number 7 <dm>
[ 9.150202] [drm] add ip block number 8 <vcn_v1_0>
[ 9.150489] [drm] VCN decode is enabled in VM mode
[ 9.150489] [drm] VCN encode is enabled in VM mode
[ 9.150489] [drm] VCN jpeg decode is enabled in VM mode
[ 9.150493] vga_switcheroo: enabled
[ 9.159283] [drm] BIOS signature incorrect ff ff
[ 9.159286] amdgpu 0000:38:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[ 9.163882] [drm] BIOS signature incorrect ff ff
[ 9.163942] [drm:amdgpu_get_bios [amdgpu]] *ERROR* Unable to locate a BIOS ROM
[ 9.163969] amdgpu 0000:38:00.0: Fatal error during GPU init
[ 9.163970] [drm] amdgpu: finishing device.
[ 9.163979] vga_switcheroo: disabled
[ 9.164208] amdgpu: probe of 0000:38:00.0 failed with error -22
[ 9.967063] amdgpu 0000:10:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
Hi @koerberm your problem looks different. The graphics driver is having trouble initializing your Raven integrated GPU because it can't find the BIOS ROM. I'm quite sure you'd see the same problem without my patch.
Hi @fxkamd I found out what my problem was: I had configured the dGPU to be the primary display adapter in the BIOS. When the iGPU is set as primary, the topology is discovered correctly! Many thanks for that. Any idea where to report the issue that occurs when the dGPU is primary?
You could report it on the [email protected] mailing list.
A question:
> ll /dev/dri/
card0
card1
renderD128
renderD129
Why is /dev/kfd unique?
Sure, making iGPU+dGPU work together is difficult. In the meantime, would it be possible to choose (via a kernel boot parameter?) whether ROCm/KFD uses the iGPU or the dGPU, e.g. kfd.igpu=0 ... kfd.dgpu=1?
Hmm... looks like there isn't enough information in the email address for github to pick up an email response... so duplicating here:
/dev/kfd is unique because it has to handle cross-GPU functionality that is not readily available via the single-GPU drm APIs, ie providing a single shared address space across all the CPUs and GPUs in the system. There is a single instance of KFD for the entire system, but a separate instance of amdgpu for each GPU in the system.
The challenge with making iGPUs and dGPUs work together is that HSA iGPUs (like Kaveri/Carrizo/Raven) are able to take advantage of some HW capabilities not yet broadly available in dGPUs, e.g. the ability to run with fully unpinned memory via the iGPU's IOMMUv2, so we end up with significantly different memory management models between iGPU and dGPU. On dGPUs all memory is pinned today, but we can simulate having unpinned memory via the eviction mechanism.
We could "dumb down" the iGPU capability to match what we can do on all the dGPUs but that seems like a step in the wrong direction since newer dGPUs are starting to pick up the HW capabilities to run with unpinned memory as well.
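To make the "one KFD, one amdgpu instance per GPU" relationship concrete: each KFD topology node exposes a `drm_render_minor` property (129 for the Raven node in the dump earlier in this thread), which corresponds to a `/dev/dri/renderD<minor>` node. The Python sketch below maps topology nodes to render nodes. The sysfs location in `KFD_NODES` is the path I'd expect on a ROCm system and should be treated as an assumption, and the demo runs against a temporary fake tree rather than real hardware.

```python
import os
import tempfile
from typing import Dict

# Assumed sysfs location of the KFD topology; adjust if your system differs.
KFD_NODES = "/sys/class/kfd/kfd/topology/nodes"


def read_properties(path: str) -> Dict[str, int]:
    """Parse a 'key value' properties file into a dict of integers."""
    props: Dict[str, int] = {}
    with open(path) as handle:
        for line in handle:
            parts = line.split()
            if len(parts) == 2:
                props[parts[0]] = int(parts[1])
    return props


def render_nodes(nodes_dir: str = KFD_NODES) -> Dict[int, str]:
    """Map KFD node index -> DRM render node path (nodes with minor 0 skipped)."""
    result: Dict[int, str] = {}
    for name in sorted(n for n in os.listdir(nodes_dir) if n.isdigit()):
        props = read_properties(os.path.join(nodes_dir, name, "properties"))
        minor = props.get("drm_render_minor", 0)
        if minor:
            result[int(name)] = f"/dev/dri/renderD{minor}"
    return result


# Demo on a fake tree that mirrors the two nodes from the earlier dump.
with tempfile.TemporaryDirectory() as root:
    for node, minor in ((0, 129), (1, 0)):
        os.makedirs(os.path.join(root, str(node)))
        with open(os.path.join(root, str(node), "properties"), "w") as handle:
            handle.write(f"simd_count 44\ndrm_render_minor {minor}\n")
    demo = render_nodes(root)
print(demo)
```

On a real system, calling `render_nodes()` without arguments would read the live topology, provided the path assumption holds and the files are readable.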
Thanks for this reply. And yes as a developer I totally agree: don't "dumb down" the iGPU. It is the wrong direction!
Questions:
- Can you tell us which dGPU HW will be able to use unpinned memory?
- Is cross-GPU functionality the only reason for having a single /dev/kfd?
- Is it possible to have two /dev/kfd devices, one for GPUs with pinned memory and one for GPUs with unpinned memory?
WOW! =:-D I have been waiting to use Linux/OpenCL/ROCm on an iGPU/APU for over a year. I will build an APU-only system next month: Ubuntu 18.04, latest kernel, OpenCL, ROCm.
> We could "dumb down" the iGPU capability to match what we can do on all the dGPUs but that seems like a step in the wrong direction since newer dGPUs are starting to pick up the HW capabilities to run with unpinned memory as well.
Please don't dumb down the iGPU. My objective is to benchmark shared CPU/GPU memory with OpenCL & ROCm. I believe that low-latency shared memory can be a real asset for speeding up CPU code with small GPU accelerated kernels. I want the fastest shared memory possible, and the lowest latency when switching from CPU to iGPU and back.
Keep up the great work!
I have a patch that lets the user choose between the iGPU and the dGPU for ROCm without having to disable graphics. The patch is made against kernel 5.0.5, but I hope it works with other versions too. It adds a kernel parameter, "amdgpu.rocm_mode=N":
- N = 1: use the iGPU memory management model (deactivates the dGPU in kfd)
- N = 2: use the dGPU memory management model (deactivates the iGPU/APU in kfd)
- N = 3 (the default): keep the first one found (for now: the iGPU/APU if it exists, the dGPU on other CPUs)
kfd-iGPU-dGPU_00.patch.txt
It mostly works with my config, "AMD Ryzen 5 3550H" + "Radeon RX560X":
- rocminfo works in all cases
- clinfo + darktable work with N=1 and N=3 (i.e. RAVEN)
- clinfo crashes with N=2 (RX560), but the backtrace is strange: a memory error in glibc
If anyone can test and give feedback...
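For readers who want the three modes spelled out, here is a toy Python model of the selection rule as described in the comment above. It is not taken from the patch (which is kernel C code); the function name, the `(name, kind)` tuples and the reading of N=3 as "iGPU if one exists, otherwise dGPU" are my assumptions.

```python
from typing import List, Tuple

IGPU, DGPU = "iGPU", "dGPU"


def kfd_devices(mode: int, gpus: List[Tuple[str, str]]) -> List[str]:
    """Return the GPUs that stay visible to KFD for a given rocm_mode value.

    `gpus` is a list of (name, kind) pairs, kind being IGPU or DGPU.
    This models the described behaviour only; it is not the patch's logic.
    """
    if mode == 1:
        keep = IGPU                      # iGPU memory model, dGPUs dropped
    elif mode == 2:
        keep = DGPU                      # dGPU memory model, iGPU dropped
    elif mode == 3:
        # Default: iGPU/APU if present, otherwise fall back to the dGPU(s).
        keep = IGPU if any(kind == IGPU for _, kind in gpus) else DGPU
    else:
        raise ValueError(f"unsupported rocm_mode: {mode}")
    return [name for name, kind in gpus if kind == keep]


# The configuration mentioned above: Ryzen 5 3550H APU plus an RX560X.
system = [("Ryzen 5 3550H", IGPU), ("Radeon RX560X", DGPU)]
for mode in (1, 2, 3):
    print(f"amdgpu.rocm_mode={mode} ->", kfd_devices(mode, system))
```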
OK, more news (pretty good). In fact, I have libhsa-runtime-image64.so.1 installed for use with OpenCL in darktable. It works great with the RAVEN APU. If I remove it, clinfo works with the RX560 too.
New patch: mostly cleanup and comments.
0003-allows-to-choose-iGPU-or-dGPU-memory-management-mode.patch.txt
I also had the same issue, and the patch 0003 above is working for me. I have a Biostar X470GTN motherboard, Ryzen 5 2400G with Vega 11 graphics, and a Gigabyte RX570 4GB dGPU. I patched kernel 5.0.7 and booted with amdgpu.rocm_mode=2 and without, and clinfo showed iGPU and dGPU only, respectively. I tested with hashcat on the dGPU; the benchmark shows warnings on some of the hash modes, but I think they are just issues in hashcat's OpenCL code.
New patch release for kernel 5.1.1 (rebase only, no changes): 0002-allows-to-choose-iGPU-or-dGPU-memory-management-mode.patch.txt
New patch for kernel 5.2.8 (rebase only, no changes): 0002-allows-to-choose-iGPU-or-dGPU-memory-management-mode.patch.txt
The next patch is not required; it only helps to see whether ROCm is deactivated because the GPU is suspended (it adds a trace to the dmesg output): 0001-add-kernel-message-on-kfd-suspend-resume.patch.txt
Updated patches for kernel 5.3, build-tested on 5.3-rc8 (the last patch doesn't apply on kernel 5.3):
0002-add-kernel-message-on-kfd-suspend-resume.patch.txt 0003-allows-to-choose-iGPU-or-dGPU-memory-management-mode.patch.txt
Updated patches for kernel 5.6.y:
0002-add-kernel-message-on-kfd-suspend-resume.patch.txt 0003-Allows-to-choose-iGPU-or-dGPU-memory-management-mode.patch.txt
(tested with tensorflow: https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/issues/935)