X11 unusably slow with DRM 5.15 and 6.1's amdgpu on a RX 800
Description and reproduction
With DRM 5.15 on an AMD RX 800 card, after a few minutes to a few hours in an X11 session, simply clicking a program in the task bar to switch to it, or using Alt-Tab, can freeze the whole display for several seconds. More generally, any kind of desktop effect (such as the application thumbnails shown when hovering over the task bar) is slow. As uptime increases, freezes tend to last longer (I measured a few that lasted almost 10 minutes).
DRM 6.1 has the same problem in a slightly milder form: it takes more uptime for the problem to appear, and freezes are initially shorter. However, they lengthen over time until the desktop eventually becomes almost unusable, as with 5.15.
DRM 5.10 works correctly. The problem also doesn't show up on a laptop using Intel Gen10 integrated graphics (i915 driver) with DRM 5.15 or DRM 6.1 (although another problem shows up there, to be reported separately).
Tested mostly with KDE/KWin, but Xfce has similar problems. Turning off compositing in KWin does not solve the problem (it makes an almost imperceptible difference).
System Information
FreeBSD version
FreeBSD 14.1-STABLE #0 n267671-9a8a26aefb36: Mon May 13 13:39:56 CEST 2024 MYCONFIG 1401500 1401500
Kernel MYCONFIG is a stripped-down version of GENERIC, close to MINIMAL.
Same problem on an older version: FreeBSD 14.0-STABLE #1 n266865-245844372d7e: Thu Feb 22 11:11:45 CET 2024
PCI Info
vgapci0@pci0:8:0:0: class=0x030000 rev=0xe7 hdr=0x00 vendor=0x1002 device=0x67df subvendor=0x1043 subdevice=0x0525
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]'
    class      = display
    subclass   = VGA
DRM KMOD version
Problem reproduced with:
drm-515-kmod 5.15.118_4
drm-61-kmod 6.1.69_2
No problem with:
drm-510-kmod 5.10.163_9
Preliminary Investigation
Even before the long, unmistakable freezes start, Xorg can commonly be seen, after some uptime, using ~5% CPU for several seconds or more. Some captured kernel stacks:
PID TID COMM TDNAME KSTACK
1956 101446 Xorg MainThread ttm_pool_free+0x110 ttm_tt_destroy_common+0x25 amdgpu_ttm_backend_destroy+0x1d ttm_bo_put+0x331 amdgpu_bo_unref+0x1a amdgpu_gem_object_free+0x1b drm_gem_handle_delete+0xc2 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113 amd64_syscall+0x120 fast_syscall_common+0xf8
PID TID COMM TDNAME KSTACK
1956 101446 Xorg MainThread ttm_pool_free+0x110 ttm_tt_destroy_common+0x25 amdgpu_ttm_backend_destroy+0x1d ttm_bo_put+0x331 amdgpu_bo_unref+0x1a amdgpu_gem_object_free+0x1b drm_gem_handle_delete+0xc2 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113 amd64_syscall+0x120 fast_syscall_common+0xf8
PID TID COMM TDNAME KSTACK
1956 101446 Xorg MainThread pmap_page_set_memattr+0x5b lkpi_vmf_insert_pfn_prot_locked+0x268 ttm_bo_vm_fault_reserved+0x1c1 amdgpu_gem_fault+0x86 linux_cdev_pager_populate+0x128 vm_fault_allocate+0x39f vm_fault+0x3c6 vm_fault_trap+0x4c trap_pfault+0x1bd trap+0x405 calltrap+0x8
PID TID COMM TDNAME KSTACK
1956 101446 Xorg MainThread ttm_pool_free+0x110 ttm_tt_destroy_common+0x25 amdgpu_ttm_backend_destroy+0x1d ttm_bo_put+0x331 amdgpu_bo_unref+0x1a amdgpu_gem_object_free+0x1b drm_gem_handle_delete+0xc2 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113 amd64_syscall+0x120 fast_syscall_common+0xf8
PID TID COMM TDNAME KSTACK
1956 101446 Xorg MainThread ttm_pool_free+0x110 ttm_tt_destroy_common+0x25 amdgpu_ttm_backend_destroy+0x1d ttm_bo_put+0x331 amdgpu_bo_unref+0x1a amdgpu_gem_object_free+0x1b drm_gem_handle_delete+0xc2 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113 amd64_syscall+0x120 fast_syscall_common+0xf8
PID TID COMM TDNAME KSTACK
1956 101446 Xorg MainThread drm_sched_entity_select_rq+0x6e drm_sched_job_init+0x1c amdgpu_job_submit+0x22 amdgpu_vm_sdma_commit+0xe2 amdgpu_vm_sdma_update+0x18c amdgpu_vm_bo_update_mapping+0x952 amdgpu_vm_clear_freed+0xd9 amdgpu_gem_va_update_vm+0x30 amdgpu_gem_va_ioctl+0x251 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113 amd64_syscall+0x120 fast_syscall_common+0xf8
PID TID COMM TDNAME KSTACK
1956 101446 Xorg MainThread lkpi_vmf_insert_pfn_prot_locked+0x268 ttm_bo_vm_fault_reserved+0x2b6 amdgpu_gem_fault+0x86 linux_cdev_pager_populate+0x128 vm_fault_allocate+0x39f vm_fault+0x3c6 vm_fault_trap+0x4c trap_pfault+0x1bd trap+0x405 calltrap+0x8
PID TID COMM TDNAME KSTACK
1956 101446 Xorg MainThread vm_fault_allocate+0x39f vm_fault+0x3c6 vm_fault_trap+0x4c trap_pfault+0x1bd trap+0x405 calltrap+0x8
PID TID COMM TDNAME KSTACK
1956 101446 Xorg MainThread ttm_pool_free+0x110 ttm_tt_destroy_common+0x25 amdgpu_ttm_backend_destroy+0x1d ttm_bo_put+0x331 amdgpu_bo_unref+0x1a amdgpu_gem_object_free+0x1b drm_gem_handle_delete+0xc2 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113 amd64_syscall+0x120 fast_syscall_common+0xf8
Stacks with ttm_pool_free() as the innermost frame become more frequent as small glitches and freezes appear (those additional samples are not reproduced here). Other stacks, captured much more rarely:
PID TID COMM TDNAME KSTACK
1956 101446 Xorg MainThread cdev_pager_lookup+0x38 lkpi_unmap_mapping_range+0x16 ttm_bo_handle_move_mem+0x7a ttm_bo_validate+0xb4 ttm_bo_init_reserved+0x1a2 amdgpu_bo_create+0x1e1 amdgpu_bo_create_user+0x21 amdgpu_gem_create_ioctl+0x1f8 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113 amd64_syscall+0x120 fast_syscall_common+0xf8
PID TID COMM TDNAME KSTACK
1956 101446 Xorg MainThread pmap_remove_ptes+0xdc pmap_remove1+0x55f vm_map_delete+0x19f kern_munmap+0x8a amd64_syscall+0x120 fast_syscall_common+0xf8
PID TID COMM TDNAME KSTACK
1956 101446 Xorg MainThread vm_page_insert+0x1f lkpi_vmf_insert_pfn_prot_locked+0x293 ttm_bo_vm_fault_reserved+0x2b6 amdgpu_gem_fault+0x86 linux_cdev_pager_populate+0x128 vm_fault_allocate+0x39f vm_fault+0x3c6 vm_fault_trap+0x4c trap_pfault+0x1bd trap+0x405 calltrap+0x8
PID TID COMM TDNAME KSTACK
1956 101446 Xorg MainThread cdev_pager_lookup+0x38 lkpi_unmap_mapping_range+0x16 ttm_bo_handle_move_mem+0x7a ttm_bo_validate+0xb4 ttm_bo_init_reserved+0x1a2 amdgpu_bo_create+0x1e1 amdgpu_bo_create_user+0x21 amdgpu_gem_create_ioctl+0x1f8 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113 amd64_syscall+0x120 fast_syscall_common+0xf8
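To put numbers on how often each code path shows up, the captured traces can be tallied by their innermost frame (the first function after the TDNAME column). The following is only a minimal sketch: the function name innermost_frames is hypothetical, the parser assumes the "PID TID COMM TDNAME KSTACK" column layout shown above with no spaces in COMM or TDNAME, and the sample text is abbreviated from the stacks in this report.

```python
from collections import Counter


def innermost_frames(trace_text: str) -> Counter:
    """Count the innermost kernel function of each captured stack."""
    counts = Counter()
    for line in trace_text.splitlines():
        fields = line.split()
        # Skip blank lines and the repeated "PID TID COMM TDNAME KSTACK" header.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        # Fields are PID, TID, COMM, TDNAME, then frames, innermost first.
        # Assumption: COMM and TDNAME contain no spaces ("Xorg MainThread").
        function = fields[4].split("+", 1)[0]  # drop the "+0x..." offset
        counts[function] += 1
    return counts


# Abbreviated sample: only the first two frames of each stack are kept.
sample = """\
PID TID COMM TDNAME KSTACK
1956 101446 Xorg MainThread ttm_pool_free+0x110 ttm_tt_destroy_common+0x25
PID TID COMM TDNAME KSTACK
1956 101446 Xorg MainThread pmap_page_set_memattr+0x5b lkpi_vmf_insert_pfn_prot_locked+0x268
PID TID COMM TDNAME KSTACK
1956 101446 Xorg MainThread ttm_pool_free+0x110 ttm_tt_destroy_common+0x25
"""

for function, n in innermost_frames(sample).most_common():
    print(f"{n:3d}  {function}")
```

Running the same tally over samples taken early and late in a session would show whether the share of ttm_pool_free() stacks actually grows with uptime.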
Finally, dumping kernel stack traces every 0.1 s during freezes suggests that the process is stuck in (or repeatedly calling):
PID TID COMM TDNAME KSTACK
1956 101446 Xorg MainThread vm_page_find_contig_domain+0x8f vm_page_alloc_noobj_contig_domain+0x73 vm_page_reclaim_contig_domain_ext+0x8f0 vm_page_reclaim_contig+0x5c linux_alloc_pages+0x8d ttm_pool_alloc+0x2bb ttm_tt_populate+0xc3 ttm_bo_handle_move_mem+0xc3 ttm_bo_validate+0xb4 ttm_bo_init_reserved+0x1a2 amdgpu_bo_create+0x1e1 amdgpu_bo_create_user+0x21 amdgpu_gem_create_ioctl+0x1f8 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113
This stack also appears, although less frequently (it is in fact a sub-stack of the previous one):
PID TID COMM TDNAME KSTACK
1956 101446 Xorg MainThread vm_page_reclaim_contig+0x5c linux_alloc_pages+0x8d ttm_pool_alloc+0x2bb ttm_tt_populate+0xc3 ttm_bo_handle_move_mem+0xc3 ttm_bo_validate+0xb4 ttm_bo_init_reserved+0x1a2 amdgpu_bo_create+0x1e1 amdgpu_bo_create_user+0x21 amdgpu_gem_create_ioctl+0x1f8 drm_ioctl_kernel+0xc6 drm_ioctl+0x2ae linux_file_ioctl+0x269 kern_ioctl+0x25b sys_ioctl+0x113 amd64_syscall+0x120 fast_syscall_common+0xf8
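For anyone trying to reproduce the 0.1 s sampling, a loop along the following lines could be used. This is a sketch under assumptions: the report does not say which tool produced the traces, so it assumes FreeBSD's procstat -kk (whose output has the same column layout), and the helper names capture_kstack and sample_kstacks are hypothetical. The capture and sleep functions are parameters so the loop can be exercised without procstat.

```python
import subprocess
import time


def capture_kstack(pid: int) -> str:
    """Return the `procstat -kk` output for one process (FreeBSD only)."""
    result = subprocess.run(
        ["procstat", "-kk", str(pid)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def sample_kstacks(pid, count, interval=0.1, capture=capture_kstack, sleep=time.sleep):
    """Collect `count` snapshots, pausing `interval` seconds between them."""
    snapshots = []
    for i in range(count):
        snapshots.append(capture(pid))
        # No pause is needed after the last snapshot.
        if i + 1 < count:
            sleep(interval)
    return snapshots
```

For example, sample_kstacks(1956, count=50) would gather roughly five seconds of stacks for the Xorg process shown above (PID 1956 in this report). Reading another user's kernel stacks may require root privileges.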
(Part of the time spent on this report was sponsored by the FreeBSD Foundation.)