Intel Arc A770: Kernel panic on kldload i915kms.ko
Describe the bug
Using a drm-kmod build from efd91670c8e0a498f5af9faeb9f3cb4df5f813be, the i915kms driver causes a kernel panic with an (Acer Predator BiFrost) Intel Arc A770 graphics card. The panic persists even when using the Intel onboard GPU while the Arc card is installed.
FreeBSD version
FreeBSD freebsd 15.0-CURRENT FreeBSD 15.0-CURRENT #0 main-n271909-28294dc92476: Fri Aug 30 08:28:17 PDT 2024 root@freebsd:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG amd64 1500023 1500023
DRM KMOD version
My own "custom derived" graphics/drm-66-kmod port with GH_TAGNAME pointing to efd91670c8e0a498f5af9faeb9f3cb4df5f813be.
I also cloned linux-firmware with git-clone(1), copied all of the i915/dg2_* firmware binaries to /boot/modules, and renamed them to match the filename style used there.
To Reproduce
Boot the system with i915kms loaded, either via kld_list in /etc/rc.conf or manually with kldload.
Additional context
core.txt.0 dump
My bad, I hadn't noticed that "Intel DG2 GUC/HUC support" is not implemented yet, per PR https://github.com/freebsd/drm-kmod/pull/283.
~~Closing therefore.~~
I'm reopening this so it can serve as a milestone issue.
There's no reason to keep it closed, since the issue is still valid and my report could be useful.
dg2_dmc_ver2_08.bin: could not load binary firmware /boot/firmware/dg2_dmc_ver2_08.bin either
i915/dg2_dmc_ver2_08.bin: could not load binary firmware /boot/firmware/i915/dg2_dmc_ver2_08.bin either
i915_dg2_dmc_ver2_08.bin: could not load binary firmware /boot/firmware/i915_dg2_dmc_ver2_08.bin either
i915_dg2_dmc_ver2_08_bin: could not load binary firmware /boot/firmware/i915_dg2_dmc_ver2_08_bin either
i915_dg2_dmc_ver2_08_bin: could not load binary firmware /boot/firmware/i915_dg2_dmc_ver2_08_bin either
drmn0: could not load firmware image 'i915/dg2_dmc_ver2_08.bin'
drmn0: [drm] Failed to load DMC firmware i915/dg2_dmc_ver2_08.bin. Disabling runtime power management.
drmn0: [drm] Run pkg install gpu-firmware-kmod to install it
You could start by adding dg2_dmc_ver2_08.bin to the firmware files.
But I doubt that it will help
I'll try that out and report back.
I also updated my bug description to be more specific.
> But I doubt that it will help
And you're right, it didn't.
After trying a couple more ideas, I spent a good amount of time re-learning how to create a new core dump of the kernel panic with the firmware(s) loaded. Sorry for the delay.
After looking at the code around the faulting line, I have the impression that this can happen because vmap_pfn() is not implemented. It is rather easy to check: in drm-kmod's drivers/gpu/drm/i915/gem/i915_gem_pages.c, replace the return NULL; line in the i915_gem_object_map_pfn() function with panic("oops"); or return ERR_PTR(-EOPNOTSUPP);
It seems you're correct about the missing vmap_pfn() implementation. Following your instructions, replacing the line with panic("oops"); triggered the "oops" panic.
Here is the custom patch I created and put in my /usr/ports/graphics/drm-66-kmod/files directory:
--- drivers/gpu/drm/i915/gem/i915_gem_pages.c.orig
+++ drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -329,7 +329,7 @@ static void *i915_gem_object_map_pfn(struct drm_i915_gem_object *obj,
 {
 #ifdef __FreeBSD__
 	// BSDFIXME: Need vmap_pfn() implementation.
-	return NULL;
+	panic("oops");
 #else
 	resource_size_t iomap = obj->mm.region->iomap.base -
 		obj->mm.region->region.start;
Then I rebuilt and reinstalled the package of my derived port, did my usual testing, and grabbed a new core dump, which shows the "oops" panic. Yay! \o/
You may try the following patches (compile-tested only). They are a quick conversion of the vmap() implementation to vmap_pfn(), replacing struct page pointers with page frame numbers.
FreeBSD:
Incomplete patch deleted. See the patch three messages below.
and drm-kmod:
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 931e7f46733..0ba955611df 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -327,10 +327,6 @@ static void *i915_gem_object_map_page(struct drm_i915_gem_object *obj,
 static void *i915_gem_object_map_pfn(struct drm_i915_gem_object *obj,
 				     enum i915_map_type type)
 {
-#ifdef __FreeBSD__
-	// BSDFIXME: Need vmap_pfn() implementation.
-	return NULL;
-#else
 	resource_size_t iomap = obj->mm.region->iomap.base -
 		obj->mm.region->region.start;
 	unsigned long n_pfn = obj->base.size >> PAGE_SHIFT;
@@ -356,7 +352,6 @@ static void *i915_gem_object_map_pfn(struct drm_i915_gem_object *obj,
 		kvfree(pfns);
 
 	return vaddr ?: ERR_PTR(-ENOMEM);
-#endif
 }
 
 /* get, pin, and map the pages of the object into kernel space */
@wulf7 core.txt.0
I should note that after a lot of experimentation, there was one time I was "lucky" enough to get i915kms.ko to load with my Intel Arc, but I noticed this warning at the time:
drmn0: [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
But that only happened once and it wasn't reproducible.
Also, I'm not sure whether I need to get another core dump with the following in /boot/loader.conf:
hw.i915kms.enable_guc="2"
compat.linuxkpi.i915_disable_power_well="0"
because I had changed the first tunable to 1 and commented out the second one. I needed both set as above to load i915kms on my integrated Intel GPU and get back into X11.
But right now I need some rest; I've been through 20+ kernel-panic reboots while experimenting.
Next version of the FreeBSD patch:
diff --git a/sys/compat/linuxkpi/common/include/linux/vmalloc.h b/sys/compat/linuxkpi/common/include/linux/vmalloc.h
index 00650a2df9b..30f7e0e6297 100644
--- a/sys/compat/linuxkpi/common/include/linux/vmalloc.h
+++ b/sys/compat/linuxkpi/common/include/linux/vmalloc.h
@@ -35,8 +35,11 @@
 #define	VM_MAP		0x0000
 #define	PAGE_KERNEL	0x0000
 
+#define	vmap_pfn(...)	lkpi_vmap_pfn(__VA_ARGS__)
+
 void *vmap(struct page **pages, unsigned int count, unsigned long flags,
     int prot);
+void *lkpi_vmap_pfn(unsigned long *pfns, unsigned int count, int prot);
 void vunmap(void *addr);
 
 #endif	/* _LINUXKPI_LINUX_VMALLOC_H_ */
diff --git a/sys/compat/linuxkpi/common/src/linux_compat.c b/sys/compat/linuxkpi/common/src/linux_compat.c
index 81d24603d1d..bce3af61516 100644
--- a/sys/compat/linuxkpi/common/src/linux_compat.c
+++ b/sys/compat/linuxkpi/common/src/linux_compat.c
@@ -60,6 +60,9 @@
 #include <vm/vm_page.h>
 #include <vm/vm_pager.h>
 
+#include <vm/uma.h>
+#include <vm/uma_int.h>
+
 #include <machine/stdarg.h>
 
 #if defined(__i386__) || defined(__amd64__)
@@ -1804,6 +1807,24 @@ vmmap_remove(void *addr)
 	return (vmmap);
 }
 
+int
+is_vmalloc_addr(const void *addr)
+{
+	struct vmmap *vmmap;
+	uintptr_t p = (uintptr_t)addr;
+
+	mtx_lock(&vmmaplock);
+	LIST_FOREACH(vmmap, &vmmaphead[VM_HASH(addr)], vm_next)
+		if (p >= trunc_page(vmmap->vm_addr) &&
+		    p < round_page((char *)vmmap->vm_addr + vmmap->vm_size))
+			break;
+	mtx_unlock(&vmmaplock);
+	if (vmmap != NULL)
+		return(1);
+
+	return (vtoslab((vm_offset_t)addr & ~UMA_SLAB_MASK) != NULL);
+}
+
 #if defined(__i386__) || defined(__amd64__) || defined(__powerpc__) || defined(__aarch64__) || defined(__riscv)
 void *
 _ioremap_attr(vm_paddr_t phys_addr, unsigned long size, int attr)
@@ -1849,6 +1870,58 @@ vmap(struct page **pages, unsigned int count, unsigned long flags, int prot)
 	return ((void *)off);
 }
 
+#ifdef __amd64__
+static void
+_lkpi_pmap_qenter_pfn(vm_offset_t sva, vm_pindex_t *pi, int count,
+    vm_memattr_t mode)
+{
+	pt_entry_t *endpte, oldpte, pa, *pte;
+	vm_pindex_t p;
+	int cache_bits;
+	pt_entry_t pg_g;
+
+	pg_g = pti ? 0 : X86_PG_G;
+	oldpte = 0;
+	pte = vtopte(sva);
+	endpte = pte + count;
+	cache_bits = pmap_cache_bits(kernel_pmap, mode, false);
+	while (pte < endpte) {
+		p = *pi++;
+		pa = IDX_TO_OFF(p) | cache_bits;
+		if ((*pte & (PG_FRAME | X86_PG_PTE_CACHE)) != pa) {
+			oldpte |= *pte;
+			pte_store(pte, pa | pg_g | pg_nx | X86_PG_A |
+			    X86_PG_M | X86_PG_RW | X86_PG_V);
+		}
+		pte++;
+	}
+	if (__predict_false((oldpte & X86_PG_V) != 0))
+		pmap_invalidate_range(kernel_pmap, sva, sva + count *
+		    PAGE_SIZE);
+}
+#endif
+
+void *
+lkpi_vmap_pfn(unsigned long *pfns, unsigned int count, int prot)
+{
+#ifdef __amd64__
+	vm_offset_t off;
+	size_t size;
+
+	size = count * PAGE_SIZE;
+	off = kva_alloc(size);
+	if (off == 0)
+		return (NULL);
+	vmmap_add((void *)off, size);
+	_lkpi_pmap_qenter_pfn(off, pfns, count, pgprot2cachemode(prot));
+
+	return ((void *)off);
+#else
+	panic("vmap_pfn is not implemented");
+	return (NULL);
+#endif
+}
+
 void
 vunmap(void *addr)
 {
diff --git a/sys/compat/linuxkpi/common/src/linux_page.c b/sys/compat/linuxkpi/common/src/linux_page.c
index 25243382f9e..1ed8b99cdf3 100644
--- a/sys/compat/linuxkpi/common/src/linux_page.c
+++ b/sys/compat/linuxkpi/common/src/linux_page.c
@@ -53,9 +53,6 @@
 #include <vm/vm_reserv.h>
 #include <vm/vm_extern.h>
 
-#include <vm/uma.h>
-#include <vm/uma_int.h>
-
 #include <linux/gfp.h>
 #include <linux/mm.h>
 #include <linux/preempt.h>
@@ -287,12 +284,6 @@ lkpi_get_user_pages(unsigned long start, unsigned long nr_pages,
 	    !!(gup_flags & FOLL_WRITE), pages));
 }
 
-int
-is_vmalloc_addr(const void *addr)
-{
-	return (vtoslab((vm_offset_t)addr & ~UMA_SLAB_MASK) != NULL);
-}
-
 vm_fault_t
 lkpi_vmf_insert_pfn_prot_locked(struct vm_area_struct *vma, unsigned long addr,
     unsigned long pfn, pgprot_t prot)
Will get on it, thanks! 😎
@wulf7 core.txt.1
That's strange. This panic looks as if no patches have been applied. Did you apply the drm-kmod patch?
I kept the same patch in the files directory of my drm-kmod port as last time, but I rebuilt the package in an updated poudriere jail, using a non-clean build with your latest patch.
I might try this again with a clean FreeBSD build and redo my steps more carefully.
Actually, you were spot on. I had moved the patch out of the files directory, so it didn't get applied in the second round. My apologies.
Gonna do a clean build anyway. 😄
Firmware loaded successfully this time.
But unfortunately I have no idea how to debug GPU hangs.
Understood. And I appreciate all of your help here regardless. Thanks for working on those patches.
Okay, I got good news and bad news.
Good news:
I got past the GPU hangs and successfully loaded the i915kms driver in tty with just the following in /boot/loader.conf:
hw.i915kms.modeset="1"
Bad news: X11 never starts, likely because of the following errors:
Nov 18 06:41:56 freebsd kernel: drmn0: [drm] Finished loading DMC firmware i915/dg2_dmc_ver2_08.bin (v2.8)
Nov 18 06:41:56 freebsd kernel: drmn0: [drm] *ERROR* GT0: Enabling uc failed (-5)
Nov 18 06:41:56 freebsd kernel: drmn0: [drm] *ERROR* GT0: Failed to initialize GPU, declaring it wedged!
Nov 18 06:41:56 freebsd kernel: drmn0: [drm:0xffffffff83f34660s] 0xfffffe0229493808Vsysctl_warn_reuse: can't re-use a leaf (hw.dri.debug)!
It's crucial to be able to use hw.i915kms.enable_guc="2" on the Intel Arc, but with it enabled (even with the values "1" and "3"), the kernel always panics. So I'm in a catch-22 here.
GUC/HUC is not yet supported by drm-kmod on DG2. It requires porting the MEI and PXP drivers.
Ah, yeah, that's right. Thanks for the reminder there.
Hi!
Any progress here?
I have an A380 on 15.0-CURRENT with drm-66-kmod, and i915kms hangs on load.
Update: the current gpu-firmware-kmod needs to be updated to include the latest Intel Arc firmware (in this case dg2_dmc_ver2_08.bin); otherwise, firmware loading fails.
The rest of the behavior remains the same, with a hang when loading the i915kms module.