
Intel Arc A770: Kernel panic on kldload i915kms.ko

Open kenrap opened this issue 1 year ago • 22 comments

Describe the bug

With a drm-kmod build from efd91670c8e0a498f5af9faeb9f3cb4df5f813be, the i915kms driver causes a kernel panic when used with an Intel Arc A770 graphics card (Acer Predator BiFrost). The panic persists even when using the Intel onboard GPU while the same graphics card is installed.

FreeBSD version

FreeBSD freebsd 15.0-CURRENT FreeBSD 15.0-CURRENT #0 main-n271909-28294dc92476: Fri Aug 30 08:28:17 PDT 2024     root@freebsd:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG amd64 1500023 1500023

DRM KMOD version

My own "custom derived" graphics/drm-66-kmod port with GH_TAGNAME pointing to efd91670c8e0a498f5af9faeb9f3cb4df5f813be.

I also cloned linux-firmware with git-clone(1), copied all of the i915/dg2_* firmware binaries to /boot/modules, and renamed them to match the filename style used there.

To Reproduce

Boot the system with i915kms listed in kld_list in /etc/rc.conf, or kldload it manually.

Additional context

core.txt.0 dump

kenrap avatar Aug 31 '24 03:08 kenrap

My bad, I missed that "Intel DG2 GUC/HUC support" is not implemented yet, as noted in PR https://github.com/freebsd/drm-kmod/pull/283.

~~Closing therefore.~~

I'm reopening this to have it serve as a milestone issue.

I figured why have it be closed anyway since the issue is still valid and my reporting could be useful?

kenrap avatar Aug 31 '24 03:08 kenrap

dg2_dmc_ver2_08.bin: could not load binary firmware /boot/firmware/dg2_dmc_ver2_08.bin either
i915/dg2_dmc_ver2_08.bin: could not load binary firmware /boot/firmware/i915/dg2_dmc_ver2_08.bin either
i915_dg2_dmc_ver2_08.bin: could not load binary firmware /boot/firmware/i915_dg2_dmc_ver2_08.bin either
i915_dg2_dmc_ver2_08_bin: could not load binary firmware /boot/firmware/i915_dg2_dmc_ver2_08_bin either
i915_dg2_dmc_ver2_08_bin: could not load binary firmware /boot/firmware/i915_dg2_dmc_ver2_08_bin either
drmn0: could not load firmware image 'i915/dg2_dmc_ver2_08.bin'
drmn0: [drm] Failed to load DMC firmware i915/dg2_dmc_ver2_08.bin. Disabling runtime power management.
drmn0: [drm] Run pkg install gpu-firmware-kmod to install it

You may start by adding dg2_dmc_ver2_08.bin to the firmware files.

wulf7 avatar Oct 02 '24 08:10 wulf7

But I doubt that it will help

wulf7 avatar Oct 02 '24 08:10 wulf7

I'll try that out and report back.

I also updated my bug description to be more specific.

kenrap avatar Oct 02 '24 08:10 kenrap

> But I doubt that it will help

And you're right, it didn't.

After trying a couple more ideas, I spent a good amount of time re-learning how to create a new core dump of the kernel panic with the firmware(s) loaded. Sorry for the delay.

core.txt.1 dump

kenrap avatar Oct 02 '24 10:10 kenrap

After taking a look at the code around the faulting line, I got the impression that the panic can be caused by the missing vmap_pfn() implementation. It is rather easy to check: just replace the return NULL; line in the i915_gem_object_map_pfn() function, located in the drivers/gpu/drm/i915/gem/i915_gem_pages.c file of drm-kmod, with panic("oops"); or return ERR_PTR(-ENOSUP);
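Why a bare return NULL; can end in a fault rather than a clean error is easier to see with the Linux error-pointer convention in hand. The following is a minimal userland sketch, assuming the callers of i915_gem_object_map_pfn() test the result with IS_ERR() as Linux code commonly does (the thread does not show the callers). The names err_ptr, ptr_err, is_err and caller_would_dereference are hypothetical stand-ins, not the LinuxKPI definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Linux reserves the top 4095 pointer values for encoded errno codes. */
#define MAX_ERRNO	4095
#define SKETCH_ENOMEM	12	/* illustrative errno value */

/* Encode a negative errno as a pointer (stand-in for ERR_PTR()). */
static inline void *
err_ptr(long error)
{
	return ((void *)(intptr_t)error);
}

/* Recover the errno from an error pointer (stand-in for PTR_ERR()). */
static inline long
ptr_err(const void *ptr)
{
	return ((long)(intptr_t)ptr);
}

/* True only for pointers in the reserved error range (stand-in for IS_ERR()). */
static inline bool
is_err(const void *ptr)
{
	return ((uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO);
}

/*
 * Hypothetical caller: it treats anything that is not an error pointer
 * as a usable mapping and would go on to dereference it.
 */
static inline bool
caller_would_dereference(const void *mapping)
{
	return (!is_err(mapping));
}

/* Small self-check of the behaviour described above. */
static void
sketch_self_check(void)
{
	assert(is_err(err_ptr(-SKETCH_ENOMEM)));
	assert(!caller_would_dereference(err_ptr(-SKETCH_ENOMEM)));
	/* NULL is not in the error range, so it slips through the check. */
	assert(caller_would_dereference(NULL));
}
```

Under that assumption, NULL passes an IS_ERR()-style check and gets dereferenced later, while an error pointer is rejected at the call site. The panic("oops"); variant simply makes the stub's location obvious in the core dump.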

wulf7 avatar Oct 02 '24 11:10 wulf7

It seems you're correct about the missing vmap_pfn() implementation. Using panic("oops"); as you suggested triggered an "oops" panic.

I created this custom patch and put it into my /usr/ports/graphics/drm-66-kmod/files directory:

--- drivers/gpu/drm/i915/gem/i915_gem_pages.c.orig
+++ drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -329,7 +329,7 @@ static void *i915_gem_object_map_pfn(struct drm_i915_gem_object *obj,
 {
 #ifdef __FreeBSD__
        // BSDFIXME: Need vmap_pfn() implementation.
-       return NULL;
+       panic("oops");
 #else
        resource_size_t iomap = obj->mm.region->iomap.base -
                obj->mm.region->region.start;

Then I rebuilt and reinstalled the package of my derived port, did my usual testing, and grabbed a new core dump, which shows the "oops" panic. Yay! \o/

core.txt.2

kenrap avatar Oct 02 '24 12:10 kenrap

You may try the following patches (only compile-tested). They are just a quick conversion of the vmap() implementation to vmap_pfn(), replacing struct page with the page frame number.

FreeBSD:

Incomplete patch deleted. See patch three messages below

and drm-kmod:

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 931e7f46733..0ba955611df 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -327,10 +327,6 @@ static void *i915_gem_object_map_page(struct drm_i915_gem_object *obj,
 static void *i915_gem_object_map_pfn(struct drm_i915_gem_object *obj,
 				     enum i915_map_type type)
 {
-#ifdef __FreeBSD__
-	// BSDFIXME: Need vmap_pfn() implementation.
-	return NULL;
-#else
 	resource_size_t iomap = obj->mm.region->iomap.base -
 		obj->mm.region->region.start;
 	unsigned long n_pfn = obj->base.size >> PAGE_SHIFT;
@@ -356,7 +352,6 @@ static void *i915_gem_object_map_pfn(struct drm_i915_gem_object *obj,
 		kvfree(pfns);
 
 	return vaddr ?: ERR_PTR(-ENOMEM);
-#endif
 }
 
 /* get, pin, and map the pages of the object into kernel space */

wulf7 avatar Nov 16 '24 11:11 wulf7

@wulf7 core.txt.0

kenrap avatar Nov 16 '24 15:11 kenrap

I guess I should note that after doing a lot of experimentation, there was one time I was "lucky" enough at random to get i915kms.ko to load with my Intel Arc but I was noticing this warning at the time:

drmn0: [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.

But that only happened once and it wasn't reproducible.

Also, I don't know if I need to get another coredump by using the following in /boot/loader.conf:

hw.i915kms.enable_guc="2"
compat.linuxkpi.i915_disable_power_well="0"

because I changed the first tunable to 1 and commented out the second one. I needed both as above to allow myself to load i915kms on my integrated Intel GPU and get back into X11.

But right now I need some rest, because I went through 20+ kernel-panic reboots while experimenting.

kenrap avatar Nov 16 '24 15:11 kenrap

Next version of FreeBSD patch:

diff --git a/sys/compat/linuxkpi/common/include/linux/vmalloc.h b/sys/compat/linuxkpi/common/include/linux/vmalloc.h
index 00650a2df9b..30f7e0e6297 100644
--- a/sys/compat/linuxkpi/common/include/linux/vmalloc.h
+++ b/sys/compat/linuxkpi/common/include/linux/vmalloc.h
@@ -35,8 +35,11 @@
 #define	VM_MAP		0x0000
 #define	PAGE_KERNEL	0x0000
 
+#define	vmap_pfn(...)	lkpi_vmap_pfn(__VA_ARGS__)
+
 void *vmap(struct page **pages, unsigned int count, unsigned long flags,
     int prot);
+void *lkpi_vmap_pfn(unsigned long *pfns, unsigned int count, int prot);
 void vunmap(void *addr);
 
 #endif	/* _LINUXKPI_LINUX_VMALLOC_H_ */
diff --git a/sys/compat/linuxkpi/common/src/linux_compat.c b/sys/compat/linuxkpi/common/src/linux_compat.c
index 81d24603d1d..bce3af61516 100644
--- a/sys/compat/linuxkpi/common/src/linux_compat.c
+++ b/sys/compat/linuxkpi/common/src/linux_compat.c
@@ -60,6 +60,9 @@
 #include <vm/vm_page.h>
 #include <vm/vm_pager.h>
 
+#include <vm/uma.h>
+#include <vm/uma_int.h>
+
 #include <machine/stdarg.h>
 
 #if defined(__i386__) || defined(__amd64__)
@@ -1804,6 +1807,24 @@ vmmap_remove(void *addr)
 	return (vmmap);
 }
 
+int
+is_vmalloc_addr(const void *addr)
+{
+	struct vmmap *vmmap;
+	uintptr_t p = (uintptr_t)addr;
+
+	mtx_lock(&vmmaplock);
+	LIST_FOREACH(vmmap, &vmmaphead[VM_HASH(addr)], vm_next)
+		if (p >= trunc_page(vmmap->vm_addr) &&
+		    p < round_page((char *)vmmap->vm_addr + vmmap->vm_size))
+			break;
+	mtx_unlock(&vmmaplock);
+	if (vmmap != NULL)
+		return(1);
+
+	return (vtoslab((vm_offset_t)addr & ~UMA_SLAB_MASK) != NULL);
+}
+
 #if defined(__i386__) || defined(__amd64__) || defined(__powerpc__) || defined(__aarch64__) || defined(__riscv)
 void *
 _ioremap_attr(vm_paddr_t phys_addr, unsigned long size, int attr)
@@ -1849,6 +1870,58 @@ vmap(struct page **pages, unsigned int count, unsigned long flags, int prot)
 	return ((void *)off);
 }
 
+#ifdef __amd64__
+static void
+_lkpi_pmap_qenter_pfn(vm_offset_t sva, vm_pindex_t *pi, int count,
+    vm_memattr_t mode)
+{
+	pt_entry_t *endpte, oldpte, pa, *pte;
+	vm_pindex_t p;
+	int cache_bits;
+	pt_entry_t pg_g;
+
+	pg_g = pti ? 0 : X86_PG_G;
+	oldpte = 0;
+	pte = vtopte(sva);
+	endpte = pte + count;
+	cache_bits = pmap_cache_bits(kernel_pmap, mode, false);
+	while (pte < endpte) {
+		p = *pi++;
+		pa = IDX_TO_OFF(p) | cache_bits;
+		if ((*pte & (PG_FRAME | X86_PG_PTE_CACHE)) != pa) {
+			oldpte |= *pte;
+			pte_store(pte, pa | pg_g | pg_nx | X86_PG_A |
+			    X86_PG_M | X86_PG_RW | X86_PG_V);
+		}
+		pte++;
+	}
+	if (__predict_false((oldpte & X86_PG_V) != 0))
+		pmap_invalidate_range(kernel_pmap, sva, sva + count *
+		    PAGE_SIZE);
+}
+#endif
+
+void *
+lkpi_vmap_pfn(unsigned long *pfns, unsigned int count, int prot)
+{
+#ifdef __amd64__
+	vm_offset_t off;
+	size_t size;
+
+	size = count * PAGE_SIZE;
+	off = kva_alloc(size);
+	if (off == 0)
+		return (NULL);
+	vmmap_add((void *)off, size);
+	_lkpi_pmap_qenter_pfn(off, pfns, count, pgprot2cachemode(prot));
+
+	return ((void *)off);
+#else
+	panic("vmap_pfn is not implemented");
+	return (NULL);
+#endif
+}
+
 void
 vunmap(void *addr)
 {
diff --git a/sys/compat/linuxkpi/common/src/linux_page.c b/sys/compat/linuxkpi/common/src/linux_page.c
index 25243382f9e..1ed8b99cdf3 100644
--- a/sys/compat/linuxkpi/common/src/linux_page.c
+++ b/sys/compat/linuxkpi/common/src/linux_page.c
@@ -53,9 +53,6 @@
 #include <vm/vm_reserv.h>
 #include <vm/vm_extern.h>
 
-#include <vm/uma.h>
-#include <vm/uma_int.h>
-
 #include <linux/gfp.h>
 #include <linux/mm.h>
 #include <linux/preempt.h>
@@ -287,12 +284,6 @@ lkpi_get_user_pages(unsigned long start, unsigned long nr_pages,
 	    !!(gup_flags & FOLL_WRITE), pages));
 }
 
-int
-is_vmalloc_addr(const void *addr)
-{
-	return (vtoslab((vm_offset_t)addr & ~UMA_SLAB_MASK) != NULL);
-}
-
 vm_fault_t
 lkpi_vmf_insert_pfn_prot_locked(struct vm_area_struct *vma, unsigned long addr,
     unsigned long pfn, pgprot_t prot)

wulf7 avatar Nov 17 '24 06:11 wulf7

Will get on it, thanks! 😎

kenrap avatar Nov 17 '24 06:11 kenrap

@wulf7 core.txt.1

kenrap avatar Nov 17 '24 07:11 kenrap

It is strange. This panic looks like no patches have been applied. Did you apply the drm-kmod patch?

wulf7 avatar Nov 18 '24 05:11 wulf7

I still kept that same patch in the files directory of my drm-kmod port like last time. But I rebuilt the package in an updated poudriere jail using a non-clean build with your latest patch.

I might want to try this over again with a clean FreeBSD build and redo my steps with more care.

kenrap avatar Nov 18 '24 05:11 kenrap

Actually, you were spot on. I was moving the patch around outside of the files directory and it didn't get applied for the second round. My apologies.

Gonna do a clean build anyway. 😄

kenrap avatar Nov 18 '24 05:11 kenrap

@wulf7 core.txt.2

This time with interesting GPU HANG output.

kenrap avatar Nov 18 '24 06:11 kenrap

Firmware loaded successfully this time.

But unfortunately I have no idea how to debug GPU hangs.

wulf7 avatar Nov 18 '24 06:11 wulf7

Understood. And I appreciate all of your help here regardless. Thanks for working on those patches.

kenrap avatar Nov 18 '24 06:11 kenrap

Okay, I got good news and bad news.

Good news: I got past the GPU hangs and successfully loaded the i915kms driver in tty with just the following in /boot/loader.conf:

hw.i915kms.modeset="1"

Bad news: X11 never starts and it's likely because of the following errors:

Nov 18 06:41:56 freebsd kernel: drmn0: [drm] Finished loading DMC firmware i915/dg2_dmc_ver2_08.bin (v2.8)
Nov 18 06:41:56 freebsd kernel: drmn0: [drm] *ERROR* GT0: Enabling uc failed (-5)
Nov 18 06:41:56 freebsd kernel: drmn0: [drm] *ERROR* GT0: Failed to initialize GPU, declaring it wedged!
Nov 18 06:41:56 freebsd kernel: drmn0: [drm:0xffffffff83f34660s] 0xfffffe0229493808Vsysctl_warn_reuse: can't re-use a leaf (hw.dri.debug)!

It's crucial to be able to use hw.i915kms.enable_guc="2" for the Intel Arc, but sadly with it enabled (even with the "1" and "3" values) it always kernel panics. So I'm in a catch-22 here.
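For readers unfamiliar with the "1", "2" and "3" values: in the upstream Linux i915 driver the enable_guc parameter is a bitmask, where bit 0 requests GuC submission, bit 1 requests HuC firmware loading, and a negative value selects the platform default. The sketch below decodes it on that basis. It assumes the hw.i915kms.enable_guc tunable in drm-kmod carries the same meaning, and the sk_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit meanings as documented for the upstream i915 enable_guc parameter. */
#define SK_ENABLE_GUC_SUBMISSION	(1 << 0)
#define SK_ENABLE_GUC_LOAD_HUC		(1 << 1)

/* What a given enable_guc value asks the driver to do. */
struct sk_guc_request {
	bool platform_default;	/* negative value: let the driver decide */
	bool guc_submission;	/* bit 0 */
	bool huc_load;		/* bit 1 */
};

/* Decode an enable_guc value into the individual requests. */
static struct sk_guc_request
sk_decode_enable_guc(int value)
{
	struct sk_guc_request req = { false, false, false };

	if (value < 0) {
		req.platform_default = true;
		return (req);
	}
	req.guc_submission = (value & SK_ENABLE_GUC_SUBMISSION) != 0;
	req.huc_load = (value & SK_ENABLE_GUC_LOAD_HUC) != 0;
	return (req);
}
```

Read this way, "2" asks only for HuC loading, "1" only for GuC submission, and "3" for both, so all three values exercise the GuC/HuC firmware path.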

kenrap avatar Nov 18 '24 14:11 kenrap

GUC/HUC is not supported by drm-kmod on DG2 yet. It requires porting the MEI and PXP drivers.

wulf7 avatar Nov 18 '24 15:11 wulf7

Ah, yeah, that's right. Thanks for the reminder there.

kenrap avatar Nov 18 '24 15:11 kenrap

Hi!

Any progress here?

I have an A380 on 15.0-CURRENT with drm-66-kmod, and i915kms hangs on load.

Update: The current gpu-firmware-kmod must be updated to include the latest firmware versions for Intel Arc, in this case dg2_dmc_ver2_08.bin; otherwise the firmware loading fails.

The rest of the behavior remains the same, with a hang when loading the i915kms module.

yukiteruamano avatar Jun 27 '25 22:06 yukiteruamano