drm-kmod

Update to Linux 6.7 drivers

Open dumbbell opened this issue 11 months ago • 31 comments

This is the backport of the DRM drivers from Linux 6.7.

Progress:

Changes in Linux 6.7

You can read this Phoronix article to learn about the changes in the DRM drivers in Linux 6.7: https://www.phoronix.com/news/Linux-6.7-DRM-Graphics-Drivers

Patches to linuxkpi

This update depends on the following patches to linuxkpi in FreeBSD.

These patches are maintained in the following repository and branch: https://github.com/dumbbell/freebsd-src/tree/drm-related-linuxkpi-changes

Patches were submitted for review:

  • [x] ~~https://reviews.freebsd.org/D48740~~
  • [x] ~~https://reviews.freebsd.org/D48741~~
  • [x] ~~https://reviews.freebsd.org/D48742~~
  • [ ] https://reviews.freebsd.org/D48743
  • [x] ~~https://reviews.freebsd.org/D48744~~
  • [x] ~~https://reviews.freebsd.org/D48745~~
  • [x] ~~https://reviews.freebsd.org/D48746~~
  • [x] ~~https://reviews.freebsd.org/D48747~~
  • [x] ~~https://reviews.freebsd.org/D48748~~
  • [x] ~~https://reviews.freebsd.org/D48749~~
  • [x] ~~https://reviews.freebsd.org/D48750~~
  • [x] ~~https://reviews.freebsd.org/D48751~~
  • [x] ~~https://reviews.freebsd.org/D48752~~
  • [x] ~~https://reviews.freebsd.org/D48753~~
  • [x] ~~https://reviews.freebsd.org/D48754~~
  • [ ] https://reviews.freebsd.org/D48755
  • [x] ~~https://reviews.freebsd.org/D48756~~
  • [x] ~~https://reviews.freebsd.org/D48757~~
  • [x] ~~https://reviews.freebsd.org/D48758~~
  • [x] ~~https://reviews.freebsd.org/D48759~~
  • [x] ~~https://reviews.freebsd.org/D48760~~
  • [x] ~~https://reviews.freebsd.org/D48761~~
  • [x] ~~https://reviews.freebsd.org/D48762~~
  • [x] ~~https://reviews.freebsd.org/D48860~~
  • [x] ~~https://reviews.freebsd.org/D48861~~
  • [x] ~~https://reviews.freebsd.org/D48862~~

Firmware updates

There is an associated firmware update:

  • [ ] freebsd/drm-kmod-firmware#36

How to test

You need to run a recent FreeBSD 15-CURRENT to test it.

Here are some instructions:

  1. You need to check out the FreeBSD src branch I mentioned, drm-related-linuxkpi-changes, and compile a kernel from that branch:

    git clone -b drm-related-linuxkpi-changes https://github.com/dumbbell/freebsd-src.git
    cd freebsd-src
    make -j8 buildkernel DEBUG_FLAGS=-g
    
    # This installs the kernel under another name, `kernel.drm`. Thus, you keep the default kernel
    # in case of trouble.
    sudo make installkernel DEBUG_FLAGS=-g INSTKERNNAME=kernel.drm
    
  2. You need to check out the branch referenced in this pull request and compile it:

    git clone -b update-to-linux-6.7 https://github.com/dumbbell/drm-kmod.git
    cd drm-kmod
    make -j8 DEBUG_FLAGS=-g SYSDIR=/path/to/freebsd-src-from-step1/sys
    sudo make install DEBUG_FLAGS=-g SYSDIR=/path/to/freebsd-src-from-step1/sys KMODDIR=/boot/kernel.drm
    
  3. Load the relevant driver(s) as you usually do.

dumbbell avatar Jan 01 '25 16:01 dumbbell

Does this mean I can use Intel external GPU? Thank you, as always, for improving the DRM stack.

lin72h avatar Jan 01 '25 21:01 lin72h

I have no idea :-) Do you have a unit to test with?

I only test with an Intel 12th gen iGPU and a Radeon RX 6700 XT dGPU.

dumbbell avatar Jan 01 '25 21:01 dumbbell

@dumbbell Yes, I've ordered an A310 and a DG1 card. They should arrive in a few days. I'll test them and give you some feedback then.

lin72h avatar Jan 02 '25 06:01 lin72h

@wulf7 Would these updates add GUC/HUC support for DG2 cards?

kenrap avatar Jan 02 '25 09:01 kenrap

> @wulf7 Would these updates add GUC/HUC support for DG2 cards?

Intel MEI and PXP drivers are still not ported

wulf7 avatar Jan 02 '25 16:01 wulf7

> Does this mean I can use Intel external GPU? Thank you, as always, for improving the DRM stack.

Probably a vmmap_pfn() implementation is required. You may try the patch from https://github.com/freebsd/drm-kmod/issues/315. Unfortunately, it didn't help for #315.

wulf7 avatar Jan 02 '25 16:01 wulf7

On a Dell Raptor Lake system the driver loads and I can start X, but the video is corrupted (two screenshots attached: full view and zoomed in).

emaste avatar Jan 09 '25 22:01 emaste

Maybe we (still) need this? We have it in 6.6, but not in this PR.

intel_color.c.txt

lutzbichler avatar Jan 11 '25 14:01 lutzbichler

We don’t have this patch in the master branch (which is at 6.6). Where does this patch come from?

dumbbell avatar Jan 12 '25 19:01 dumbbell

> Maybe we (still) need this? We have it in 6.6, but not in this PR.
>
> intel_color.c.txt

Yes that fixes it

emaste avatar Jan 12 '25 20:01 emaste

> We don’t have this patch in the master branch (which is at 6.6). Where does this patch come from?

It is probably not the real fix to the problem, but the removal of:

/* FIXME DSB has issues loading LUTs, disable it for now */
return;

in intel_color_prepare_commit (737e4a72e41c4c3a7e080d729b8286ef9258dcaf) is where I landed when I bisected my own attempt to update to 6.7. So it seems the upstream fix for Linux alone does not fix it for FreeBSD.

lutzbichler avatar Jan 12 '25 20:01 lutzbichler

The FIXME comes from f327ba214b3 and was removed in 737e4a72e41c4c3a7e080d729b8286ef9258dcaf

emaste avatar Jan 12 '25 22:01 emaste

Panic on Framework Ultra 1 https://termbin.com/snmo

CPU: Intel(R) Core(TM) Ultra 5 125H (2995.20-MHz K8-class CPU)

drmn0: __drm_fb_helper_find_sizes: test CRTC 0 primary plane
drmn0: intelfb_create: no BIOS fb, allocating a new one
panic: Assertion base != 0 failed at /home/user/freebsd/sys/modules/drm-kmod/drivers/gpu/drm/drm_os_freebsd.c:86
cpuid = 6
time = 1736772151
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe010ac92490
vpanic() at vpanic+0x136/frame 0xfffffe010ac925c0
panic() at panic+0x43/frame 0xfffffe010ac92620
register_fictitious_range() at register_fictitious_range+0xc9/frame 0xfffffe010ac92650
intelfb_create() at intelfb_create+0x542/frame 0xfffffe010ac92760
__drm_fb_helper_initial_config_and_unlock() at __drm_fb_helper_initial_config_and_unlock+0x5b2/frame 0xfffffe010ac92800
intel_fbdev_initial_config_async() at intel_fbdev_initial_config_async+0x1a/frame 0xfffffe010ac92820
intel_display_driver_register() at intel_display_driver_register+0x63/frame 0xfffffe010ac92860
i915_driver_register() at i915_driver_register+0x65/frame 0xfffffe010ac92880
i915_driver_probe() at i915_driver_probe+0xb06/frame 0xfffffe010ac928c0
linux_pci_attach_device() at linux_pci_attach_device+0x43f/frame 0xfffffe010ac92910
device_attach() at device_attach+0x42b/frame 0xfffffe010ac92960
bus_generic_driver_added() at bus_generic_driver_added+0xa0/frame 0xfffffe010ac92980
devclass_driver_added() at devclass_driver_added+0x2f/frame 0xfffffe010ac929b0
devclass_add_driver() at devclass_add_driver+0x138/frame 0xfffffe010ac929f0
_linux_pci_register_driver() at _linux_pci_register_driver+0xc1/frame 0xfffffe010ac92a20
i915kms_evh() at i915kms_evh+0x28e/frame 0xfffffe010ac92a50
module_register_init() at module_register_init+0xb6/frame 0xfffffe010ac92a80
linker_load_module() at linker_load_module+0xc79/frame 0xfffffe010ac92d80
kern_kldload() at kern_kldload+0x16e/frame 0xfffffe010ac92dd0
sys_kldload() at sys_kldload+0x59/frame 0xfffffe010ac92e00
amd64_syscall() at amd64_syscall+0x163/frame 0xfffffe010ac92f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe010ac92f30
--- syscall (304, FreeBSD ELF64, kldload), rip = 0x2acd570fbb9a, rsp = 0x2acd5417b6f8, rbp = 0x2acd5417bc70 ---

(For me the 6.6 port did not work on this machine, so no info on whether this is a regression.)

emaste avatar Jan 13 '25 14:01 emaste

Isn't this the same as https://github.com/freebsd/drm-kmod/pull/324?

lutzbichler avatar Jan 13 '25 15:01 lutzbichler

> Isn't this the same as https://github.com/freebsd/drm-kmod/pull/324?

It looks like it, yes - will test shortly.

emaste avatar Jan 13 '25 15:01 emaste

> We don’t have this patch in the master branch (which is at 6.6). Where does this patch come from?
>
> It is probably not the real fix to the problem, but the removal of:
>
> /* FIXME DSB has issues loading LUTs, disable it for now */
> return;

Ok, I see. I was looking for a FreeBSD-specific change and missed the fact that it was an upstream change. Thanks!

dumbbell avatar Jan 13 '25 17:01 dumbbell

> Isn't this the same as https://github.com/freebsd/drm-kmod/pull/324?

Yes, it is the same issue; I've left a comment there. We need to implement some more functionality for Meteor Lake (MTL); I wonder if it would be possible to use the upstream routines more directly (other than drm-kmod being in ports).

emaste avatar Jan 13 '25 19:01 emaste

I resolved all remaining diffs with Linux 6.7, committed as a single commit at the end of the branch. The associated freebsd-src branch was updated at the same time.

I will continue to use the branch as my daily driver with the amdgpu driver.

I will take some time to test the i915 driver too, it’s just that it is impractical: wifi does not work for me (Framework 13, Intel 12th gen) and with the laptop plugged into the dock, I break my neck to see the laptop’s screen :-)

dumbbell avatar Jan 13 '25 23:01 dumbbell

Yes the patch in #324 avoids that panic, at the cost of display corruption while in the console (as mentioned there).

When trying to start X I get another panic:

...
WARNING: Device driver ttydev has set "memattr" inconsistently (drv 1 pmap 6).
WARNING: Device driver ttydev has set "memattr" inconsistently (drv 1 pmap 6).
WARNING: Device driver ttydev has set "memattr" inconsistently (drv 1 pmap 6).
WARNING: Device driver ttydev has set "memattr" inconsistently (drv 1 pmap 6).
WARNING: Device driver ttydev has set "memattr" inconsistently (drv 1 pmap 6).
panic: smr_entered_load: Assertion SMR_ENTERED((smr)) failed at /home/user/freebsd/sys/kern/subr_pctrie.c:146
cpuid = 3
time = 1736787411
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0115fcaa70
vpanic() at vpanic+0x136/frame 0xfffffe0115fcaba0
panic() at panic+0x43/frame 0xfffffe0115fcac00
pctrie_lookup_unlocked() at pctrie_lookup_unlocked+0x12d/frame 0xfffffe0115fcac20
vm_page_lookup_unlocked() at vm_page_lookup_unlocked+0x14/frame 0xfffffe0115fcac30
vm_fault() at vm_fault+0xbdc/frame 0xfffffe0115fcad60
vm_fault_trap() at vm_fault_trap+0x65/frame 0xfffffe0115fcada0
trap_pfault() at trap_pfault+0x27b/frame 0xfffffe0115fcae10
trap() at trap+0x51e/frame 0xfffffe0115fcaf30
calltrap() at calltrap+0x8/frame 0xfffffe0115fcaf30
--- trap 0xc, rip = 0x8280b475a, rsp = 0x82128b838, rbp = 0x82128b8d0 ---
Uptime: 1m29s

emaste avatar Jan 14 '25 18:01 emaste

This PR works for me as well on meteorlake, although I need the intel_color fix and the fix from #324 to do so.

Unfortunately it seems nothing can actually use the GPU because mesa can't query some device info bit from i915. I was hoping commit ada29e8bdc1f would improve things but it doesn't seem to. The error I get is:

% vkcube
MESA: warning: Could not get intel_device_info

I've been seeing this on the current master 6.6 update as well, so it isn't introduced by this PR, but I figured I would mention it. I'm not sure if others see this working. It prevents me from testing anything real: sway, vkcube, etc. all fail. It has something to do with Mesa's intel_get_device_info_from_fd() (eventually plumbing down to DRM_I915_QUERY_GEOMETRY_SUBSLICES, iirc) not properly identifying the engines or something.

amshafer avatar Jan 16 '25 04:01 amshafer

So, before updating drm-kmod further, we should probably create a 6.6-lts branch, and for that we have to check whether someone has the time to finish porting part of the i915 driver that's missing (I don't remember which components are currently missing, but @wulf7 knows).

evadot avatar Jan 22 '25 07:01 evadot

Last night, I pushed two changes:

  1. the GuC in the i915 driver is now enabled if supported; this relies on a patch to linuxkpi, so be sure to update your kernel using my branch as well (see pull request description)
  2. an attempt to fix the registered fictitious range

Please give the latest commits a shot and report back :-)

> finish porting part of the i915 driver that's missing

Oh, I wasn’t aware part of the driver was missing compared to earlier drm-kmod versions.

dumbbell avatar Jan 22 '25 09:01 dumbbell

@evadot I don't see why there's a dependency as you imply -- we can create a 6.6 branch from master now, and any further work that's required can happen on the branch. 6.7 work can continue on master.

FWIW I have been running 6.6 on my daily driver Framework 11th gen Intel laptop for quite some time.

@dumbbell I brought the two changes into my branch and tried them on two machines. It looks fine on a 13th-gen Intel i5-1335U.

On the Framework Core Ultra 5 125H the panic is addressed but the video corruption in vt remains, as with #324 (screenshot attached).

It still panics upon starting X, as mentioned above (screenshot attached).

emaste avatar Jan 22 '25 13:01 emaste

> @evadot I don't see why there's a dependency as you imply -- we can create a 6.6 branch from master now, and any further work that's required can happen on the branch. 6.7 work can continue on master.

Because if we don't introduce IME and PXP in master before branching 6.6-lts, I'm 100% sure that we will not have them in it. So the question is whether we want to move forward now and introduce IME and PXP later, or not.

evadot avatar Jan 22 '25 15:01 evadot

IMO we cannot afford to wait on moving forward on support for contemporary hardware.

Nothing fundamentally prevents us from working on IME and PXP in master and merging work to 6.6-lts later on, does it?

emaste avatar Jan 22 '25 15:01 emaste

Unfortunately, both of these new commits seem to have issues for me. With the FB fix my machine now hangs with a black screen; reverting it and using my change from #324 resolves that. Because it hangs, I can't get a panic and see where it is actually going wrong. I'll keep playing with it and see if I can get any hints. My guess is that it does not fail during register_fictitious_range but fails afterwards because of something about the framebuffer, since the entire screen turns black.

The GuC enablement seems to fail:

Jan 23 00:54:34 token kernel: drmn1: [drm] GPU HANG: ecode 12:0:00000000
Jan 23 00:54:40 token syslogd: last message repeated 1 times
Jan 23 00:54:41 token kernel: drmn1: [drm] Got hung context on rcs'0 with active request 9:2 [0x1003] not yet started
Jan 23 00:54:41 token kernel: drmn1: [drm] GPU HANG: ecode 12:0:00000000

Here's the full log with DRM debugging fully enabled: guc_fail.txt

It continued on a little more after this trying to reset the GPU since it saw bcs was hung as well.

kldload i915kms is stuck in the following stack while the hang shows in dmesg:

mi_switch+0x170 sleepq_switch+0x101 linux_add_to_sleepqueue+0xb2 linux_schedule_timeout+0x7b i915_request_wait_timeout+0x254 i915_request_wait+0x23 gen8_ggtt_bind_ptes+0x551 __gen8_ggtt_insert_entries_bind+0x118 gen8_ggtt_insert_entries_bind+0x54 intel_ggtt_bind_vma+0xb4 i915_vma_bind+0x348 i915_vma_pin_ww+0x4e3 __i915_ggtt_pin+0x57 i915_ggtt_pin+0x51 __context_pin_state+0x3c intel_context_pre_pin+0xb1 __intel_context_do_pin_ww+0x123 intel_context_pin_ww+0x47 

I am using the latest drm-related-linuxkpi-changes branch with the xa_destroy fix.

amshafer avatar Jan 23 '25 06:01 amshafer

Thank you @amshafer for the report! I will try to process that tonight.

Meanwhile, I worked last night on a fix for the following incorrect log messages in i915:

Jan 22 20:19:36 iss kernel: i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe01756967e0V<7>i915 display info: 0xfffffe0175696820V

Now it displays the correct messages:

Jan 23 10:06:57 iss kernel: i915 display info: display version: 13
Jan 23 10:06:57 iss kernel: i915 display info: cursor_needs_physical: no
Jan 23 10:06:57 iss kernel: i915 display info: has_cdclk_crawl: yes
Jan 23 10:06:57 iss kernel: i915 display info: has_cdclk_squash: no
Jan 23 10:06:57 iss kernel: i915 display info: has_ddi: yes
Jan 23 10:06:57 iss kernel: i915 display info: has_dp_mst: yes
Jan 23 10:06:57 iss kernel: i915 display info: has_dsb: yes
Jan 23 10:06:57 iss kernel: i915 display info: has_fpga_dbg: yes
Jan 23 10:06:57 iss kernel: i915 display info: has_gmch: no
Jan 23 10:06:57 iss kernel: i915 display info: has_hotplug: yes
Jan 23 10:06:57 iss kernel: i915 display info: has_hti: no
Jan 23 10:06:57 iss kernel: i915 display info: has_ipc: yes
Jan 23 10:06:57 iss kernel: i915 display info: has_overlay: no
Jan 23 10:06:57 iss kernel: i915 display info: has_psr: yes
Jan 23 10:06:57 iss kernel: i915 display info: has_psr_hw_tracking: no
Jan 23 10:06:57 iss kernel: i915 display info: overlay_needs_physical: no
Jan 23 10:06:57 iss kernel: i915 display info: supports_tv: no
Jan 23 10:06:57 iss kernel: i915 display info: has_hdcp: yes
Jan 23 10:06:57 iss kernel: i915 display info: has_dmc: yes
Jan 23 10:06:57 iss kernel: i915 display info: has_dsc: yes

This is a change in freebsd-src but it’s not committed and pushed yet.

dumbbell avatar Jan 23 '25 10:01 dumbbell

@amshafer: Could you please try:

  • to log the values passed to register_fictitious_range() and comment out that call to let the driver initialize? I’m surprised that vt(4) works if these values are zero/garbage as this is the address of the video buffer.
  • to disable GuC with hw.i915kms.enable_guc=0 in /boot/loader.conf, to eliminate another variable; let’s focus on the console

Do you also have the intel_color.c patch applied? You can keep it for now.

dumbbell avatar Jan 23 '25 18:01 dumbbell

> This is a change in freebsd-src but it’s not committed and pushed yet.

I just committed that change to the linuxkpi-updates-for-drm branch. It implements the %pV format string conversion spec.

It is a breaking change. I grepped freebsd-src and the only format strings that happen to contain this %pV sequence are in drivers that come from Linux and pass a struct va_format argument (though this code may not be connected to the build).

This implementation follows the behavior of Linux and linuxkpi-based drivers don’t need any modifications. Another approach could be to swap the letters, e.g. %Vp to have V as a modifier of %p.
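To make the %pV behavior concrete, here is a minimal userspace sketch, not the actual linuxkpi change. It assumes struct va_format has the same two fields as on Linux (a format string and a pointer to a va_list); expand_va_format() and log_with_prefix() are hypothetical names used only for illustration. The idea is that the conversion takes the nested format and arguments and expands them in place, instead of printing the pointer value.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed to match Linux's struct va_format: a format plus its arguments. */
struct va_format {
	const char *fmt;
	va_list *va;
};

/*
 * Hypothetical helper: expand a va_format into buf, which is what a
 * "%pV" conversion is expected to do inside the kernel printf code.
 * The va_list is copied so the caller's list is left untouched.
 */
static int
expand_va_format(char *buf, size_t size, const struct va_format *vaf)
{
	va_list ap;
	int n;

	va_copy(ap, *vaf->va);
	n = vsnprintf(buf, size, vaf->fmt, ap);
	va_end(ap);
	return (n);
}

/*
 * Hypothetical logging wrapper. A Linux driver would pass &vaf to a
 * single "%s: %pV" format; here the nested expansion is done explicitly.
 */
static int
log_with_prefix(char *out, size_t size, const char *prefix,
    const char *fmt, ...)
{
	struct va_format vaf;
	va_list args;
	char msg[128];

	va_start(args, fmt);
	vaf.fmt = fmt;
	vaf.va = &args;
	expand_va_format(msg, sizeof(msg), &vaf);
	va_end(args);

	return (snprintf(out, size, "%s: %s", prefix, msg));
}
```

Calling log_with_prefix() with the prefix "i915 display info" and the format "has_dsb: %s" yields a line shaped like the corrected log messages shown earlier in the thread.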

dumbbell avatar Jan 23 '25 23:01 dumbbell

I do see a panic message of:

Unread portion of the kernel message buffer:
PHY A: 0x00202 AUX -> (ret=  6) fffffe01f5d6df16h
drmn1: intel_dp_link_training_channel_equalization: [CONNECTOR:236:eDP-1][ENCODER:235:DDI A/PHY A][DPRX] Channel EQ done. DP Training successful
drmn1: intel_dp_link_train_phy: [CONNECTOR:236:eDP-1][ENCODER:235:DDI A/PHY A][DPRX] Link Training passed at link rate = 540000, lane count = 4
drmn1: drm_dp_dump_access: AUX A/DDI A/PHY A: 0x00102 AUX <- (ret=  1) fffffe01f5d6dfefh
drmn1: intel_enable_transcoder: enabling pipe A
drmn1: drm_crtc_vblank_on: crtc 0, vblank enabled 0, inmodeset 1
drmn1: drm_crtc_vblank_helper_get_vblank_timestamp_internal: crtc 0 : v p(0,-32)@ 58.799060 -> 58.799279 [e 0 us, 0 rep]
drmn1: intel_edp_backlight_on: 
drmn1: intel_backlight_enable: pipe A
drmn1: intel_dp_wait_source_oui: [CONNECTOR:236:eDP-1] Performing OUI wait (0 ms)
drmn1: drm_dp_dump_access: AUX A/DDI A/PHY A: 0x00000 AUX -> (ret=  1) fffffe01f5d6dee3h
drmn1: drm_dp_dump_access: AUX A/DDI A/PHY A: 0x00344 AUX -> (ret=  1) fffffe01f5d6df83h
drmn1: drm_dp_dump_access: AUX A/DDI A/PHY A: 0x00354 AUX <- (ret=  4) fffffe01f5d6df24h
drmn1: drm_dp_dump_access: AUX A/DDI A/PHY A: 0x00344 AUX <- (ret=  1) fffffe01f5d6df53h
drmn1: drm_crtc_vblank_helper_get_vblank_timestamp_internal: crtc 0 : v p(0,550)@ 58.803060 -> 58.799290 [e 0 us, 0 rep]
drmn1: drm_vblank_restore: missed 0 vblanks in 11514 ns, frame duration=16666561 ns, hw_diff=0
drmn1: drm_crtc_vblank_helper_get_vblank_timestamp_internal: crtc 0 : v p(0,555)@ 58.803060 -> 58.799256 [e 0 us, 0 rep]
drmn1: drm_update_vblank_count: updating vblank count on crtc 0: current=4, diff=0, hw=4425 hw_last=4425
drmn1: drm_crtc_vblank_helper_get_vblank_timestamp_internal: crtc 0 : v p(0,570)@ 58.803060 -> 58.799153 [e 0 us, 0 rep]
drmn1: drm_update_vblank_count: updating vblank count on crtc 0: current=4, diff=0, hw=4425 hw_last=4425
drmn1: drm_crtc_vblank_helper_get_vblank_timestamp_internal: crtc 0 : v p(0,-30)@ 58.815865 -> 58.816071 [e 0 us, 0 rep]
drmn1: drm_update_vblank_count: updating vblank count on crtc 0: current=4, diff=1, hw=4426 hw_last=4425
drmn1: gen9_dbuf_slices_update: Updating dbuf slices to 0xf
drmn1: intel_connector_verify_state: [CONNECTOR:236:eDP-1]
drmn1: verify_crtc_state: [CRTC:80:pipe A]
drmn1: intel_edp_fixup_vbt_bpp: pipe has 30 bpp for eDP panel, overriding BIOS-provided max 24 bpp
drmn1: intel_fbdev_init_bios: drmn1: [PLANE:31:plane 1A] no fb, skipping
drm_atomic_state_default_clear: drmn1: intel_fbdev_init_bios: [CRTC:131:pipe B] not active, skipping
Clearing atomic state 0xfffff80004c67000
drmn1: intel_fbdev_init_bios: [CRTC:182:pipe C] not active, skipping
drmn1: intel_fbdev_init_bios: [CRTC:233:pipe D] not active, skipping
drmn1: intel_fbdev_init_bios: no active fbs found, not using BIOS config
drmn1: i915_gem_open: 
drmn1: __drm_atomic_state_free: Freeing atomic state 0xfffff80004c67000
sysctl_warn_reuse: can't re-use a leaf (hw.dri.debug)!
drmn1: intel_backlight_device_register: [CONNECTOR:236:eDP-1] backlight device intel_backlight registered
drmn1: intel_dp_connector_register: registering AUX A/DDI A/PHY A bus for card0-eDP-1
lkpi_iic9: <LinuxKPI I2C> on drm1
iicbus11: <Philips I2C bus> on lkpi_iic9
iic11: <I2C generic I/O> on iicbus11
drmn1: drm_sysfs_connector_hotplug_event: [CONNECTOR:236:eDP-1] generating connector hotplug event
drmn1: intel_dp_connector_register: registering AUX USBC1/DDI TC1/PHY TC1 bus for card0-DP-1
lkpi_iic10: <LinuxKPI I2C> on drm2
iicbus12: <Philips I2C bus> on lkpi_iic10
iic12: <I2C generic I/O> on iicbus12
drmn1: drm_sysfs_connector_hotplug_event: [CONNECTOR:245:DP-1] generating connector hotplug event
drmn1: intel_dp_connector_register: registering AUX USBC2/DDI TC2/PHY TC2 bus for card0-DP-2
lkpi_iic11: <LinuxKPI I2C> on drm3
iicbus13: <Philips I2C bus> on lkpi_iic11
iic13: <I2C generic I/O> on iicbus13
drmn1: drm_sysfs_connector_hotplug_event: [CONNECTOR:258:DP-2] generating connector hotplug event
drmn1: drm_sysfs_connector_hotplug_event: [CONNECTOR:267:HDMI-A-1] generating connector hotplug event
drmn1: intel_dp_connector_register: registering AUX USBC4/DDI TC4/PHY TC4 bus for card0-DP-3
lkpi_iic12: <LinuxKPI I2C> on drm5
iicbus14: <Philips I2C bus> on lkpi_iic12
iic14: <I2C generic I/O> on iicbus14
drmn1: drm_sysfs_connector_hotplug_event: [CONNECTOR:273:DP-3] generating connector hotplug event
<6>[drm] Initialized i915 1.6.0 20230929 for drmn1 on minor 0
drmn1: intel_didl_outputs: 5 outputs detected
[drm] 
[drm] [CONNECTOR:236:eDP-1]
drmn1: intel_dp_detect: [CONNECTOR:236:eDP-1]
drmn1: drm_dp_dump_access: AUX A/DDI A/PHY A: 0x00000 AUX -> (ret=  1) fffffe01f5d6e313h
drmn1: drm_dp_dump_access: AUX A/DDI A/PHY A: 0x00060 AUX -> (ret= 16) fffff808d19fb738h
drmn1: intel_dp_read_dsc_dpcd: DSC DPCD: fffff808d19fb738h
drmn1: drm_dp_dump_access: AUX A/DDI A/PHY A: 0x00000 AUX -> (ret=  1) fffffe01f5d6e2b3h
drmn1: drm_dp_dump_access: AUX A/DDI A/PHY A: 0x00021 AUX -> (ret=  1) fffffe01f5d6e347h
drmn1: intel_dp_configure_mst: [ENCODER:235:DDI A/PHY A] MST support: port: no, sink: no, modparam: yes
drmn1: intel_dp_print_rates: source rates: 162000, 216000, 243000, 270000, 324000, 432000, 540000, 675000, 810000
drmn1: intel_dp_print_rates: sink rates: 162000, 270000, 540000
drmn1: intel_dp_print_rates: common rates: 162000, 270000, 540000
drmn1: update_display_info: [CONNECTOR:236:eDP-1] Assigning EDID-1.4 digital sink color depth as 10 bpc.
drmn1: drm_edid_to_eld: [CONNECTOR:236:eDP-1] ELD monitor 
drmn1: drm_edid_to_eld: [CONNECTOR:236:eDP-1] ELD size 20, SAD count 0
drmn1: intel_dp_set_edid: [CONNECTOR:236:eDP-1] VRR capable: no
drmn1: intel_dp_update_dfp: [CONNECTOR:236:eDP-1] DFP max bpc 0, max dotclock 0, TMDS clock 0-0, PCON Max FRL BW 0Gbps
drmn1: drm_dp_dump_access: AUX A/DDI A/PHY A: 0x00000 AUX -> (ret=  1) fffffe01f5d6e2a3h
drmn1: drm_dp_dump_access: AUX A/DDI A/PHY A: 0x00092 AUX -> (ret= 13) fffff808d19fa1c5h
drmn1: intel_dp_get_pcon_dsc_cap: PCON ENCODER DSC DPCD: fffff808d19fa1c5h
drmn1: intel_dp_update_420: [CONNECTOR:236:eDP-1] RGB->YcbCr conversion? no, YCbCr 4:2:0 allowed? yes, YCbCr 4:4:4->4:2:0 conversion? no
drmn1: drm_dp_dump_access: AUX A/DDI A/PHY A: 0x00000 AUX -> (ret=  1) fffffe01f5d6e333h
drmn1: drm_dp_dump_access: AUX A/DDI A/PHY A: 0x00201 AUX -> (ret=  1) fffffe01f5d6e3cfh
[drm] [CONNECTOR:236:eDP-1] status updated from unknown to connected
[drm] [CONNECTOR:236:eDP-1] probed modes :
[drm] Modeline "3840x2400": 60 572010 3840 3872 3880 3920 2400 2408 2416 2432 0x48 0x9
[drm] [CONNECTOR:245:DP-1]
drmn1: intel_dp_detect: [CONNECTOR:245:DP-1]
[drm] [CONNECTOR:245:DP-1] status updated from unknown to disconnected
[drm] [CONNECTOR:245:DP-1] disconnected
[drm] [CONNECTOR:258:DP-2]
drmn1: intel_dp_detect: [CONNECTOR:258:DP-2]
[drm] [CONNECTOR:258:DP-2] status updated from unknown to disconnected
[drm] [CONNECTOR:258:DP-2] disconnected
[drm] [CONNECTOR:267:HDMI-A-1]
drmn1: intel_hdmi_detect: [CONNECTOR:267:HDMI-A-1]
[drm] [CONNECTOR:267:HDMI-A-1] status updated from unknown to disconnected
[drm] [CONNECTOR:267:HDMI-A-1] disconnected
[drm] [CONNECTOR:273:DP-3]
drmn1: intel_dp_detect: [CONNECTOR:273:DP-3]
[drm] [CONNECTOR:273:DP-3] status updated from unknown to disconnected
[drm] [CONNECTOR:273:DP-3] disconnected
[drm] connector 236 enabled? yes
[drm] connector 245 enabled? no
[drm] connector 258 enabled? no
[drm] connector 267 enabled? no
[drm] connector 273 enabled? no
[drm] Not using firmware configuration
[drm] looking for cmdline mode on connector 236
[drm] looking for preferred mode on connector 236 0
[drm] found mode 3840x2400
[drm] picking CRTCs for 16384x16384 config
[drm] desired mode 3840x2400 set on crtc 80 (0,0)
drmn1: __drm_fb_helper_find_sizes: test CRTC 0 primary plane
drmn1: intelfb_create: no BIOS fb, allocating a new one
drmn1: drm_crtc_vblank_helper_get_vblank_timestamp_internal: crtc 0 : v p(0,-30)@ 58.831997 -> 58.832202 [e 0 us, 0 rep]
drmn1: drm_update_vblank_count: updating vblank count on crtc 0: current=5, diff=1, hw=4427 hw_last=4426
drmn1: drm_crtc_vblank_helper_get_vblank_timestamp_internal: crtc 0 : v p(0,-21)@ 58.831997 -> 58.832141 [e 0 us, 0 rep]
drmn1: drm_update_vblank_count: updating vblank count on crtc 0: current=6, diff=0, hw=4427 hw_last=4427
drmn1: [drm] *ERROR* smem_start = 0x40000, smem_len = 0x2328000
drmn1: intelfb_create: allocated 3840x2400 fb: 0x00040000
VT: Replacing driver "efifb" with new "drmfb".
panic: _free(0): address 0xfffff80001a4a000(0xfffff80001a4a000) has not been allocated

This line from above comes from my own logging as requested: drmn1: [drm] *ERROR* smem_start = 0x40000, smem_len = 0x2328000. I changed:

+    drm_err(&dev_priv->drm, "smem_start = 0x%x, smem_len = 0x%x\n", info->fix.smem_start, info->fix.smem_len);
+    if (info->fix.smem_len) {
        register_fictitious_range(info->fix.smem_start, info->fix.smem_len);
+    }
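For reference, the logging-and-guard idea above can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct fix_info, format_fb_range(), and should_register() are hypothetical names, and the field types (an unsigned long start address and a 32-bit length) are assumed to follow Linux's struct fb_fix_screeninfo. The sketch uses format specifiers that match those types and also checks the base address, since the assertion reported earlier in this thread was base != 0.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the two framebuffer fields being logged. */
struct fix_info {
	unsigned long smem_start;	/* assumed: physical base address */
	uint32_t smem_len;		/* assumed: length in bytes */
};

/* Format both values with specifiers that match the assumed types. */
static int
format_fb_range(char *buf, size_t size, const struct fix_info *fix)
{
	return (snprintf(buf, size,
	    "smem_start = 0x%lx, smem_len = 0x%" PRIx32,
	    fix->smem_start, fix->smem_len));
}

/* Only a non-zero base with a non-zero length is worth registering. */
static int
should_register(const struct fix_info *fix)
{
	return (fix->smem_start != 0 && fix->smem_len != 0);
}
```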

Backtrace is:

#0  __curthread () at /usr/freebsd-src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=1) at /usr/freebsd-src/sys/kern/kern_shutdown.c:404
#2  0xffffffff80b55eb0 in kern_reboot (howto=260) at /usr/freebsd-src/sys/kern/kern_shutdown.c:524
#3  0xffffffff80b563cc in vpanic (fmt=0xffffffff812a30ef "%s(%d): address %p(%p) has not been allocated", ap=ap@entry=0xfffffe01f5d6e4d0)
    at /usr/freebsd-src/sys/kern/kern_shutdown.c:979
#4  0xffffffff80b56213 in panic (fmt=<unavailable>) at /usr/freebsd-src/sys/kern/kern_shutdown.c:892
#5  0xffffffff80b27b1f in _free (addr=0xfffff80001a4a000, mtp=0xffffffff818a9140 <M_VTBUF>, dozero=false) at /usr/freebsd-src/sys/kern/kern_malloc.c:922
#6  free (addr=0xfffff80001a4a000, mtp=0xffffffff818a9140 <M_VTBUF>) at /usr/freebsd-src/sys/kern/kern_malloc.c:967
#7  0xffffffff809971b4 in vtbuf_grow (vb=vb@entry=0xffffffff818a99f8 <vt_conswindow+16>, p=p@entry=0xfffffe01f5d6e554, history_size=2173344232)
    at /usr/freebsd-src/sys/dev/vt/vt_buf.c:660
#8  0xffffffff8099c82a in vt_change_font (vw=vw@entry=0xffffffff818a99e8 <vt_conswindow>, vf=0xffffffff81b7bc58 <vt_font_loader>) at /usr/freebsd-src/sys/dev/vt/vt_core.c:2156
#9  0xffffffff80998a5c in vt_resize (vd=0xffffffff818a9b38 <vt_consdev>) at /usr/freebsd-src/sys/dev/vt/vt_core.c:3242
#10 vt_upgrade (vd=vd@entry=0xffffffff818a9b38 <vt_consdev>) at /usr/freebsd-src/sys/dev/vt/vt_core.c:3216
#11 0xffffffff809993bf in vt_replace_backend (drv=drv@entry=0xffffffff857e28f8 <vt_drmfb_driver>, softc=softc@entry=0xfffff808d1d92d48)
    at /usr/freebsd-src/sys/dev/vt/vt_core.c:3316
#12 0xffffffff8099926c in vt_allocate (drv=0xffffffff857e28f8 <vt_drmfb_driver>, softc=0xfffff808d1d92d48) at /usr/freebsd-src/sys/dev/vt/vt_core.c:3391
#13 0xffffffff85791cdc in vt_drmfb_attach (fbio=0xfffff808d1d92d48) at /usr/home/ashafer/git/drm-kmod/drivers/gpu/drm/vt_drmfb.c:348
#14 0xffffffff8579081b in __register_framebuffer (fb_info=0xfffff808d1d92c00) at /usr/home/ashafer/git/drm-kmod/drivers/gpu/drm/linux_fb.c:227
#15 0xffffffff8579062f in linux_register_framebuffer (fb_info=0xfffff808d1d92c00) at /usr/home/ashafer/git/drm-kmod/drivers/gpu/drm/linux_fb.c:253
#16 0xffffffff85795cc3 in __drm_fb_helper_initial_config_and_unlock (fb_helper=0xfffff80004c7aa00) at /usr/home/ashafer/git/drm-kmod/drivers/gpu/drm/drm_fb_helper.c:1908
#17 0xffffffff85795b07 in drm_fb_helper_initial_config (fb_helper=0xfffff80004c7aa00) at /usr/home/ashafer/git/drm-kmod/drivers/gpu/drm/drm_fb_helper.c:1977
#18 0xffffffff85aff0e4 in intel_fbdev_initial_config_async (dev_priv=0xfffffe01f8b1f000) at /usr/home/ashafer/git/drm-kmod/drivers/gpu/drm/i915/display/intel_fbdev.c:630
#19 0xffffffff8593c692 in intel_display_driver_register (i915=0xfffffe01f8b1f000) at /usr/home/ashafer/git/drm-kmod/drivers/gpu/drm/i915/display/intel_display_driver.c:408
#20 0xffffffff8580844a in i915_driver_register (dev_priv=0xfffffe01f8b1f000) at /usr/home/ashafer/git/drm-kmod/drivers/gpu/drm/i915/i915_driver.c:635
#21 0xffffffff8580796f in i915_driver_probe (pdev=0xfffff80001a2aa00, ent=0xffffffff85b51920 <pciidlist+11264>)
    at /usr/home/ashafer/git/drm-kmod/drivers/gpu/drm/i915/i915_driver.c:817
#22 0xffffffff8581b029 in i915_pci_probe (pdev=0xfffff80001a2aa00, ent=0xffffffff85b51920 <pciidlist+11264>)
    at /usr/home/ashafer/git/drm-kmod/drivers/gpu/drm/i915/i915_pci.c:1060
#23 0xffffffff80dfdeb3 in linux_pci_attach_device (dev=<unavailable>, dev@entry=<error reading variable: value is not available>, pdrv=0xffffffff85b7b590 <i915_pci_driver>, 
    pdrv@entry=<error reading variable: value is not available>, id=0xffffffff85b51920 <pciidlist+11264>, id@entry=<error reading variable: value is not available>, 
    pdev=0xfffff80001a2aa00, pdev@entry=<error reading variable: value is not available>) at /usr/freebsd-src/sys/compat/linuxkpi/common/src/linux_pci.c:553
#24 0xffffffff80b93c2b in DEVICE_ATTACH (dev=0xfffff800041ba600) at ./device_if.h:195
#25 device_attach (dev=dev@entry=0xfffff800041ba600) at /usr/freebsd-src/sys/kern/subr_bus.c:2615
#26 0xffffffff80b95a60 in device_probe_and_attach (dev=0xfffff800041ba600, dev@entry=<error reading variable: value is not available>)

This was with hw.i915kms.enable_guc=0 set. I think this panic is what was causing my earlier reported issues. I had to kill the machine manually, but it still dumped, so I think the backtrace is accurate, though I'm not 100% sure.

Looks like a free of memory that was never allocated. I haven't tracked down where that address comes from, but given we are doing a VT resize I wonder if we never allocated the original framebuffer, go to free it, and then hit this. It seems like register_fictitious_range doesn't error out though.

amshafer avatar Jan 26 '25 22:01 amshafer