drm-kmod

`panic: mi_switch: switch in a critical section`, intel_legacy_cursor_update and intel_vblank_evade

Open emaste opened this issue 6 months ago • 9 comments

Describe the bug

Reproducible panic appeared somewhere between 29850c65e3d4229b2ae8c90c441e559e72493da9 (good) and bc0fa8c3a26ef4f62b898c251e6969f0bb0660a3 (bad)

FreeBSD version

PCI Info

vgapci0@pci0:0:2:0:     class=0x030000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x9a49 subvendor=0xf111 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'TigerLake-LP GT2 [Iris Xe Graphics]'
    class      = display
    subclass   = VGA

DRM KMOD version

Built from git at bc0fa8c3a26ef4f62b898c251e6969f0bb0660a3

To Reproduce

Start X, wait a short while

Screenshots

[39.545793] drmn0: [drm] Selective fetch area calculation failed in pipe A
[46.125901] panic: mi_switch: switch in a critical section
[46.173916] cpuid = 6
[46.188556] time = 1757199723
[46.203111] KDB: stack backtrace:
[46.218015] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe014d943620
[46.234423] vpanic() at vpanic+0x136/frame 0xfffffe014d943750
[46.250729] panic() at panic+0x43/frame 0xfffffe014d9437b0
[46.267159] mi_switch() at mi_switch+0x1bd/frame 0xfffffe014d9437d0
[46.283712] sleepq_switch() at sleepq_switch+0x109/frame 0xfffffe014d943810
[46.300316] sleepq_timedwait() at sleepq_timedwait+0x4b/frame 0xfffffe014d943850
[46.317006] linux_add_to_sleepqueue() at linux_add_to_sleepqueue+0xf1/frame 0xfffffe014d9438a0
[46.333962] linux_schedule_timeout() at linux_schedule_timeout+0x6a/frame 0xfffffe014d9438d0
[46.350961] intel_vblank_evade() at intel_vblank_evade+0x147/frame 0xfffffe014d943940
[46.367817] intel_legacy_cursor_update() at intel_legacy_cursor_update+0x2e2/frame 0xfffffe014d9439e0
[46.385019] drm_mode_cursor_common() at drm_mode_cursor_common+0x398/frame 0xfffffe014d943b10
[46.401970] drm_mode_cursor_ioctl() at drm_mode_cursor_ioctl+0x36/frame 0xfffffe014d943b50
[46.419061] drm_ioctl_kernel() at drm_ioctl_kernel+0xa9/frame 0xfffffe014d943b90
[46.436014] drm_ioctl() at drm_ioctl+0x2dc/frame 0xfffffe014d943c80
[46.453345] linux_file_ioctl() at linux_file_ioctl+0x313/frame 0xfffffe014d943ce0
[46.471075] kern_ioctl() at kern_ioctl+0x28c/frame 0xfffffe014d943d40
[46.488533] sys_ioctl() at sys_ioctl+0x12f/frame 0xfffffe014d943e00
[46.505894] amd64_syscall() at amd64_syscall+0x174/frame 0xfffffe014d943f30
[46.523457] fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe014d943f30
[46.540885] --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x82d24504a, rsp = 0x85e55b388, rbp = 0x85e55b3b0 ---

emaste avatar Sep 06 '25 23:09 emaste

9962e195d029f049a53e87a7635ea1ca23a43a34 added a call to intel_vblank_evade() within preempt_disable, which calls schedule_timeout() (and thus mi_switch()); testing this:

diff --git a/drivers/gpu/drm/i915/display/intel_vblank.c b/drivers/gpu/drm/i915/display/intel_vblank.c
index 52f5251312..12461b7f0a 100644
--- a/drivers/gpu/drm/i915/display/intel_vblank.c
+++ b/drivers/gpu/drm/i915/display/intel_vblank.c
@@ -684,12 +684,16 @@ int intel_vblank_evade(struct intel_vblank_evade_ctx *evade)
 
 #ifdef __linux__
                local_irq_enable();
+#elif defined(__FreeBSD__)
+               preempt_enable();
 #endif
 
                timeout = schedule_timeout(timeout);
 
 #ifdef __linux__
                local_irq_disable();
+#elif defined(__FreeBSD__)
+               preempt_disable();
 #endif
        }

emaste avatar Sep 08 '25 18:09 emaste

With that change we move on to this panic:

[126.355346] panic: critical_exit: td_critnest == 0
[126.403042] cpuid = 5
[126.417152] time = 1757357369
[126.431235] KDB: stack backtrace:
[126.445547] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00ff1719c0
[126.461469] vpanic() at vpanic+0x136/frame 0xfffffe00ff171af0
[126.477180] panic() at panic+0x43/frame 0xfffffe00ff171b50
[126.492834] intel_vblank_evade() at intel_vblank_evade+0x237/frame 0xfffffe00ff171bd0
[126.508901] intel_pipe_update_start() at intel_pipe_update_start+0x12e/frame 0xfffffe00ff171c30
[126.525405] intel_update_crtc() at intel_update_crtc+0x38/frame 0xfffffe00ff171ca0
[126.542070] skl_commit_modeset_enables() at skl_commit_modeset_enables+0x1f6/frame 0xfffffe00ff171d30
[126.559176] intel_atomic_commit_tail() at intel_atomic_commit_tail+0x88c/frame 0xfffffe00ff171df0
[126.576731] linux_work_fn() at linux_work_fn+0xe8/frame 0xfffffe00ff171e40
[126.593876] taskqueue_run_locked() at taskqueue_run_locked+0x1c7/frame 0xfffffe00ff171ec0
[126.611246] taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe00ff171ef0
[126.628621] fork_exit() at fork_exit+0x87/frame 0xfffffe00ff171f30
[126.645575] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00ff171f30
[126.662491] --- trap 0x9899ffc0, rip = 0x1b7d9984cb60, rsp = 0x1b7d9992d1d0, rbp = 0x1b7d9992e1f0 ---

because of missing FreeBSD cases in intel_pipe_update_start and intel_pipe_update_end. We need to check all instances of local_irq_disable/local_irq_enable/local_irq_save/local_irq_restore.
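A minimal userspace sketch of why the patched intel_vblank_evade() trips `critical_exit: td_critnest == 0` when its caller never entered a critical section. It assumes preempt_disable()/preempt_enable() behave like a nesting counter (inferred from the panic messages above). All `model_*` names and the `critnest` variable are hypothetical stand-ins, not driver or kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-thread critical-section nesting count. */
static int critnest;

static void model_critical_enter(void)
{
	critnest++;
}

/* Returns false where the kernel would panic with "td_critnest == 0". */
static bool model_critical_exit(void)
{
	if (critnest == 0)
		return false;
	critnest--;
	return true;
}

/* Models the patched wait loop: leave the critical section around the sleep. */
static bool model_vblank_evade(void)
{
	if (!model_critical_exit())
		return false;
	/* schedule_timeout() would sleep here, outside the critical section. */
	model_critical_enter();
	return true;
}

/*
 * caller_disables_preemption == true models a caller that disabled
 * preemption before calling in (the cursor update path); false models a
 * caller with no FreeBSD case (the pipe update path before it was fixed).
 */
static bool model_caller(bool caller_disables_preemption)
{
	bool ok;

	critnest = 0;
	if (caller_disables_preemption)
		model_critical_enter();
	ok = model_vblank_evade();
	if (ok && caller_disables_preemption)
		model_critical_exit();
	return ok;
}
```

In this model, `model_caller(true)` succeeds and leaves the counter balanced, while `model_caller(false)` fails at the first exit. That is why every caller of the evade path needs a matching disable/enable pair.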

emaste avatar Sep 08 '25 19:09 emaste

The driver works for me (10 minutes so far) with similar changes in intel_pipe_update_start and intel_pipe_update_end. I'll submit a pull request for those, but other cases still need to be reviewed.

emaste avatar Sep 08 '25 19:09 emaste

I still hit a similar panic with the i915 driver, though rarely.

I finally found a lead: on Linux, spin_lock() calls preempt_disable(), then do_raw_spin_trylock(). If the trylock succeeds, spin_lock() returns; if it fails, it calls preempt_enable() and retries from the start, in an infinite loop.
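The loop described above can be sketched in userspace C11 as follows. This is an illustration of the described behaviour only, not the Linux or LinuxKPI implementation; the `model_*` functions, `preempt_depth`, and `lock_word` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the preemption-disable nesting count. */
static int preempt_depth;
static atomic_flag lock_word = ATOMIC_FLAG_INIT;

static void model_preempt_disable(void)
{
	preempt_depth++;
}

static void model_preempt_enable(void)
{
	preempt_depth--;
}

/* Returns true if the lock was free and is now held by the caller. */
static bool model_trylock(atomic_flag *l)
{
	return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

static void model_spin_lock(atomic_flag *l)
{
	for (;;) {
		model_preempt_disable();
		if (model_trylock(l))
			return;		/* held, preemption stays disabled */
		/* Contended: allow preemption again before retrying. */
		model_preempt_enable();
	}
}

static void model_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
	model_preempt_enable();
}
```

The point of the pattern is that preemption is only disabled while the lock is actually held or during a single acquisition attempt, never across the whole wait.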

dumbbell avatar Oct 11 '25 11:10 dumbbell

I have #388 in my local tree and encountered a similar kind of panic (not yet investigated in more detail)

[9363.841314] panic: mi_switch: switch in a critical section
[9363.891278] cpuid = 4
[9363.906595] time = 1761578617
[9363.921556] KDB: stack backtrace:
[9363.936319] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe014def77d0
[9363.951024] vpanic() at vpanic+0x136/frame 0xfffffe014def7900
[9363.966019] panic() at panic+0x43/frame 0xfffffe014def7960
[9363.980626] mi_switch() at mi_switch+0x1bd/frame 0xfffffe014def7980
[9363.994913] __mtx_lock_sleep() at __mtx_lock_sleep+0x1c1/frame 0xfffffe014def7a10
[9364.009202] __mtx_lock_flags() at __mtx_lock_flags+0xdd/frame 0xfffffe014def7a60
[9364.022999] fwtable_read32() at fwtable_read32+0x55/frame 0xfffffe014def7aa0
[9364.036767] g4x_get_vblank_counter() at g4x_get_vblank_counter+0x82/frame 0xfffffe014def7ad0
[9364.051024] drm_update_vblank_count() at drm_update_vblank_count+0x6b/frame 0xfffffe014def7b50
[9364.065621] drm_crtc_accurate_vblank_count() at drm_crtc_accurate_vblank_count+0x61/frame 0xfffffe014def7b80
[9364.080781] drm_crtc_arm_vblank_event() at drm_crtc_arm_vblank_event+0x44/frame 0xfffffe014def7bb0
[9364.096094] intel_pipe_update_end() at intel_pipe_update_end+0x157/frame 0xfffffe014def7c10
[9364.111624] intel_update_crtc() at intel_update_crtc+0x5fe/frame 0xfffffe014def7c90
[9364.127193] skl_commit_modeset_enables() at skl_commit_modeset_enables+0x1f9/frame 0xfffffe014def7d20
[9364.143440] intel_atomic_commit_tail() at intel_atomic_commit_tail+0x86f/frame 0xfffffe014def7df0
[9364.160019] linux_work_fn() at linux_work_fn+0xe8/frame 0xfffffe014def7e40
[9364.176548] taskqueue_run_locked() at taskqueue_run_locked+0x1c7/frame 0xfffffe014def7ec0
[9364.193433] taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe014def7ef0
[9364.210510] fork_exit() at fork_exit+0x87/frame 0xfffffe014def7f30
[9364.227329] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe014def7f30
[9364.244188] --- trap 0xbb4, rip = 0, rsp = 0, rbp = 0 ---
[9364.260614] Uptime: 2h36m3s
[9364.309096] Dumping 3900 out of 32507 MB:

emaste avatar Oct 29 '25 17:10 emaste

Thank you! I will check this during the upcoming week-end.

dumbbell avatar Oct 29 '25 17:10 dumbbell

  • intel_update_crtc calls intel_pipe_update_start which calls preempt_disable
  • fwtable_read* calls spin_lock_irqsave(&uncore->lock, irqflags);
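The two points above combine into the panic as sketched below. The sketch assumes that the lock taken by spin_lock_irqsave() on FreeBSD is backed by a mutex that may context-switch when contended, which is inferred from the `__mtx_lock_sleep` and `mi_switch` frames in the backtrace. Every name in the block is a hypothetical stand-in.

```c
#include <stdbool.h>

enum model_outcome {
	MODEL_ACQUIRED,		/* lock was free, no context switch needed */
	MODEL_SLEEPS,		/* contended outside a critical section: may block */
	MODEL_PANICS		/* contended inside one: "switch in a critical section" */
};

/* Hypothetical stand-in for the critical-section nesting count. */
static int critnest;

/* Models taking a lock that blocks (context-switches) when already owned. */
static enum model_outcome model_mtx_lock(bool *owned)
{
	if (!*owned) {
		*owned = true;
		return MODEL_ACQUIRED;
	}
	/* The slow path needs to switch threads, which a critical section forbids. */
	return critnest > 0 ? MODEL_PANICS : MODEL_SLEEPS;
}
```

This would also explain why the panic is rare: in the model it needs both the critical section and contention on the lock at the same moment.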

emaste avatar Nov 08 '25 18:11 emaste

@emaste: Could you please try #388 again? It reverts #376 in addition to adding a pair of preempt_*() calls.

I got no panic either yesterday or today with Wayland and X.Org, but I admit I rarely hit the one you got anyway, so I'm not trusting my testing.

Update 1: And as soon as I posted this, I got the same panic as your first one :-) My assumptions are wrong.

Update 2: I force-pushed a modified branch. I will test it further. Could you please give it a try as well?

dumbbell avatar Nov 16 '25 16:11 dumbbell

I am running it now and will let you know.

emaste avatar Nov 17 '25 01:11 emaste

At 3c86e7b03112fa898bb08c1f3637f1dc61c2e1a5, I still encountered:

[137367.046182] panic: mi_switch: switch in a critical section
[137367.086751] cpuid = 6
[137367.092883] time = 1770223261
[137367.099178] KDB: stack backtrace:
[137367.105818] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00ff5f48f0
[137367.113983] vpanic() at vpanic+0x149/frame 0xfffffe00ff5f4a20
[137367.121940] panic() at panic+0x43/frame 0xfffffe00ff5f4a80
[137367.129892] mi_switch() at mi_switch+0x1bd/frame 0xfffffe00ff5f4aa0
[137367.138042] __mtx_lock_sleep() at __mtx_lock_sleep+0x1c1/frame 0xfffffe00ff5f4b30
[137367.146584] __mtx_lock_flags() at __mtx_lock_flags+0xdd/frame 0xfffffe00ff5f4b80
[137367.155204] intel_get_crtc_scanline() at intel_get_crtc_scanline+0x3c/frame 0xfffffe00ff5f4bb0
[137367.164152] intel_pipe_update_end() at intel_pipe_update_end+0x38/frame 0xfffffe00ff5f4c10
[137367.173263] intel_update_crtc() at intel_update_crtc+0x5fe/frame 0xfffffe00ff5f4c90
[137367.182466] skl_commit_modeset_enables() at skl_commit_modeset_enables+0x1e9/frame 0xfffffe00ff5f4d20
[137367.192214] intel_atomic_commit_tail() at intel_atomic_commit_tail+0x86f/frame 0xfffffe00ff5f4df0
[137367.202499] linux_work_fn() at linux_work_fn+0xe8/frame 0xfffffe00ff5f4e40
[137367.212716] taskqueue_run_locked() at taskqueue_run_locked+0x1d3/frame 0xfffffe00ff5f4ec0
[137367.223362] taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe00ff5f4ef0
[137367.234113] fork_exit() at fork_exit+0x87/frame 0xfffffe00ff5f4f30
[137367.244560] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00ff5f4f30

emaste avatar Feb 04 '26 20:02 emaste