ddcutil icon indicating copy to clipboard operation
ddcutil copied to clipboard

Convert assert to if

Open tsdgeos opened this issue 4 months ago • 25 comments

ddc_open_display has this code

TRACED_ASSERT(bus_info); // need to convert to a test?

On KDE we are getting asserts with this trace

Thread 1270936 Crashed: 0 libc.so.6 0x7f48d3881e9c __pthread_kill_implementation (source-5f134746-#usr#src#debug#glibc-2.41-11.fc42.x86_64#nptl#pthread_kill.c:45) 1 libc.so.6 0x7f48d3881f63 __pthread_kill_internal (source-5f134746-#usr#src#debug#glibc-2.41-11.fc42.x86_64#nptl#pthread_kill.c:90) 2 libc.so.6 0x7f48d3827f3e __GI_raise (source-5b0e7cae-#usr#src#debug#glibc-2.41-11.fc42.x86_64#signal#..#sysdeps#posix#raise.c:27) 3 libc.so.6 0x7f48d380f6d0 __GI_abort (source-cde0a3ef-#usr#src#debug#glibc-2.41-11.fc42.x86_64#stdlib#abort.c:78) 4 libc.so.6 0x7f48d380f639 __assert_fail_base (source-41a13c79-#usr#src#debug#glibc-2.41-11.fc42.x86_64#assert#assert.c:119) 5 libc.so.6 0x7f48d38200af __assert_fail (source-41a13c79-#usr#src#debug#glibc-2.41-11.fc42.x86_64#assert#assert.c:128) 6 libddcutil.so.5 0x7f48d34b0b6f ddc_open_display (source-63f8c033-#usr#src#debug#ddcutil-2.2.1-1.fc42.x86_64#src#ddc#ddc_packet_io.c:333) 7 libddcutil.so.5 0x7f48d352d9de ddci_open_display3 (source-b50cbc0d-#usr#src#debug#ddcutil-2.2.1-1.fc42.x86_64#src#libmain#api_displays.c:621) 8 libddcutil.so.5 0x7f48d3533f81 ddca_open_display2 (source-b50cbc0d-#usr#src#debug#ddcutil-2.2.1-1.fc42.x86_64#src#libmain#api_displays.c:653) 9 libpowerdevilcore.so.2 0x7f48d528becc BrightnessWorker::ddcSetBrightness (source-77a70433-#usr#src#debug#powerdevil-6.4.5-1.fc42.x86_64#daemon#controllers#ddcutildisplay.cpp:241)

[1270936] Assertion failed: "bus_info" in file ddc_packet_io.c at line 332

So it would seem that yes, it makes sense to turn that assert into a check.

tsdgeos avatar Oct 19 '25 18:10 tsdgeos

What version of ddcutil does this occur with? At first glance it looks like a bug introduced in 2.2.0 that was fixed in 2.2.1. See issue #528.

rockowitz avatar Oct 20 '25 02:10 rockowitz

As per the line numbers it seems to be 2.2.1 (ddcutil-2.2.1-1.fc42.x86_64)

tsdgeos avatar Oct 20 '25 20:10 tsdgeos

It should be in the system log. journalctl -t org_kde_powerdevil, look for line containing "Initializing libddcutil".

rockowitz avatar Oct 21 '25 12:10 rockowitz

Also, can you build from branch 2.2.2-dev and confirm the problem exists with that branch?

rockowitz avatar Oct 21 '25 12:10 rockowitz

Also if possible:

  • output of journalctl -t org_kde_powerdevil, starting with the "Initializing libddcutil" line, that show the error
  • instructions on how to reproduce the error

rockowitz avatar Oct 21 '25 14:10 rockowitz

This is not a crash I have, this is a crash our users are having, sadly i can not provide any of that.

tsdgeos avatar Oct 21 '25 21:10 tsdgeos

Can you ask users to submit the output of journalctl -t org_kde_powerdevil described above? Github users can submit it (as an attachment) to this issue; anyone can send it to my email address rockowitz@minsoftcom. Without knowing what is going on, all I can do is convert the assert() to an "Internal Error" status code returned (ultimately) to PowerDevil.

rockowitz avatar Oct 22 '25 04:10 rockowitz

Unfortunately we can't contact the user. The backtrace i pasted is all we have, the crash was reported automatically via the crash reporter, we don't have a way to contact the user.

all I can do is convert the assert() to an "Internal Error" status code returned (ultimately) to PowerDevil.

Better than crashing i guess?

tsdgeos avatar Oct 22 '25 22:10 tsdgeos

Per commit 7fe3ff5aaee1bc6211a0c2da46c70faba864aa02, the assert()s in question have been converted to if tests that will ultimately cause the API call in question to write some information to the system log and return status DDCRC_INTERNAL_ERROR. Not very satisfactory, but absent more information as to what is going on that's the best I can do for now. How common is this failure? I suspect it occurs somehow as a result of display disconnection/connection.

rockowitz avatar Oct 25 '25 17:10 rockowitz

@tsdgeos thank you for reporting this and @rockowitz thank you for looking into it.

I think this might be a crash that I'm experiencing myself too. Here the requested information:

journalctl -t org_kde_powerdevil :

Oct 28 19:37:24 arch org_kde_powerdevil[2241]: [  2241] libddcutil recheck thread 0x55aa07250c50 started
Oct 28 19:37:24 arch org_kde_powerdevil[2241]: [  2241] libddcutil watch thread 0x55aa0718c3b0 started
Oct 28 19:37:24 arch org_kde_powerdevil[2241]: [  2241] Display redetection finished.
Oct 28 19:37:24 arch org_kde_powerdevil[2241]: [  2241] Unquiescing libddcutil API...
Oct 28 19:37:24 arch org_kde_powerdevil[2241]: [  4959] (dw_recheck_displays_func) Recheck interval: Sleeping for 200 milliseconds
Oct 28 23:34:00 arch org_kde_powerdevil[2241]: There are no outputs - creating placeholder screen
Oct 28 23:34:01 arch org_kde_powerdevil[2241]: [  4961] (ddc_open_display) Delay before rechecking attribute status: Sleeping for 1000 milliseconds
Oct 28 23:34:01 arch org_kde_powerdevil[2241]: [  4960] Removing connected display on bus 1
Oct 28 23:34:01 arch org_kde_powerdevil[2241]: [  4960] Emitting DDCA_Display_Status_Event[14980.110:  DDCA_EVENT_DISPLAY_DISCONNECTED, card1-HDMI-A-1, dref: DDCA_Display_Ref[3], io_pa>
Oct 28 23:34:01 arch org_kde_powerdevil[2241]: [  4960] Starting 1 callback threads
Oct 28 23:34:01 arch org_kde_powerdevil[2241]: [  4960] libddcutil callback thread 0x7fd438014190 started
Oct 28 23:34:01 arch org_kde_powerdevil[2241]: [  4960] Started 1 event callback thread(s)
Oct 28 23:34:01 arch org_kde_powerdevil[2241]: [  5640] Invoking callback function 0x7fd4624eca00 for event DDCA_Display_Status_Event[14980.110:  DDCA_EVENT_DISPLAY_DISCONNECTED, card1>
Oct 28 23:34:01 arch org_kde_powerdevil[2241]: [  5640] Callback function 0x7fd4624eca00 for event DDCA_Display_Status_Event[14980.110:  DDCA_EVENT_DISPLAY_DISCONNECTED, card1-HDMI-A-1>
Oct 28 23:34:02 arch org_kde_powerdevil[2241]: [  4960] Adding connected display with bus 1
Oct 28 23:34:02 arch org_kde_powerdevil[2241]: [  4960] Emitting DDCA_Display_Status_Event[14980.794:  DDCA_EVENT_DISPLAY_CONNECTED, card1-HDMI-A-1, dref: DDCA_Display_Ref[4], io_path:>
Oct 28 23:34:02 arch org_kde_powerdevil[2241]: [  4960] Starting 1 callback threads
Oct 28 23:34:02 arch org_kde_powerdevil[2241]: [  4960] libddcutil callback thread 0x7fd4380403e0 started
Oct 28 23:34:02 arch org_kde_powerdevil[2241]: [  4960] Started 1 event callback thread(s)
Oct 28 23:34:02 arch org_kde_powerdevil[2241]: [  5707] Invoking callback function 0x7fd4624eca00 for event DDCA_Display_Status_Event[14980.794:  DDCA_EVENT_DISPLAY_CONNECTED, card1-HD>
Oct 28 23:34:02 arch org_kde_powerdevil[2241]: [  5707] Callback function 0x7fd4624eca00 for event DDCA_Display_Status_Event[14980.794:  DDCA_EVENT_DISPLAY_CONNECTED, card1-HDMI-A-1, d>
Oct 28 23:34:02 arch org_kde_powerdevil[2241]: [  4961] Assertion failed: "bus_info" in file ddc_packet_io.c at line 332
Oct 28 23:34:02 arch org_kde_powerdevil[2241]: org_kde_powerdevil: ddc_packet_io.c:332: ddc_open_display: Assertion `bus_info' failed.
Oct 28 23:34:03 arch org_kde_powerdevil[2241]: KCrash: Attempting to start /usr/lib/org_kde_powerdevil
Oct 28 23:34:03 arch org_kde_powerdevil[2241]: KCrash: Application 'org_kde_powerdevil' crashing... crashRecursionCounter = 2
Oct 28 23:34:03 arch org_kde_powerdevil[5708]: Starting libddcutil. library built Not set at Not set. stdout_stderr_redirected=true
Oct 28 23:34:03 arch org_kde_powerdevil[5759]: Starting libddcutil. library built Not set at Not set. stdout_stderr_redirected=true
Oct 28 23:34:03 arch org_kde_powerdevil[5759]: Initializing libddcutil.  ddcutil version: 2.2.1, shared library: /usr/lib/libddcutil.so.5.3.0
Oct 28 23:34:03 arch org_kde_powerdevil[5759]: [  5759] Options passed from client:
Oct 28 23:34:03 arch org_kde_powerdevil[5759]: [  5759] Applying combined libddcutil options:
Oct 28 23:34:04 arch org_kde_powerdevil[5759]: Library initialization complete.
Oct 28 23:34:04 arch org_kde_powerdevil[5759]: Watching for DPMS state changes unimplemented
Oct 28 23:34:04 arch org_kde_powerdevil[5759]: [  5759] Watching for DPMS state changes unimplemented
Oct 28 23:34:04 arch org_kde_powerdevil[5759]: [  5759] Watching for display connection changes, resolved watch mode = Watch_Mode_Xevent, poll loop interval = 100 millisec
Oct 28 23:34:04 arch org_kde_powerdevil[5759]: [  5759]                                          extra_stabilization_millisec: 0,  stabilization_poll_millisec: 100
Oct 28 23:34:04 arch org_kde_powerdevil[5759]: [  5759] libddcutil recheck thread 0x55b5b1d54f70 started
Oct 28 23:34:04 arch org_kde_powerdevil[5759]: [  5759] libddcutil watch thread 0x55b5b1d54490 started
Oct 28 23:34:04 arch org_kde_powerdevil[5759]: [  5777] (dw_recheck_displays_func) Recheck interval: Sleeping for 200 milliseconds
Oct 28 23:34:05 arch org_kde_powerdevil[5759]: org.kde.powerdevil.chargethresholdhelper.getthreshold failed "Charge thresholds are not supported by the kernel for this hardware"
Oct 28 23:34:05 arch org_kde_powerdevil[5759]: org.kde.powerdevil.backlighthelper.brightness failed
Oct 28 23:34:05 arch org_kde_powerdevil[5759]: Handle button events action could not check for screen configuration
Oct 28 23:34:05 arch org_kde_powerdevil[5759]: org.kde.powerdevil.chargethresholdhelper.getthreshold failed "Charge thresholds are not supported by the kernel for this hardware"

Instructions on how to reproduce the error:

Arch Linux - KDE Plasma 6 - Wayland.

It happens for me when the monitor (HDMI) is turned off due to power management settings. Although, this also causes me to have to login again, so I'm not sure if it's login related or monitor related. Suspend is disabled.

Power management settings:

Image

arch-btw avatar Oct 29 '25 06:10 arch-btw

@arch-btw Thank you! That is immensely helpful.

Does the failure happen when the monitor is turned off or when it is turned on again? I'd like to better understand your configuration. Please run ddcutil interrogate and submit the output as an attachment.

rockowitz avatar Oct 29 '25 09:10 rockowitz

@rockowitz no problem at all, thank you too 👍

I have emailed you the log file for: ddcutil interrogate

I'm actually not sure if it happens during the monitor's shutdown or when I'm turning it back on. The message is there when I turn it back on but I'm not sure how long it's been there for.

Also, I wanted to mention that when I build ddcutil from the current dev branch, then the crash is gone. So it does appear to be fixed.

Thank you!

arch-btw avatar Oct 29 '25 11:10 arch-btw

@rockowitz no problem at all, thank you too 👍

I have emailed you the log file for: ddcutil interrogate

Received. Thank you.

I'm actually not sure if it happens during the monitor's shutdown or when I'm turning it back on. The message is there when I turn it back on but I'm not sure how long it's been there for.

Also, I wanted to mention that when I build ddcutil from the current dev branch, then the crash is gone. So it does appear to be fixed.

That's because the current dev branch has been modified to return a DDCRC_INTERNAL_ERROR result to the caller, i.e. PowerDevil, instead of failing with an assert(). So it looks like PowerDevil is successfully handling the error state. But the real solution is figuring out how the internal error came about. I'll probably ask you to run some tests once I've analyzed the situation more.

Thank you!

Thank you!

rockowitz avatar Oct 29 '25 15:10 rockowitz

I've made extensive changes to branch 2.2.2-dev to try to diagnose this bug. Most are in the low level tracing facilities so that what's output to the system journal will be useful for remote diagnosis. A few relate to the asynchonicity of display change detection and API calls. I have also reverted back to triggering an assert() failure when the bug is encountered.

Please use the following ddcutilrc configuration file:

[global]
options: --ts  --tid
options: --ddc-init-async-min 5
options: --trace-to-syslog
options: --enable-traced-function-stack
options: --trcfunc dw_emit_display_status_record  --trcback dw_emit_display_status_record
options: --trcfunc dw_remove_display_by_businfo   --trcback dw_remove_display_by_businfo
options: --trcfunc i2c_reset_bus_info             --trcback i2c_reset_bus_info
options: --trcfunc dw_execute_callback_func       --trcback dw_execute_callback_func
options: --trcfunc reset_published_dref_hash      --trcback reset_published_dref_hash 
options: --trcfunc add_published_dref_id_by_dref  --trcback add_published_dref_id_by_dref
options: --trcfunc delete_published_dref_id       --trcback delete_published_dref_id
options: --trcfunc free_display_ref               --trcback free_display_ref

[libddcutil]
options: --trace api

If you have not rebooted since installing the updated libddcutil, don't forget to restart powerdevil (sorry if I'm stating the obvious): systemctl --user restart plasma-powerdevil

When you hit the bug, send me the relevant (voluminous) output of journalctl -t org_kde_powerdevil.

Thank you.

rockowitz avatar Nov 01 '25 09:11 rockowitz

I just updated a typo in the ddcutilrc script in my prior comment. "global]" -> "[global]"

rockowitz avatar Nov 01 '25 13:11 rockowitz

Oops, sorry I didn't get a notification for this, I will send the log asap!

arch-btw avatar Nov 06 '25 11:11 arch-btw

@arch-btw Oops, my mistake. I should have explicitly cited you as you're not the one who originally posted the issue.

rockowitz avatar Nov 06 '25 12:11 rockowitz

@arch-btw Have you had a chance test for the failure? This has been the only blocker bug for release 2.2.2. As of commit 3b9b906 libddcutil API calls that would have triggered the assert() once again return DDCRC_INTERNAL_ERROR to the caller. If I can't make any further progress diagnosing this bug I'll leave it that way for the 2.2.2 release. Thanks.

rockowitz avatar Nov 11 '25 13:11 rockowitz

@rockowitz As I'm facing the same crashes on my machine (Ryzen 7900, Radeon RX 9060 XT, single HDMI monitor) I'm taking the liberty to provide my log of the crash. I've used the current master branch (8660f29c) of ddcutil.

powerdevil.log

The display was turned off right after powerdevil started (15:29:20) and turned back on at 15:31:00.

cgudrian avatar Dec 06 '25 21:12 cgudrian

I've disabled the tracing and added logging in dw_remove_display_by_businfo and ddc_open_display. In the error case ddc_open_display is being called twice after wakeup. Once with a fresh display ref and once with the old one that has been previously removed. (The relevant log lines are tagged with ####.)

Surprisingly to me dw_remove_display_by_businfo is actually called after wakeup. This is the log starting with the wakeup event.

Dez 07 16:25:52 orion2.fritz.box org_kde_powerdevil[77928]: [ 78668] Udev event detected
Dez 07 16:25:52 orion2.fritz.box org_kde_powerdevil[77928]: There are no outputs - creating placeholder screen
Dez 07 16:25:52 orion2.fritz.box org_kde_powerdevil[77928]: [ 78669] (ddc_open_display) Delay before rechecking attribute status: Sleeping for 1000 milliseconds
Dez 07 16:25:52 orion2.fritz.box org_kde_powerdevil[77928]: [ 78668] Removing connected display on bus 5
Dez 07 16:25:52 orion2.fritz.box org_kde_powerdevil[77928]: #### [dw_remove_display_by_businfo] dref=0x55b6a95ab1e0 id=3 dref->detail=(nil) connector=card1-HDMI-A-1
Dez 07 16:25:52 orion2.fritz.box org_kde_powerdevil[77928]: [ 78668] Emitting DDCA_Display_Status_Event[ 73.446:  DDCA_EVENT_DISPLAY_DISCONNECTED, card1-HDMI-A-1, dref: DDCA_Display_Ref[3], io_path:/dev/i2c-5, ddc working: false]
Dez 07 16:25:52 orion2.fritz.box org_kde_powerdevil[77928]: [ 78668] Starting 1 callback threads
Dez 07 16:25:52 orion2.fritz.box org_kde_powerdevil[77928]: [ 78668] libddcutil callback thread 0x7fcad401f090 started
Dez 07 16:25:52 orion2.fritz.box org_kde_powerdevil[77928]: [ 78668] Started 1 event callback thread(s)
Dez 07 16:25:52 orion2.fritz.box org_kde_powerdevil[77928]: [ 78870] Invoking callback function 0x7fcb06607690 for event DDCA_Display_Status_Event[ 73.446:  DDCA_EVENT_DISPLAY_DISCONNECTED, card1-HDMI-A-1, dref: DDCA_Display_Ref[3], io>
Dez 07 16:25:52 orion2.fritz.box org_kde_powerdevil[77928]: [ 78870] Callback function 0x7fcb06607690 for event DDCA_Display_Status_Event[ 73.446:  DDCA_EVENT_DISPLAY_DISCONNECTED, card1-HDMI-A-1, dref: DDCA_Display_Ref[3], io_path:/de>
Dez 07 16:25:52 orion2.fritz.box org_kde_powerdevil[77928]: [ 78668] Udev event detected
Dez 07 16:25:53 orion2.fritz.box org_kde_powerdevil[77928]: [ 78668] Adding connected display with bus 5
Dez 07 16:25:53 orion2.fritz.box org_kde_powerdevil[77928]: #### [ddc_open_display] dref=0x7fcad402c720 id=4 dref->detail=0x55b6a96a0710 connector=card1-HDMI-A-1
Dez 07 16:25:53 orion2.fritz.box org_kde_powerdevil[77928]: #### [ddc_open_display] dref=0x55b6a95ab1e0 id=3 dref->detail=(nil) connector=card1-HDMI-A-1

cgudrian avatar Dec 07 '25 15:12 cgudrian

After wakeup, PowerDevil's BrightnessWorker tries to set the brightness on the disconnected display. That causes ddc_open_display to crash as businfo has already been set to NULL. I'm not sure, who is to blame here.

cgudrian avatar Dec 07 '25 15:12 cgudrian

Here's the race condition:

After wakup, PowerDevil calls ddca_open_display2 with the still valid (i.e. considered connected) display ref 1 (from before the sleep). The DRM state of the associated connector, however, is still "disconnected" which makes ddc_open_display enter a one second waiting period.

During that period, on the polling thread, dw_remove_display_by_businfo is now called for display ref 1, setting its disconnected flag and invalidating its businfo.

At around the same time the DRM state of the connector changes to "connected" which in turn makes ddc_open_display continue normally, using the now invalid ref 1 which results in the failed assertion.

Adding this disconnected check in ddc_open_display fixes the crash for me. It makes the function exit early with an error code. PowerDevil logs a warning. That's it.

diff --git a/src/ddc/ddc_packet_io.c b/src/ddc/ddc_packet_io.c
index d0c345fa..ab79737f 100644
--- a/src/ddc/ddc_packet_io.c
+++ b/src/ddc/ddc_packet_io.c
@@ -332,6 +332,8 @@ ddc_open_display(
          err = ERRINFO_NEW(DDCRC_DISCONNECTED, "Display disconnected");
       }
       free(status);
+      if (dref->flags & DREF_DISCONNECTED)
+         err = ERRINFO_NEW(DDCRC_DISCONNECTED, "Display disconnected");
       if (err)
          goto bye;
    }

cgudrian avatar Dec 07 '25 17:12 cgudrian

Thank you for the great sleuthing. There still is a tiny window of vulnerability, if dw_remove_display_by_businfo() executes after the test you added but before the assignment businfo=dref->detail. Setting flag DREF_DISCONNECTED and setting dref->detail = NULL needs to become an atomic operation and/or there should be a lock preventing ddc_open_display() and dw_remove_display_by_businfo() from executing simulateously. I'll have to think about it. For now I'll put your patch into the development branch as it reduces the window of vulnerability from 1 second to nanoseconds.

In an earlier comment you noted "Surprisingly to me dw_remove_display_by_businfo is actually called after wakeup." The reason is that, in some circumstances, UDEV does not emit a disconnect event immediately. Rather, when the display is reconnected, UDEV emits disconnect and connect events in quick succession.

Again, thank you!

rockowitz avatar Dec 08 '25 04:12 rockowitz

The patch has been applied to branch 2.2.4-dev.

rockowitz avatar Dec 08 '25 05:12 rockowitz

@cgudrian @tsdgeos Branch 2.2.4-dev contains what should be a more complete fix using a mutex to prevent the race condition. Have at it!

rockowitz avatar Dec 10 '25 08:12 rockowitz