lenovo dock, dual mon setup, one not detected

Open laur89 opened this issue 4 years ago • 22 comments

Hia, suspect my issue has to do with docking station or GPU quirks, but I'll give it a go nonetheless.

ThinkPad p14s with an integrated AMD Radeon card, docked to 40AJ docking station. Running debian testing, kernel 5.10.0-3-amd64, amd drivers. Docking station firmware updated to latest ver.

Two identical screens are hooked to the dock via DP: Asus pa278qv; config is as when received, but I did confirm DDC-CI is enabled on both of them.

Manually loaded i2c-dev kernel module, after which ddcutil commands started working.

To start off, the detect appears to be working on one of the displays, other is reported as Invalid display (DDC communication failed). It's also noteworthy that after querying for environment, the only working screen was no longer reported by detect - only eDP and the other invalid display remained, until the laptop was re-docked.

When the one display is reported by detect, then sudo ddcutil capabilities | grep "Feature: 10" (brightness) does report the correct value.

Also note the xrandr output lists the screens as DP-3 and 4, but sometimes they get reported as DP-5 and 6 instead.

Logs in this gist

Edit: forgot to include the ddcutil ver used: 0.9.9-2 from debian repos.

laur89 avatar Feb 12 '21 16:02 laur89

The problems you're experiencing are similar to ones seen with the i915 driver since kernel release 5.10 finally implemented support for I2C on monitors connected through docking stations. In particular, the same monitor can appear as 2 different /dev/i2c devices, only 1 of which supports DDC. However, reading the EDID for the /dev/i2c device supporting DDC only works some of the time. If reading EDID succeeds but DDC fails, ddcutil reports an Invalid Display. However, if reading the EDID fails, there's no monitor to report.
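As a rough illustration of the reporting rule described above (not ddcutil's actual code), the decision can be sketched in Python. The `BusProbe` structure and `classify` function are hypothetical names introduced only for this example:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BusProbe:
    """Result of probing one /dev/i2c-N device (hypothetical structure)."""
    busno: int
    edid: Optional[bytes]  # None if the EDID read failed
    ddc_ok: bool           # True if DDC communication succeeded


def classify(probe: BusProbe) -> Optional[str]:
    """Return the label a detect-style report would use, or None."""
    if probe.edid is None:
        # Without an EDID there is no monitor to report at all.
        return None
    if not probe.ddc_ok:
        # A monitor is present (EDID readable) but DDC does not work.
        return "Invalid display"
    return "Display"
```

This would explain why a monitor can show up as an invalid display on one run and vanish from the list on another: the outcome depends on whether the intermittent EDID read happened to succeed.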

The amdgpu and i915 drivers share DRM code, so the problems may well be related.

ddcutil release 1.0.0 contains a workaround for these problems. It was enhanced in release 1.0.1, and is further tweaked in the current development release, 1.0.2.

If you can, please build ddcutil from the 1.0.2-dev branch, run the following commands, and submit the output:

$ ddcutil detect --verbose
$ ddcutil environment --very-verbose 2>&1

rockowitz avatar Feb 13 '21 19:02 rockowitz

Thought it was similar, but didn't realize i915 drivers (or the drm logic in this case) are also in use with amd gpus. Built from 1.0.2-dev (d604f1c3).

Output here

Note two external displays are now detected, but there's still one "phantom" invalid one on top of all three real/detected ones.

laur89 avatar Feb 14 '21 15:02 laur89

Thank you for the listings. Now I have a dump of (relevant portions of) /sys for the amdgpu case as well as the i915 case.

I'm hoping I'll be able to figure out how to associate the "phantom" /dev/i2c device with the real one and filter it out. So far I haven't been able to. We may have to live with the phantom device until the drm driver code is fixed.

rockowitz avatar Feb 14 '21 22:02 rockowitz

To be clear - the phantom display doesn't really stop us from doing anything, does it?

laur89 avatar Feb 16 '21 16:02 laur89

No, it's just confusing until you know what's going on.

rockowitz avatar Feb 16 '21 18:02 rockowitz

The current 1.0.2-dev branch contains experimental code to detect "phantom" displays, at least in your case. If one is found, detect reports it as a "Phantom Display" as opposed to an "Invalid Display". It's activated by utility option --f2.

Can you execute the following command and submit the output as an attachment or gist?

ddcutil detect --very-verbose --f2

Thank you.

rockowitz avatar Feb 18 '21 17:02 rockowitz

Built from 3977cc4b

Filter phantom displays
ddcutil version:            1.0.2
General Build Options:
   BUILD_SHARED_LIB:    Defined
   ENABLE_ENVCMDS:      Defined
   ENABLE_FAILSIM:      Not defined
   ENABLE_UDEV:         Defined
   USE_X11:             Defined
   USE_LIBDRM:          Defined
   USE_USB:             Defined
   WITH_ASAN:           Not defined

Private Build Options:
   TARGET_LINUX:        Defined
   TARGET_BSD:          Not defined
   INCLUDE_TESTCASES:   Not defined

Output level:               Very Vebose
Reporting DDC data errors:  false
Trace groups active:        none
Traced functions:           none
Traced files:               none
Force I2C slave address:    false
User defined features:      disabled

Performance and Retry Options:
   Deferred sleep enabled:                      false
   Sleep suppression (reduced sleeps) enabled:  true
   Dynamic sleep adjustment enabled:            false

Experimental Options:
   Utility option --f1 disabled: EDID read uses I2C layer
   Utility option --f2 enabled:  Filter phantom displays
   Utility option --f3 disabled: Unused
   Utility option --f4 disabled: Read strategy tests
   Utility option --f5 disabled: Unused
   Utility option --f6 disabled: Force I2c bus
   Utility option --i1 = -1:     Unused

(i2c_set_addr                  ) /dev/i2c-11
(i2c_set_addr                  ) addr = 0x37. filename = /dev/i2c-11, Returning EBUSY(-16): Device or resource busy
(i2c_set_addr                  ) /dev/i2c-12
(i2c_set_addr                  ) addr = 0x37. filename = /dev/i2c-12, Returning EBUSY(-16): Device or resource busy
(i2c_set_addr                  ) /dev/i2c-11
(i2c_set_addr                  ) addr = 0x37. filename = /dev/i2c-11, Returning EBUSY(-16): Device or resource busy
(i2c_set_addr                  ) /dev/i2c-12
(i2c_set_addr                  ) addr = 0x37. filename = /dev/i2c-12, Returning EBUSY(-16): Device or resource busy
(filter_phantom_displays       ) Starting.  all_displays->len = 4
(filter_phantom_displays       ) 0 valid displays, 4 invalid displays
(filter_phantom_displays       ) Done

stderr had this:

(i2c_set_addr) Error in ioctl(I2C_SLAVE), errno=EBUSY(16): Device or resource busy
(i2c_set_addr) Error in ioctl(I2C_SLAVE), errno=EBUSY(16): Device or resource busy
(i2c_set_addr) Error in ioctl(I2C_SLAVE), errno=EBUSY(16): Device or resource busy
(i2c_set_addr) Error in ioctl(I2C_SLAVE), errno=EBUSY(16): Device or resource busy
ddcutil: ddc_displays.c:457: ddc_report_display_by_dref: Assertion `dref->flags & DREF_DDC_COMMUNICATION_CHECKED' failed.
Aborted

I suspect it's something else, I'll retry later after a reboot.

Fyi I've started using ddcci kernel module that provides external monitor interfaces at /sys/class/backlight. Maybe that keeps the devices busy?

Edit:

sudo modprobe -r ddcci
modprobe: FATAL: Module ddcci is in use.

Looks like something's using it, no idea. As said, i'll blacklist the module later and retry after reboot.

laur89 avatar Feb 18 '21 19:02 laur89

Likely ddcci. It's interesting to see how it causes failure.

Try using option --force-slave-address. If ioctl(I2C_SLAVE) fails, ddcutil will retry using I2C_SLAVE_FORCE. Expect to see a diagnostic message for this unusual situation.
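The retry behaviour described above can be sketched as follows. This is a minimal Python illustration, not ddcutil's implementation: `set_slave_addr` and its `ioctl` parameter are hypothetical names, while the request numbers `I2C_SLAVE` and `I2C_SLAVE_FORCE` are the values defined in `<linux/i2c-dev.h>`.

```python
import errno
import fcntl

# ioctl request numbers from <linux/i2c-dev.h>
I2C_SLAVE = 0x0703
I2C_SLAVE_FORCE = 0x0706


def set_slave_addr(fd, addr, force=False, ioctl=fcntl.ioctl):
    """Set the I2C slave address on an open /dev/i2c-N descriptor.

    Returns the request that succeeded. If the plain request fails with
    EBUSY and `force` is set, retry once with I2C_SLAVE_FORCE; any other
    failure is re-raised. The `ioctl` parameter is injectable so the
    logic can be exercised without real hardware.
    """
    try:
        ioctl(fd, I2C_SLAVE, addr)
        return I2C_SLAVE
    except OSError as exc:
        if exc.errno != errno.EBUSY or not force:
            raise
    # The address is claimed by a kernel driver (for example ddcci);
    # take it over anyway because the caller asked for it.
    ioctl(fd, I2C_SLAVE_FORCE, addr)
    return I2C_SLAVE_FORCE
```

In the trace above the address is 0x37 on /dev/i2c-11 and /dev/i2c-12, so the equivalent call would be `set_slave_addr(fd, 0x37, force=True)` on a descriptor for one of those devices.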

rockowitz avatar Feb 18 '21 23:02 rockowitz

sudo ddcutil detect --very-verbose --f2 --force-slave-address of same build: output

stderr:

(i2c_set_addr) Error in ioctl(I2C_SLAVE), errno=EBUSY(16): Device or resource busy
(i2c_set_addr) Error in ioctl(I2C_SLAVE), errno=EBUSY(16): Device or resource busy
(i2c_set_addr) Error in ioctl(I2C_SLAVE), errno=EBUSY(16): Device or resource busy
(i2c_set_addr) Error in ioctl(I2C_SLAVE), errno=EBUSY(16): Device or resource busy
(i2c_set_addr) Error in ioctl(I2C_SLAVE), errno=EBUSY(16): Device or resource busy
(i2c_set_addr) Error in ioctl(I2C_SLAVE), errno=EBUSY(16): Device or resource busy

laur89 avatar Feb 19 '21 00:02 laur89

So --force-slave-address solves the conflict with ddcci. Does ddcci continue to work? What to send to stdout and what to send to stderr can be a grey area. I've tweaked the messages to not regard EBUSY as something warranting stderr.

Re not regarding /dev/i2c-9 as phantom, the check for the presence of attribute edid was miscoded. It has been fixed.

Please test with the current 1.0.2-dev. Thank you.

rockowitz avatar Feb 19 '21 02:02 rockowitz

7400fc52 build

sudo ddcutil detect --very-verbose --f2 --force-slave-address gist. stderr was now empty.

Does ddcci continue to work?

Screen brightness change continued working by writing into the device files throughout, so I'm guessing ddc was not impacted.

What to send to stdout and what to send to stderr can be a grey area.

Oh absolutely, didn't intend to point it out as a fault, just wanted to be clear gist didn't include stderr.

Note I've yet to produce a post-reboot reproduction without the ddcci module loaded. System is in a bit of a delicate situation atm and I'm afraid to reboot. Will try to do so tonight.

Edit: finally rebooted. Same command without the ddcci module loaded: gist (and now omitting --force-slave-address resulted in the same output, as expected)

laur89 avatar Feb 19 '21 12:02 laur89

The current 1.0.2-dev build contains several changes to address the problems you've detected:

  • Fix the cause of the assert() failure when setting the I2C slave address returns -EBUSY
  • Reduce the messages that are issued unconditionally re the EBUSY ioctl error to (a) the case that the final status of the function setting the I2C slave address is -EBUSY, or (b) a report of successful recovery because of option --force-slave-address
  • Fix (again) the logic in the function detecting the "phantom" display
  • ddcutil detect output re the phantom display reports the "real" display

Note that experimental option --f2 is still required to enable phantom display detection.

Let me know how it goes.

Unrelated to all the above, you should now see significantly improved performance on the capabilities command. By default, capabilities strings are now cached in file ~/.local/share/ddcutil/capabilities.

rockowitz avatar Feb 20 '21 04:02 rockowitz

Built from d4133fc2

sudo ddcutil detect --very-verbose --f2 output here

Awesome work, looks like phantom display detection is working flawlessly! At least on my hardware that is, can only assume the problem domain is dependent on so many different variables, making it largely hit and miss.

This is likely unrelated to the topic, but could this i2c address detection problem also be the cause for extreme instability when screens are shut off via dpms off? Usually I come to find xserver having crashed (don't use any login manager, and find the tty console back showing login), with very little in xorg logs, but plenty of errors in dmesg mentioning amdgpu and drm - related or not I'm not completely sure.

Seen some similar issues reported like this or this or this, but they're not exactly the same as my stacktrace:

Feb 13 17:40:22 p14s kernel: [86229.846462] Call Trace:
Feb 13 17:40:22 p14s kernel: [86229.846524]  amdgpu_dm_backlight_update_status+0xb4/0xc0 [amdgpu]
Feb 13 17:40:22 p14s kernel: [86229.846533]  backlight_suspend+0x6a/0x80
Feb 13 17:40:22 p14s kernel: [86229.846535]  ? brightness_store+0x70/0x70
Feb 13 17:40:22 p14s kernel: [86229.846537]  dpm_run_callback+0x4c/0x120
Feb 13 17:40:22 p14s kernel: [86229.846538]  __device_suspend+0xfa/0x410
Feb 13 17:40:22 p14s kernel: [86229.846539]  dpm_suspend+0x13f/0x260
Feb 13 17:40:22 p14s kernel: [86229.846541]  dpm_suspend_start+0x77/0x80
Feb 13 17:40:22 p14s kernel: [86229.846543]  suspend_devices_and_enter+0x109/0x760
Feb 13 17:40:22 p14s kernel: [86229.846546]  pm_suspend.cold+0x329/0x374
Feb 13 17:40:22 p14s kernel: [86229.846547]  state_store+0x71/0xd0
Feb 13 17:40:22 p14s kernel: [86229.846550]  kernfs_fop_write_iter+0x124/0x1b0
Feb 13 17:40:22 p14s kernel: [86229.846552]  new_sync_write+0x11c/0x1b0
Feb 13 17:40:22 p14s kernel: [86229.846554]  vfs_write+0x1c2/0x260
Feb 13 17:40:22 p14s kernel: [86229.846555]  ksys_write+0x5f/0xe0
Feb 13 17:40:22 p14s kernel: [86229.846557]  do_syscall_64+0x33/0x80
Feb 13 17:40:22 p14s kernel: [86229.846559]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Any idea under which project I should report this?

Edit: note the issue (? i'm guessing it is) with detect not working properly after querying for environment is still there. The gist linked above contains second file showing what the detect output showed after having queried environment.

laur89 avatar Feb 22 '21 14:02 laur89

Thanks for the kudos. I'm not convinced that the solution works for drivers other than amdgpu and i915, which is why the check is coded the way it is: check that a display for which DDC communication fails has an EDID that matches one for a "valid" display, and also check that its /sys attributes (disconnected, status enabled, no EDID) are as expected for this situation. Presumably this would work for any DRM driver, but since I'm not absolutely sure, the /dev/i2c device is reported as a phantom display rather than suppressed. (By the way, nice choice of the term "phantom" on your part.)
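For readers following along, the two-part check described above might look roughly like this in Python. It is a sketch under assumptions, not the ddcutil source: `is_phantom`, the dictionary layout, and the attribute keys are hypothetical, and only the "disconnected" status and missing EDID are modelled because the expected value of the remaining attribute is not spelled out in this thread.

```python
def is_phantom(invalid, valid_displays, sysfs):
    """Decide whether an invalid display duplicates a valid one.

    `invalid` and each entry of `valid_displays` hold raw EDID bytes
    under the key "edid". `sysfs` maps connector attribute names to
    their contents for the invalid display's connector. All of these
    names are illustrative.
    """
    # Part 1: the EDID must match that of some display where DDC works.
    edid_matches = any(v["edid"] == invalid["edid"] for v in valid_displays)
    # Part 2: the connector must look unused from the /sys side.
    sysfs_as_expected = (
        sysfs.get("status", "").strip() == "disconnected"
        and not sysfs.get("edid")
    )
    return edid_matches and sysfs_as_expected
```

Requiring both conditions keeps the check conservative: a monitor with a matching EDID but a connector that reports itself as connected would still be shown as an invalid display.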

Re your xserver crash, I looked at kernel file amdgpu_drm.c containing amdgpu_dm_backlight_update_status(), and understanding what is going on is beyond my pay grade. However, I would hazard that the code is confusing the phantom display for the real one. Interestingly, the crash occurs on suspend, rather than resume. I'd say that this should be directed to the amdgpu driver folks.

The failure to detect the monitors on the dock after the environment command is troubling. Is this consistent? Was there any output that went to stderr? Please run the following sequence and submit the output:

ddcutil detect --very-verbose --f2 2>&1
ddcutil environment --very-verbose 2>&1
ddcutil detect --very-verbose --f2 2>&1

rockowitz avatar Feb 22 '21 18:02 rockowitz

Cheers for taking a gander, I'll see what the amdgpu devs have to say about it.

Re. detect-env issue, these are the logs. Note it was already the case when I first reported this issue, so at the very least it's not a regression from v0.9.9. https://gist.github.com/laur89/af34e8042e7ed6325b65bcb44e8e27ca

Note I left the --f2 flag enabled for the environment command as well, but it makes no difference.

Edit: just re-ran the detect command after ~20hours without rebooting/redocking the computer, and both displays are properly detected again. Unsure if it's to do with dpms having kicked in or the recovery being function of time.

laur89 avatar Feb 24 '21 15:02 laur89

Hi, I think I might be experiencing the same problem here. I'm using a Lenovo Dock with 2 identical monitors connected via DisplayPort. The dock is connected over USB-C/Thunderbolt.

I'm using the 2.0.0 version. Everything works fine if I connected a single monitor (directly or via dock) to my laptop. When using 2 via the dock, it can't communicate with 1 display.

Unsure, but wondering if those logs might be of any help? (sudo ddcutil interrogate --verbose)

https://gist.github.com/JeCheeseSmith/b11158dec6a2b963939782c2a894f109

Thanks for maintaining this tool!

JeCheeseSmith avatar Nov 21 '23 21:11 JeCheeseSmith

@JeCheeseSmith You are using a nearly 4 year old version of ddcutil. Please build from branch 2.0.2-dev and retest.

rockowitz avatar Nov 22 '23 00:11 rockowitz

Apologies, I installed it using the debian/ubuntu packages and somehow assumed they would be up to date. I'm pretty much a newbie to the topic and maybe even the field.

I installed the 2.0.2-dev branch and retested. https://gist.github.com/JeCheeseSmith/9fdbd80d7e8998fa9a9d48f1233a3204

It works for the same monitor, the 2nd one is not detected.

JeCheeseSmith avatar Nov 22 '23 20:11 JeCheeseSmith

@JeCheeseSmith I see that you're running Ubuntu 20.04. There's been a lot of work in the DRM video drivers, particularly i915 and amdgpu, regarding I2C support for docking stations. It may be that your amdgpu driver is just not recent enough. Unfortunately, I can't tell the amdgpu version you're running. Command apt list | grep amdgpu will show the versions of the related packages.

However, you've hit a bug in the environment command and it crashed, so much of the output is missing. Please run it again using valgrind, which should locate where the crash happens. In case you're not familiar with it, valgrind ddcutil environment --verbose should report on the double free. Thanks.

rockowitz avatar Nov 22 '23 23:11 rockowitz

@rockowitz I ran apt list | grep amdgpu and didn't get any wiser out of it myself. I do indeed run an Ubuntu-based distro (Zorin OS). I suppose it's a possibility my drivers aren't the most recent; I've had problems with them before.

I reran the command with valgrind 4 times (incl. reboot); logs of both commands are in this gist: https://gist.github.com/JeCheeseSmith/03dee3dac294a0f67df80fe2b373eff5 I'm wondering if those errors come from a faulty install of ddcutil (or other software) on my part?

JeCheeseSmith avatar Nov 23 '23 22:11 JeCheeseSmith

I have good news and bad news.

Good news for me: You've identified some very obscure bugs.

Bad news for you: Your Linux kernel and amdgpu drivers are too old. Major changes for I2C handling with DP Multi-Stream Transport (MST), which is what your dock uses, went into kernel 5.17 or 5.19, IIRC. Your kernel is 5.15. Also, the amdgpu drivers are old. In some cases their version numbers are significantly lower than those on this system, running Ubuntu 23.04. In other cases the version numbers aren't even comparable.
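A quick way to check a machine against that threshold is to compare the kernel release string numerically. The helper below is a hypothetical sketch; the 5.17 cut-off is the lower of the two versions recalled above and is not a confirmed value.

```python
import re


def kernel_at_least(release, minimum):
    """Compare a kernel release string against a (major, minor) tuple.

    `release` is a string in the form reported by `uname -r`, for
    example "5.10.0-3-amd64".
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum
```

On a live system the release string can be taken from `platform.release()`, as in `kernel_at_least(platform.release(), (5, 17))` after `import platform`.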

rockowitz avatar Nov 24 '23 12:11 rockowitz

@rockowitz Alright, that explains it for me then. For the distro I'm using, a new release based on newer Ubuntu versions/kernels has yet to come out. I will stick with my monitor buttons for now.

I'm glad you got some good news ;) Thanks for helping and maintaining the tool!

JeCheeseSmith avatar Dec 03 '23 19:12 JeCheeseSmith