Unable to detect display
Hi, in Debian sid recently ddcutil has been upgraded from 0.9.9 to 1.3.0.
Version 1.3.0 is unable to detect displays, while reverting to 0.9.9 fixes the issue.
Output of sudo ddcutil interrogate --verbose for both 0.9.9 and 1.3.0 is attached.
Please let me know what else you need to track down this issue.
Thanks.
Please execute ddcutil detect --verbose --trace ddc --trace i2c and attach the output.
It looks like you're using the newly open sourced Nvidia driver. Is that correct?
No, that is not correct. I'm using the proprietary legacy driver, version 340.108.
Here they are: ddcutil_0.9.9_detect.txt ddcutil_1.3.0_detect.txt
Thanks
As a first step, on a hunch, I've modified the i2c writer function to allocate a structure on the heap instead of on the stack, as is done in the reader function. Please build from branch 1.4.0-dev and execute ddcutil detect --verbose --trace i2c. Thanks.
I don't know if it can help you, but I've built the following branches as well: 1.0.0-release 1.1.0-release 1.2.0-release 1.2.1-release 1.2.2-release 1.2.3-dev
All work up to and including 1.2.2-release; starting from 1.2.3-dev, ddcutil has this issue.
UPDATE: More precisely starting from this https://github.com/rockowitz/ddcutil/commit/f6c72c6df1ed4bdccd0e1f818e25c7e39a1dc875
Background: Driver i2c-dev has two interfaces: an ioctl() interface and a higher level interface using read() and write(). Internally, the read()/write() interface maps to the same code as the ioctl() interface. However, using the read()/write() interface can lead to EBUSY errors because of the way that the i2c slave address is "predeclared" outside of the read()/write() calls.
Release 1.3.0 eliminates all use of the read()/write() interface. Previously the EDID was read using the read()/write() interface.
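For readers unfamiliar with the two interfaces, here is a minimal sketch in C of how an EDID read at slave address x50 looks through each of them. This is illustrative only, not ddcutil's actual code, and it assumes a Linux system with the i2c-dev headers installed:

```c
#include <stdint.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/i2c.h>
#include <linux/i2c-dev.h>

/* read()/write() interface: the slave address is "predeclared" out of
 * band with ioctl(I2C_SLAVE). Contention for the predeclared address
 * is what produces the EBUSY errors. */
static int edid_read_fileio(int fd, uint8_t *buf, size_t len) {
    if (ioctl(fd, I2C_SLAVE, 0x50) < 0)   /* predeclare EDID address */
        return -1;
    uint8_t offset = 0x00;
    if (write(fd, &offset, 1) != 1)       /* set read offset within EDID */
        return -1;
    return (int) read(fd, buf, len);
}

/* ioctl(I2C_RDWR) interface: the address travels inside each message,
 * so nothing is predeclared and the EBUSY problem goes away. */
static int edid_read_ioctl(int fd, uint8_t *buf, uint16_t len) {
    uint8_t offset = 0x00;
    struct i2c_msg msgs[2] = {
        { .addr = 0x50, .flags = 0,        .len = 1,   .buf = &offset },
        { .addr = 0x50, .flags = I2C_M_RD, .len = len, .buf = buf     },
    };
    struct i2c_rdwr_ioctl_data data = { .msgs = msgs, .nmsgs = 2 };
    return ioctl(fd, I2C_RDWR, &data);
}
```

Both paths end up in the same adapter code inside the kernel; only the way the slave address is communicated differs.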
There's something about the proprietary nvidia driver that doesn't like the arguments passed on the ioctl() call. I've added debug code and also moved buffers from the stack to the heap as a guess that that's the locus of the problem.
Please build from 1.4.0-dev again and again run ddcutil detect --verbose --trace i2c. If that doesn't solve or at least identify the problem, I need to install a Nvidia card on a test system along with the proprietary driver to try to understand what is happening. Unfortunately, there's not enough time to do that along with everything else that needs to happen before I leave for vacation Wednesday, which is unfortunate because I do regard this as a significant bug. Thank you for reporting it and your help in diagnosing it.
Sadly the issue isn't solved; I hope this will at least help identify the problem: ddcutil_1.4.0-dev_detect(2).txt
...if not, yeah, it may be a significant bug, but there is no rush. In any case I'll be here to help if you need anything.
Thanks.
Another thing to try, if you would, again with 1.4.0-dev.
Enable kernel tracing as per this article. Then execute ddcutil detect --verbose --trace i2c as usual, and send both the ddcutil trace output and the kernel debug output. Thanks.
Sure, here they are: ddcutil_1.4.0-dev_detect(3).txt i2c_kernel_trace.txt
The kernel trace is as expected, indicating that the data structures passed on the ioctl() calls are correct.
So why is the nvidia driver returning -EINVAL? The slave address is specified in the ioctl(I2C_RDWR) calls. ioctl(I2C_SLAVE_FORCE) is a way to specify the slave address "out of band" when using the i2c-dev write()/read() interface - there's no way to include it on write()/read(). It should have no effect when using the ioctl(I2C_RDWR) interface. However, it may be that the nvidia driver requires it. So as a test, the latest changes in branch 1.4.0-dev add ioctl(I2C_SLAVE_FORCE) calls.
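A sketch of what that test amounts to (hypothetical shape for illustration; the actual 1.4.0-dev code differs):

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/i2c.h>
#include <linux/i2c-dev.h>

/* Predeclare the slave address even though ioctl(I2C_RDWR) carries it
 * in-band. Per i2c-dev semantics this should be a no-op for the
 * I2C_RDWR path, but perhaps the nvidia driver expects it. */
static int rdwr_with_slave_force(int fd, uint16_t addr,
                                 struct i2c_msg *msgs, uint32_t nmsgs) {
    if (ioctl(fd, I2C_SLAVE_FORCE, (unsigned long) addr) < 0)
        return -1;
    struct i2c_rdwr_ioctl_data data = { .msgs = msgs, .nmsgs = nmsgs };
    return ioctl(fd, I2C_RDWR, &data);
}
```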
Please build from the current 1.4.0-dev branch and execute ddcutil detect --verbose --trcfunc i2c_ioctl_writer --trcfunc i2c_ioctl_reader. Thank you.
For the sake of completeness, I want to give you the output of both version 1.4.0-dev and 1.2.2 (as well as the i2c kernel traces). I'm not an expert, but I want to point out a difference in the i2c kernel trace between 1.4.0-dev and 1.2.2, specifically the value of f:
i2c_kernel_trace_(ddcutil_1.4.0-dev), the value is f=0200:
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
ddcutil-50583 [000] ..... 1013.525510: i2c_write: i2c-1 #0 a=050 f=0200 l=1 [00]
ddcutil-50583 [000] ..... 1013.525514: i2c_result: i2c-1 n=1 ret=-22
.........................................................
i2c_kernel_trace_(ddcutil_1.2.2), the value is f=0000:
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
ddcutil-62269 [003] ..... 1358.013381: i2c_write: i2c-1 #0 a=050 f=0000 l=1 [00]
ddcutil-62269 [003] .N... 1358.013894: i2c_result: i2c-1 n=1 ret=1
.........................................................
Is this normal?
ddcutil_1.4.0-dev_detect(4).txt i2c_kernel_trace_(ddcutil_1.4.0-dev).txt
ddcutil_1.2.2_detect.txt i2c_kernel_trace_(ddcutil_1.2.2).txt
Thanks
@KeyofBlueS @stephvnm
Your observation hit the nail on the head. f is the flags halfword in the ioctl call. x0200 is flag I2C_M_DMA_SAFE. It is set within i2c-dev when handling the ioctl call, and is used only within kernel space (i.e. ddcutil does not set it). read()/write() take a different code path, and the flag does not appear to be set in that case. I've looked at the nvidia driver code in GitHub repo NVIDIA/open-gpu-kernel-modules, and depending on how it is compiled the nvidia driver may reject calls with the flag set and return -EINVAL.
So there's a bug in the drivers, and probably some finger pointing to come. i2c-dev assumes that the video driver can handle the flag, and amdgpu, nouveau, etc. do, but nvidia does not.
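For reference, the f= field in the trace lines above is the flags halfword of each struct i2c_msg as the bus driver sees it. A trivial decoder for the bit in question (flag value taken from <linux/i2c.h>; guarded because pre-4.16 headers lack it):

```c
#include <stdint.h>
#include <linux/i2c.h>

#ifndef I2C_M_DMA_SAFE             /* absent from pre-4.16 kernel headers */
#define I2C_M_DMA_SAFE 0x0200
#endif

/* f=0200 in the 1.4.0-dev trace has the bit set (ioctl() path, where
 * i2c-dev adds it internally); f=0000 in the 1.2.2 trace does not
 * (read()/write() path). Userspace never sets this flag itself. */
static int msg_has_dma_safe(uint16_t flags) {
    return (flags & I2C_M_DMA_SAFE) != 0;
}
```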
The problem arose in ddcutil 1.3.0 because I changed the code that reads the EDID from using read()/write() to using ioctl(), and it was not caught in the release candidates.
I'm leaving on vacation momentarily, and while I'll have email and web access, I won't be able to work on the code base. So for now, just use release 1.2.2.
Thank you @KeyofBlueS for all your help in diagnosing the bug.
Sure, enjoy your well deserved vacation, have a good time and thank you for all your work.
Regards.
@KeyofBlueS, @stephvnm Branch 1.3.3-dev reverts the method for reading the EDID to the way it was done in 1.2.2. Please build from this version and let me know if it resolves the problem. Thank you.
Hi and welcome back!
With 1.3.3-dev displays are detected, but DDC communication fails.
The relevant bus is i2c-1. Disregard i2c-4, it's a TV.
edit: SORRY, I'VE MESSED UP THE LOG, please redownload it!
@KeyofBlueS , @stephvnm I've re-installed the general code from 1.2.2 that uses read() and write() for i2c communication. (The amount of code that had to go back in was painful.)
On the latest branch of 1.3.3-dev, by default ddcutil still uses ioctl() for talking to slave address x37. If utility option --f1 is specified, read()/write() is used. Let me know if DDC communication works with --f1.
No, sadly not. From the log it seems ioctl() is still used for address x37 even with the --f1 option.
My oops. I missed one spot (function i2c_detect_x37()) that had to be modified. I've uploaded the latest change to 1.3.3-dev.
Well, now I'm baffled.
Are you building the nvidia driver using DKMS, or using a pre-built copy? If the former, can you determine which copy of i2c.h it is using and whether constant I2C_M_DMA_SAFE is defined?
What nvidia packages are installed?
Please execute sudo ddcutil interrogate and send the output. Thanks.
With DKMS.
For i2c.h, what comes with the kernel headers, I guess (I'm actually on 5.18.5). I've attached two i2c.h files found inside the kernel headers.
In the first one I see:
#define I2C_M_DMA_SAFE 0x0200 /* use only in kernel space */
nvidia packages installed:
glx-alternative-nvidia/unstable,now 1.2.1 amd64 [installed,automatic]
libegl1-nvidia-legacy-340xx/unstable,now 340.108-15 amd64 [installed,automatic]
libegl1-nvidia-legacy-340xx/unstable,now 340.108-15 i386 [installed,automatic]
libgl1-nvidia-legacy-340xx-glx/unstable,now 340.108-15 amd64 [installed,automatic]
libgl1-nvidia-legacy-340xx-glx/unstable,now 340.108-15 i386 [installed]
libgles1-nvidia-legacy-340xx/unstable,now 340.108-15 amd64 [installed,automatic]
libgles1-nvidia-legacy-340xx/unstable,now 340.108-15 i386 [installed]
libgles2-nvidia-legacy-340xx/unstable,now 340.108-15 amd64 [installed,automatic]
libgles2-nvidia-legacy-340xx/unstable,now 340.108-15 i386 [installed]
libnvidia-legacy-340xx-cfg1/unstable,now 340.108-15 amd64 [installed,automatic]
libnvidia-legacy-340xx-cfg1/unstable,now 340.108-15 i386 [installed,automatic]
libnvidia-legacy-340xx-cuda1-i386/unstable,now 340.108-15 i386 [installed,automatic]
libnvidia-legacy-340xx-cuda1/unstable,now 340.108-15 amd64 [installed,automatic]
libnvidia-legacy-340xx-cuda1/unstable,now 340.108-15 i386 [installed,automatic]
libnvidia-legacy-340xx-eglcore/unstable,now 340.108-15 amd64 [installed,automatic]
libnvidia-legacy-340xx-eglcore/unstable,now 340.108-15 i386 [installed,automatic]
libnvidia-legacy-340xx-encode1/unstable,now 340.108-15 amd64 [installed]
libnvidia-legacy-340xx-encode1/unstable,now 340.108-15 i386 [installed,automatic]
libnvidia-legacy-340xx-glcore/unstable,now 340.108-15 amd64 [installed,automatic]
libnvidia-legacy-340xx-glcore/unstable,now 340.108-15 i386 [installed,automatic]
libnvidia-legacy-340xx-ml1/unstable,now 340.108-15 amd64 [installed,automatic]
libnvidia-legacy-340xx-nvcuvid1/unstable,now 340.108-15 amd64 [installed,automatic]
libnvidia-legacy-340xx-nvcuvid1/unstable,now 340.108-15 i386 [installed,automatic]
nvidia-installer-cleanup/unstable,now 20220217+1 amd64 [installed,automatic]
nvidia-kernel-common/unstable,now 20220217+1 amd64 [installed,automatic]
nvidia-legacy-340xx-alternative/unstable,now 340.108-15 amd64 [installed,automatic]
nvidia-legacy-340xx-driver-bin/unstable,now 340.108-15 amd64 [installed,automatic]
nvidia-legacy-340xx-driver-libs/unstable,now 340.108-15 amd64 [installed,automatic]
nvidia-legacy-340xx-driver-libs/unstable,now 340.108-15 i386 [installed]
nvidia-legacy-340xx-driver/unstable,now 340.108-15 amd64 [installed]
nvidia-legacy-340xx-kernel-dkms/unstable,now 340.108-15 amd64 [installed]
nvidia-legacy-340xx-kernel-support/unstable,now 340.108-15 amd64 [installed,automatic]
nvidia-legacy-340xx-smi/unstable,now 340.108-15 amd64 [installed,automatic]
nvidia-legacy-340xx-vdpau-driver/unstable,now 340.108-15 amd64 [installed,automatic]
nvidia-modprobe/unstable,now 515.48.07-1 amd64 [installed,automatic]
nvidia-persistenced/unstable,now 470.129.06-1 amd64 [installed,automatic]
nvidia-settings-legacy-340xx/unstable,now 340.108-6 amd64 [installed,automatic]
nvidia-support/unstable,now 20220217+1 amd64 [installed,automatic]
xserver-xorg-video-nvidia-legacy-340xx/unstable,now 340.108-15 amd64 [installed]
I've compiled 1.3.3-dev with your recent changes (https://github.com/rockowitz/ddcutil/commit/5b18ee36aa17eb3aed49aa976ba27b1d7e5ba648 https://github.com/rockowitz/ddcutil/commit/da77d55397a818488ba288e7b20f5a62e2ddcdcd https://github.com/rockowitz/ddcutil/commit/351913dfec19b31ff57750a4561fbc06b12b27b9) and it's working now with the --f1 option :+1:
I've updated the proof of concept "--f1" code in branch 1.3.3-dev to something with a coherent user interface. The changes required were extensive.
To recap, ddcutil has to navigate between Scylla and Charybdis in its calls into driver i2c-dev.
- Using the ioctl() interface to read and write the I2C bus has the advantage of avoiding EBUSY errors. While in principle these can have many causes, in practice they have only been observed when using the read()/write() interface; in particular, they commonly occur when driver ddcci is loaded. The EBUSY errors can be addressed using option --force-slave-address, but this may affect other users of the I2C bus.
- The ioctl() interface avoids (most/all) EBUSY errors, but has the disadvantage that, depending on how the proprietary nvidia driver has been built, it may trigger an incompatibility (aka bug) between driver i2c-dev and nvidia: all access fails with EINVAL. I can see the if-test both in the interface code that DKMS compiles for the driver and in the corresponding file in open-gpu-kernel-modules. The ioctl() interface cannot be used in this context.
- The read()/write() interface has the advantage that it works with the nvidia driver, but EBUSY errors are possible.
I've added two additional options, --use-file-io and --use-ioctl-io; the --f1 option no longer works. The former causes ddcutil to use the write()/read() interface; the latter causes the ioctl() interface to be used. If neither is specified, the ioctl() interface is used. However, if the nvidia/i2c-dev bug is encountered, ddcutil switches to the write()/read() interface. (FWIW, I'm not particularly keen on the option names. Suggestions welcome.)
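The selection and fallback policy can be summarized in a short sketch (illustrative only, not the actual ddcutil implementation; the parameter encoding is made up for the example):

```c
#include <errno.h>

typedef enum { IO_IOCTL, IO_FILEIO } i2c_io_mode;

/* use_file_io: nonzero if --use-file-io was given.
 * probe_errno: errno from a trial ioctl(I2C_RDWR) access, 0 on success.
 * Both with --use-ioctl-io and with no option at all, ioctl() is tried
 * first; write()/read() is used only if the nvidia/i2c-dev bug
 * (EINVAL) shows up. */
static i2c_io_mode choose_i2c_io_mode(int use_file_io, int probe_errno) {
    if (use_file_io)
        return IO_FILEIO;              /* --use-file-io: skip ioctl() */
    if (probe_errno == EINVAL)
        return IO_FILEIO;              /* fall back on the driver bug */
    return IO_IOCTL;                   /* default and --use-ioctl-io */
}
```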
Please exercise the updated 1.3.3-dev branch. It should work for you both when --use-file-io is specified and when neither option is specified. As I said, the changes required were extensive; no doubt some bugs remain.
Confirmed, detect is working as expected. With no option passed or with --use-ioctl-io, it tries the ioctl() interface, then falls back to the write()/read() interface. With option --use-file-io it uses the write()/read() interface directly.
So I guess there is no way to use the ioctl() interface with problematic (nvidia) drivers, except maybe by forcibly removing the I2C_M_DMA_SAFE flag.
P.S. the option names are OK :+1:
Thank you for the quick testing.
Modifying the i2c-dev driver to not set the bit would be a fair amount of work unless you regularly build the kernel. What would probably be easier is to modify the two nvidia driver files that DKMS compiles by adding the define for I2C_M_DMA_SAFE. Here's a link to a bug report that I just posted on developer.nvidia.com, which contains (most of) the relevant line numbers; grepping the nvidia code will find you the rest.
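To make the shape of the change concrete, the idea is along these lines (hypothetical sketch only; the exact files and insertion points are the ones listed in the nvidia bug report, and I'm not claiming this is the complete patch):

```c
/* Added near the top of the nvidia DKMS source files that test the
 * flag, so the flag-aware branch compiles in even when the kernel
 * headers in use predate the definition: */
#ifndef I2C_M_DMA_SAFE
#define I2C_M_DMA_SAFE 0x0200   /* value matches <linux/i2c.h> */
#endif
```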
Please note I'm using the legacy nvidia 340.108 drivers, which have been discontinued by nvidia. I don't know whether still-supported legacy/current drivers are affected by this issue.
I don't know how, or whether it's possible at all, to define I2C_M_DMA_SAFE. This is the nvidia binary kernel module DKMS source I'm using: https://packages.debian.org/sid/nvidia-legacy-340xx-kernel-dkms. If you could help me add the define for I2C_M_DMA_SAFE, I could test it, and then if it works, more users will benefit from this knowledge.
Thanks.
The workaround for the Nvidia bug is included in release 1.4.1.