Errors resulting in black screen
Hi, I'm trying to run the latest Docker image on a remote server (from an SSH console) with 2 NVIDIA GPUs.
When I execute ./run.sh, I see the following output with errors (they seem to be mostly permission errors):
Stopping "xubuntu" container...
Removing "xubuntu" container...
Creating "xubuntu" container...
Done!
sha1 Fingerprint=9C:D9:6E:04:C2:DA:39:E6:2E:73:7A:76:69:B1:BC:10:25:9D:3D:9E
sha256 Fingerprint=60:89:B2:24:64:E2:CA:34:BE:45:E5:13:76:91:37:ED:49:0E:B7:3B:F8:91:C6:C1:2A:47:8B:93:11:C0:AE:08
[20230613-19:45:44] [INFO ] starting xrdp with pid 69
[20230613-19:45:44] [INFO ] starting xrdp-sesman with pid 68
Failed to set receive buffer size for device monitor, ignoring: Operation not permitted
[20230613-19:45:44] [INFO ] address [0.0.0.0] port [3389] mode 1
[20230613-19:45:44] [INFO ] listening to port 3389 on 0.0.0.0
[20230613-19:45:44] [INFO ] xrdp_listen_pp done
kmalloc-8(122704:72a0868a96eb001d1991def099629933bcb4f9af3b382e8981d4a8672c1e010c): Failed to send device, ignoring: Operation not permitted
kmalloc-rcl-64(122704:72a0868a96eb001d1991def099629933bcb4f9af3b382e8981d4a8672c1e010c): Failed to send device, ignoring: Operation not permitted
kmalloc-rcl-128(122704:72a0868a96eb001d1991def099629933bcb4f9af3b382e8981d4a8672c1e010c): Failed to send device, ignoring: Operation not permitted
kmalloc-rcl-96(122704:72a0868a96eb001d1991def099629933bcb4f9af3b382e8981d4a8672c1e010c): Worker [78] did not accept message, killing the worker: Operation not permitted
kmalloc-rcl-96(122704:72a0868a96eb001d1991def099629933bcb4f9af3b382e8981d4a8672c1e010c): Worker [77] did not accept message, killing the worker: Operation not permitted
kmalloc-rcl-96(122704:72a0868a96eb001d1991def099629933bcb4f9af3b382e8981d4a8672c1e010c): Worker [76] did not accept message, killing the worker: Operation not permitted
Worker [76] terminated by signal 9 (KILL)
Worker [78] terminated by signal 9 (KILL)
Worker [77] terminated by signal 9 (KILL)
kmalloc-rcl-96(122704:72a0868a96eb001d1991def099629933bcb4f9af3b382e8981d4a8672c1e010c): Failed to send device, ignoring: Operation not permitted
LNXSYSTM:00: Failed to write 'change' to '/sys/devices/LNXSYSTM:00/uevent': Read-only file system
Then, when I connect with the Remmina client, I see the login screen, but after logging in I see a blank screen.
Could you please suggest where to look? Thanks,
I've tried the oldest available image (v93), with run.sh from the v93 tag, and:
- some of the errors after container startup disappeared, but not all:
SHA1 Fingerprint=84:18:A8:5D:10:5F:A8:F2:91:17:09:87:08:21:B4:68:9E:BC:F5:37
SHA256 Fingerprint=6B:D8:3D:16:A7:0C:6E:B9:89:ED:E8:41:14:23:8D:24:50:D1:15:A8:AE:D3:B4:A2:7E:E5:98:0C:18:1F:AC:87
[20230613-20:23:27] [INFO ] starting xrdp-sesman with pid 73
[20230613-20:23:27] [INFO ] starting xrdp with pid 74
[20230613-20:23:27] [INFO ] address [0.0.0.0] port [3389] mode 1
[20230613-20:23:27] [INFO ] listening to port 3389 on 0.0.0.0
[20230613-20:23:27] [INFO ] xrdp_listen_pp done
kmalloc-8(122801:b3c08c1f0f01fcb54c35fc900eef232a1cb52697d003cef8b934544eb56facb1): Failed to send device, ignoring: Operation not permitted
kmalloc-rcl-128(122801:b3c08c1f0f01fcb54c35fc900eef232a1cb52697d003cef8b934544eb56facb1): Failed to send device, ignoring: Operation not permitted
kmalloc-rcl-96(122801:b3c08c1f0f01fcb54c35fc900eef232a1cb52697d003cef8b934544eb56facb1): Worker [81] did not accept message, killing the worker: Operation not permitted
kmalloc-rcl-96(122801:b3c08c1f0f01fcb54c35fc900eef232a1cb52697d003cef8b934544eb56facb1): Worker [82] did not accept message, killing the worker: Operation not permitted
Worker [81] terminated by signal 9 (KILL)
Worker [82] terminated by signal 9 (KILL)
kmalloc-rcl-96(122801:b3c08c1f0f01fcb54c35fc900eef232a1cb52697d003cef8b934544eb56facb1): Failed to send device, ignoring: Operation not permitted
- I can connect to the desktop, but VirtualGL doesn't recognize the GPU. From a quick look, your approach to GPUs is more general than the NVIDIA Container Toolkit, e.g. you install the drivers manually and pass the devices through. I wonder whether run.sh supports NVIDIA out of the box or whether I should modify it (e.g. add --gpus all, etc.).
BTW, I hope I can make it work, because I've read a previous "success story" with an NVIDIA GPU (see https://github.com/hectorm/docker-xubuntu/issues/7#issuecomment-841833211). Unfortunately, that was before the v93 tag, but I guess I can build the image myself. Still, I'd appreciate any insight.
I went back to v69, added the parameters required by the NVIDIA Container Toolkit to run.sh, and it works! Now I'll try to narrow down what works and what causes the trouble.
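For reference, a minimal sketch of the kind of options the NVIDIA Container Toolkit expects, assuming the toolkit is already installed on the host. The exact parameters added to run.sh are not listed in this thread, so the helper name nvidia_docker_opts is hypothetical. `--gpus all` is the Docker flag that exposes all GPUs, and NVIDIA_DRIVER_CAPABILITIES=all also requests the "graphics" capability, which is not part of the toolkit's default (compute,utility) and is presumably what VirtualGL needs for OpenGL.

```shell
#!/bin/sh
# Hypothetical helper: prints extra "docker run" options for the NVIDIA
# Container Toolkit. Assumes the toolkit is installed on the host; this is
# NOT the actual change made to run.sh, which is not shown in the thread.
set -eu

nvidia_docker_opts() {
    # --gpus all: expose every NVIDIA GPU to the container (Docker 19.03+).
    # NVIDIA_DRIVER_CAPABILITIES=all: mount all driver libraries, including
    # the "graphics" (OpenGL) ones that are not in the default set.
    printf '%s\n' '--gpus all --env NVIDIA_DRIVER_CAPABILITIES=all'
}

# Print the options so they can be appended to the "docker run" call in run.sh.
nvidia_docker_opts
```

In run.sh the options could then be spliced into the existing docker run invocation, e.g. docker run $(nvidia_docker_opts) ..., relying on word splitting since none of the values contain spaces.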
Thanks for testing. Unfortunately, I don't have an NVIDIA card anymore, and every time I need to do some testing I have to rent a server.
If you find the cause, I would appreciate it if you could update the thread; otherwise, I will try to find time later to do my own tests.