docker-xubuntu

Errors resulting in black screen

Open soldierofhell opened this issue 2 years ago • 4 comments

Hi, I'm trying to run the latest Docker image on a remote server (from an SSH console) with 2 NVIDIA GPUs.

When I execute ./run.sh, I see the following output with errors (they seem to be mostly permission errors):

Stopping "xubuntu" container...
Removing "xubuntu" container...
Creating "xubuntu" container...
Done!

sha1 Fingerprint=9C:D9:6E:04:C2:DA:39:E6:2E:73:7A:76:69:B1:BC:10:25:9D:3D:9E
sha256 Fingerprint=60:89:B2:24:64:E2:CA:34:BE:45:E5:13:76:91:37:ED:49:0E:B7:3B:F8:91:C6:C1:2A:47:8B:93:11:C0:AE:08
[20230613-19:45:44] [INFO ] starting xrdp with pid 69
[20230613-19:45:44] [INFO ] starting xrdp-sesman with pid 68
Failed to set receive buffer size for device monitor, ignoring: Operation not permitted
[20230613-19:45:44] [INFO ] address [0.0.0.0] port [3389] mode 1
[20230613-19:45:44] [INFO ] listening to port 3389 on 0.0.0.0
[20230613-19:45:44] [INFO ] xrdp_listen_pp done
kmalloc-8(122704:72a0868a96eb001d1991def099629933bcb4f9af3b382e8981d4a8672c1e010c): Failed to send device, ignoring: Operation not permitted
kmalloc-rcl-64(122704:72a0868a96eb001d1991def099629933bcb4f9af3b382e8981d4a8672c1e010c): Failed to send device, ignoring: Operation not permitted
kmalloc-rcl-128(122704:72a0868a96eb001d1991def099629933bcb4f9af3b382e8981d4a8672c1e010c): Failed to send device, ignoring: Operation not permitted
kmalloc-rcl-96(122704:72a0868a96eb001d1991def099629933bcb4f9af3b382e8981d4a8672c1e010c): Worker [78] did not accept message, killing the worker: Operation not permitted
kmalloc-rcl-96(122704:72a0868a96eb001d1991def099629933bcb4f9af3b382e8981d4a8672c1e010c): Worker [77] did not accept message, killing the worker: Operation not permitted
kmalloc-rcl-96(122704:72a0868a96eb001d1991def099629933bcb4f9af3b382e8981d4a8672c1e010c): Worker [76] did not accept message, killing the worker: Operation not permitted
Worker [76] terminated by signal 9 (KILL)
Worker [78] terminated by signal 9 (KILL)
Worker [77] terminated by signal 9 (KILL)
kmalloc-rcl-96(122704:72a0868a96eb001d1991def099629933bcb4f9af3b382e8981d4a8672c1e010c): Failed to send device, ignoring: Operation not permitted
LNXSYSTM:00: Failed to write 'change' to '/sys/devices/LNXSYSTM:00/uevent': Read-only file system

Then, when I connect with the Remmina client, I see the login screen, but after logging in I see a blank screen.

Could you please suggest where to look? Thanks.

soldierofhell avatar Jun 13 '23 20:06 soldierofhell

I've tried the oldest available image (v93), with run.sh from the v93 tag, and:

  1. Some errors after container startup disappeared, but not all:
SHA1 Fingerprint=84:18:A8:5D:10:5F:A8:F2:91:17:09:87:08:21:B4:68:9E:BC:F5:37
SHA256 Fingerprint=6B:D8:3D:16:A7:0C:6E:B9:89:ED:E8:41:14:23:8D:24:50:D1:15:A8:AE:D3:B4:A2:7E:E5:98:0C:18:1F:AC:87
[20230613-20:23:27] [INFO ] starting xrdp-sesman with pid 73
[20230613-20:23:27] [INFO ] starting xrdp with pid 74
[20230613-20:23:27] [INFO ] address [0.0.0.0] port [3389] mode 1
[20230613-20:23:27] [INFO ] listening to port 3389 on 0.0.0.0
[20230613-20:23:27] [INFO ] xrdp_listen_pp done
kmalloc-8(122801:b3c08c1f0f01fcb54c35fc900eef232a1cb52697d003cef8b934544eb56facb1): Failed to send device, ignoring: Operation not permitted
kmalloc-rcl-128(122801:b3c08c1f0f01fcb54c35fc900eef232a1cb52697d003cef8b934544eb56facb1): Failed to send device, ignoring: Operation not permitted
kmalloc-rcl-96(122801:b3c08c1f0f01fcb54c35fc900eef232a1cb52697d003cef8b934544eb56facb1): Worker [81] did not accept message, killing the worker: Operation not permitted
kmalloc-rcl-96(122801:b3c08c1f0f01fcb54c35fc900eef232a1cb52697d003cef8b934544eb56facb1): Worker [82] did not accept message, killing the worker: Operation not permitted
Worker [81] terminated by signal 9 (KILL)
Worker [82] terminated by signal 9 (KILL)
kmalloc-rcl-96(122801:b3c08c1f0f01fcb54c35fc900eef232a1cb52697d003cef8b934544eb56facb1): Failed to send device, ignoring: Operation not permitted
  2. I can connect to the desktop, but VirtualGL doesn't recognize the GPU. From what I can see at a glance, your approach to GPU support is more general than the NVIDIA Container Toolkit, e.g. you manually install drivers and connect devices. I wonder whether run.sh supports NVIDIA out of the box or whether I should modify it (e.g. add --gpus all, etc.).
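
One quick way to narrow down a "VirtualGL doesn't recognize the GPU" symptom is to check which GPU device nodes are actually visible inside the container. The following is only a sketch: the helper name `list_gpu_nodes` is hypothetical, and it assumes the usual Linux locations (`/dev/nvidia*` for the NVIDIA driver and `/dev/dri/*` for DRM devices), which may not match how this image wires up devices.

```shell
#!/bin/sh
# Hypothetical helper: print the NVIDIA and DRI device nodes found under a
# directory (default: /dev). An empty result means none are visible there.
list_gpu_nodes() {
  dev_dir="${1:-/dev}"
  for node in "$dev_dir"/nvidia* "$dev_dir"/dri/*; do
    # An unmatched glob stays literal in sh, so skip paths that do not exist.
    [ -e "$node" ] && echo "$node"
  done
  return 0
}

# Run from a shell inside the container (for example, a terminal in the desktop session).
list_gpu_nodes /dev
```

If nothing is printed, the container probably has no GPU devices attached at all, which would point at the way the container was started rather than at VirtualGL itself.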

soldierofhell avatar Jun 13 '23 21:06 soldierofhell

BTW, I hope I will succeed in making it work, because I've read a previous "success story" for an NVIDIA GPU (see https://github.com/hectorm/docker-xubuntu/issues/7#issuecomment-841833211). Unfortunately, that was before the v93 tag, but I guess I can build the image myself. Still, I'm hoping for some valuable insight.

soldierofhell avatar Jun 13 '23 21:06 soldierofhell

I went back to v69, added the parameters required by the NVIDIA Container Toolkit to run.sh, and it works! Now I'll try to narrow down what works and what causes trouble.
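
The exact parameters added to run.sh are not listed in this thread. As a rough illustration only, NVIDIA Container Toolkit options for `docker run` commonly look like the sketch below. The image reference is an assumption, the container name and RDP port are taken from the log output earlier in the thread, and the real run.sh passes other options that are omitted here.

```shell
#!/bin/sh
# Sketch only: prints a candidate command instead of running it.

# Assumed image reference (not stated in the thread).
IMAGE="docker.io/hectorm/xubuntu:latest"
# Container name as shown in the startup log.
CONTAINER="xubuntu"

# Options handled by Docker together with the NVIDIA Container Toolkit:
#   --gpus all                      expose every host GPU to the container
#   NVIDIA_DRIVER_CAPABILITIES=all  request all driver capabilities, including
#                                   the graphics ones that OpenGL applications need
GPU_ARGS="--gpus all --env NVIDIA_DRIVER_CAPABILITIES=all"

# 3389 is the xrdp port reported in the log.
CMD="docker run --detach --name $CONTAINER --publish 3389:3389/tcp $GPU_ARGS $IMAGE"
echo "$CMD"
```

Printing the command first makes it easier to compare against the options run.sh already passes before changing anything.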

soldierofhell avatar Jun 14 '23 10:06 soldierofhell

Thanks for testing. Unfortunately, I don't have an NVIDIA card anymore, so every time I need to do some testing I have to rent a server.

If you find the cause, I would appreciate it if you could update the thread; otherwise, I will try to find time later to run my own tests.

hectorm avatar Jun 18 '23 20:06 hectorm