
network transparent Vulkan

Open metux opened this issue 5 months ago • 7 comments

Describe the feature

Bridge the Vulkan API through the X11 protocol (network transparent)

  • client side: dedicated libvulkan implementation that's bridging calls to the Xserver
  • Vulkan API extension for remote instance creation and X11 specific operations (eg present vulkan image into X11 drawable, etc)
  • server side: execute commands on its local GPU and compose the result into a DRAWABLE or PICTURE.
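
To make the client-side bridging idea concrete, here is a minimal sketch of how a replacement libvulkan might pack one call into an X11 extension request. Only the 4-byte header layout (major opcode, minor opcode, length in 4-byte units, payload padded to a multiple of 4) comes from the X11 protocol. The names `xvk_encode_request`, `XVK_MAJOR_OPCODE` and `XVK_CREATE_BUFFER`, the opcode values, and the payload contents are all hypothetical, since no such extension is specified yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values: a real client would obtain the major opcode from
 * QueryExtension, and the minor opcodes would come from the extension spec. */
#define XVK_MAJOR_OPCODE  150u
#define XVK_CREATE_BUFFER 1u

/* Write one extension request into buf using the generic X11 request header:
 *   byte 0     major opcode
 *   byte 1     minor opcode
 *   bytes 2-3  total request length in 4-byte units (client byte order)
 * followed by the payload, zero-padded to a multiple of 4 bytes.
 * Returns the number of bytes written, or 0 if the request does not fit. */
static size_t xvk_encode_request(uint8_t *buf, size_t cap, uint8_t minor,
                                 const void *payload, size_t payload_len)
{
    assert(buf != NULL);
    if (cap < 4u || payload_len > cap - 4u)
        return 0;

    size_t padded = (payload_len + 3u) & ~(size_t)3u;
    size_t total = 4u + padded;
    if (total > cap || total / 4u > UINT16_MAX)
        return 0;

    uint16_t units = (uint16_t)(total / 4u);
    buf[0] = (uint8_t)XVK_MAJOR_OPCODE;
    buf[1] = minor;
    memcpy(buf + 2, &units, sizeof units);
    memset(buf + 4, 0, padded);            /* zero the padding bytes too */
    if (payload_len > 0)
        memcpy(buf + 4, payload, payload_len);
    return total;
}
```

A 12-byte payload would produce a 16-byte request with a length field of 4. Anything larger than the 16-bit length field allows would need the BIG-REQUESTS extension or a separate transfer path, which this sketch does not cover.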

Note: this doesn't involve DRI at all (DRI is local-only by definition)

It should be implemented because

  • the current GLX only implements an older OpenGL spec
  • OpenGL is quite implicit, needs a lot of state tracking, and causes a lot of preventable traffic
  • Vulkan is more and more replacing OpenGL, not just for graphics but also for compute workloads (eg. DL/AI)
  • Vulkan is very explicit and thus requires only minimal state tracking
  • directly implementing the Vulkan-API allows easily switching implementations and using existing (even proprietary) drivers as (server-side) backend/render-target

What are the alternatives?

  • finish up GLX for recent OpenGL spec --> only helpful for clients still on OpenGL
  • hook up Mesa's gallium and bridge gallium API --> only internal to Mesa, no public spec, not even exported

Planning

  1. basic infrastructure for running vulkan inside Xserver and little offscreen rendering demo (inside Xserver)
  2. copy render result into pixmaps or pictures and demo creating root window background this way
  3. little x11 extension for driving simple offscreen rendering into pixmaps/pictures
  4. add buffer transfers (eg. from/to pixmaps or pictures)
  5. .... add the missing stuff step by step ...
  6. finalize spec and write our own libvulkan implementation going via X11.

Compute workloads

For driving pure compute workloads remotely, we can just start virtual Xservers (eg. xvfb) in the datacenter and let compute clients connect to them via standard X11.
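
As a rough illustration of what "connect via standard X11" means on the wire, the sketch below turns a display string such as `gpu-node1:1` into a TCP endpoint, using the conventional X11 port 6000 + display number. The helper name `x11_tcp_endpoint` and the host name are made up for this example, and it assumes the remote server has been started with TCP listening enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Split "host:display[.screen]" into a host name and the conventional
 * X11 TCP port (6000 + display number).
 * Returns 0 on success, -1 if the string has no host part or is malformed. */
static int x11_tcp_endpoint(const char *display, char *host, size_t host_cap,
                            int *port)
{
    assert(display != NULL && host != NULL && port != NULL);

    const char *colon = strrchr(display, ':');
    if (colon == NULL || colon == display)
        return -1;                      /* no host part: a local display */

    size_t host_len = (size_t)(colon - display);
    if (host_len >= host_cap)
        return -1;

    if (colon[1] < '0' || colon[1] > '9')
        return -1;                      /* display number must be digits */

    char *end = NULL;
    long n = strtol(colon + 1, &end, 10);
    if (n < 0 || n > 65535 - 6000)
        return -1;
    if (*end != '\0' && *end != '.')
        return -1;                      /* only ".screen" may follow */

    memcpy(host, display, host_len);
    host[host_len] = '\0';
    *port = 6000 + (int)n;
    return 0;
}
```

In practice a client library such as libxcb does this parsing itself; the point is only that a compute client needs nothing beyond a reachable host and display number.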

If you’ve done six impossible things this morning, why not round it off with breakfast at Milliways ?

metux avatar Sep 22 '25 08:09 metux

We have to be very careful not to add overhead for local glx clients.

According to upstream, glx uses dri in Xorg and egl in Xwayland: https://gitlab.freedesktop.org/xorg/xserver/-/issues/1638 Maybe we should port glx to use egl instead of dri first?

Also, as I understand it, indirect glx is required for glx to work remotely.

As things are right now, indirect glx is very untested and, at least on my machine, entirely broken, to the point that creating indirect gl contexts causes the X server to segfault.

This particular failure is not due to any deficiency in X server code. The problem is that the mesa indirect glx driver does not implement a function to create an indirect context and initializes the corresponding function pointer to NULL. The X server then calls this function pointer without any checking (as it should) and segfaults.
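
Whichever side ends up carrying the fix, the failure mode described here is the classic unchecked function pointer. A minimal sketch of a guarded call follows; the types and names (`glx_provider`, `create_indirect_context_checked`, `demo_create_context`) are hypothetical stand-ins and not the real xserver or Mesa structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real provider and context structures. */
typedef struct glx_context { int id; } glx_context;

typedef struct glx_provider {
    /* Left NULL by a backend that cannot create indirect contexts. */
    glx_context *(*create_indirect_context)(struct glx_provider *self);
} glx_provider;

enum { CTX_OK = 0, CTX_UNSUPPORTED = 1 };

/* Demo backend hook that actually provides a context. */
static glx_context *demo_create_context(glx_provider *self)
{
    static glx_context ctx = { 1 };
    (void)self;
    return &ctx;
}

/* Call the backend hook only if it exists, and report an error to the
 * caller instead of jumping through a NULL pointer. */
static int create_indirect_context_checked(glx_provider *p, glx_context **out)
{
    assert(out != NULL);
    *out = NULL;
    if (p == NULL || p->create_indirect_context == NULL)
        return CTX_UNSUPPORTED;
    *out = p->create_indirect_context(p);
    return (*out != NULL) ? CTX_OK : CTX_UNSUPPORTED;
}
```

With a check like this, a client asking for an indirect context would get a protocol error back rather than taking down the server.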

There are also other hiccups with indirect glx that I think we should deal with before attempting network-transparent vulkan.

stefan11111 avatar Sep 25 '25 11:09 stefan11111

> We have to be very careful not to add overhead for local glx clients.

It's not about GLX at all. It will be an entirely new extension.

And the client side (at least to start with) will be an entirely separate implementation of the Vulkan API (the client needs to load/bind a different libvulkan.so implementation) ... whether some other project, eg. Mesa, picks it up some day and integrates it is their choice.

> According to upstream, glx uses dri in Xorg and egl in Xwayland: https://gitlab.freedesktop.org/xorg/xserver/-/issues/1638 Maybe we should port glx to use egl instead of dri first?

Aehm, well, there are different ones: the X11 GLX extension (some call it iGLX) is network transparent, no DRI, just calling down into libGL. NVidia has their own one, no idea what it's actually doing. And then there's DRI, which doesn't need GLX at all; it's just about transferring/flipping buffers and some synchronization (that's not the X11 sync extension). Not sure whether Mesa is calling into both when doing GL stuff.

Of course, we can port the existing GLX implementation to EGL (like glamor does), but that's a different matter.

> As things are right now, indirect glx is very untested and, at least on my machine, entirely broken, to the point that creating indirect gl contexts causes the X server to segfault.

Feel free to do bug hunting here :)

> The problem is that the mesa indirect glx driver does not implement a function to create an indirect context and initializes the corresponding function pointer to NULL. The X server then calls this function pointer without any checking (as it should) and segfaults.

Sounds like we should fix Mesa, right ?

metux avatar Sep 25 '25 12:09 metux

If a full network-transparent vulkan implementation gets done, does this mean that something like the Zink driver (which layers OpenGL over vulkan and is... decent) would work for network-transparent OpenGL 4.6?

severtheskyline avatar Sep 26 '25 23:09 severtheskyline

It sounds like something along the lines of qemu rutabaga ( https://www.qemu.org/docs/master/system/devices/virtio-gpu.html#virtio-gpu-rutabaga ).

cepelinas9000 avatar Sep 27 '25 22:09 cepelinas9000

> It sounds like something along the lines of qemu rutabaga ( https://www.qemu.org/docs/master/system/devices/virtio-gpu.html#virtio-gpu-rutabaga ).

This link 404s now; here is the new link: https://gitlab.com/qemu-project/qemu/-/blob/master/docs/system/devices/virtio/virtio-gpu.rst

b-aaz avatar Nov 11 '25 10:11 b-aaz

This might be what I'm missing to make desktop apps work out of the box from inside docker. If opengl over vulkan works.

Frontrider avatar Nov 24 '25 19:11 Frontrider

> This might be what I'm missing to make desktop apps work out of the box from inside docker. If opengl over vulkan works.

If the container is on the same machine, it should already work - you just have to mount the dri devices.

metux avatar Nov 25 '25 11:11 metux