primus_vk

Run Nvidia Vulkan headless without Xorg!

boberfly opened this issue 5 years ago (17 comments, status: open)

Hi all,

I was looking around and discovered something kind of amazing that could potentially remove the need for nv_vulkan_wrapper entirely. This is what I did:

  1. Create a test file at $HOME/.local/share/vulkan/icd.d/nvidiaegl_icd.json
  2. Fill it with the following. It is a copy of Nvidia's standard ICD manifest, except that it points to a different library (I am using Nvidia driver 415.27):

         {
           "file_format_version" : "1.0.0",
           "ICD": {
             "library_path": "libEGL_nvidia.so.0",
             "api_version" : "1.1.84"
           }
         }

  3. Copy primus_vk.json to $HOME/.local/share/vulkan/implicit_layer.d/ as usual, making sure it points to the actual primus_vk.so and NOT to nv_vulkan_wrapper at all.
  4. Run a Vulkan app with only the layer enabled: ENABLE_PRIMUS_LAYER=1 your_vulkan_app
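Steps 1 and 2 can be scripted. The following is a minimal Python sketch, assuming the per-user loader directory used above and the exact manifest values from the post (library_path and api_version were taken with driver 415.27, so adjust them for your driver). The helper name write_egl_icd is hypothetical.

```python
import json
from pathlib import Path

# Manifest copied from step 2: Nvidia's ICD manifest, but with the loader
# pointed at libEGL_nvidia.so.0 instead of the usual library.
ICD_MANIFEST = {
    "file_format_version": "1.0.0",
    "ICD": {
        "library_path": "libEGL_nvidia.so.0",
        "api_version": "1.1.84",
    },
}


def write_egl_icd(data_home: Path) -> Path:
    """Write nvidiaegl_icd.json under <data_home>/vulkan/icd.d and return its path.

    data_home is normally ~/.local/share (hypothetical helper, for illustration).
    """
    icd_dir = data_home / "vulkan" / "icd.d"
    icd_dir.mkdir(parents=True, exist_ok=True)
    path = icd_dir / "nvidiaegl_icd.json"
    path.write_text(json.dumps(ICD_MANIFEST, indent=4) + "\n")
    return path
```

Calling write_egl_icd(Path.home() / ".local" / "share") reproduces steps 1 and 2; deleting the generated file returns the loader to the ICDs it found before.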

I'm doing this with The-Forge, and its log file now tells me it is using my Nvidia card instead of my AMD card. Also, I am not using an Optimus laptop here, so this seems to work on a dual-GPU setup in a regular PC too. My display is driven by a Vega Frontier Edition, but rendering happens on a Quadro K2000 (I will later test on an RTX 2070 to see if it works there as well).

I have noticed some crazy stuttering, though, so there might be something else that needs fixing here.

boberfly, Feb 27 '19 07:02

Quite interesting; I'll surely test this approach this weekend. I'll probably need to modify bumblebee so it doesn't start Xorg, or write a script that enables the discrete video card and loads the nvidia modules for testing.

Edit: optirun already has a --no-xorg option :).

leonmaxx, Feb 27 '19 11:02

> I have noticed some crazy stutter though

Does your test app have V-Sync enabled?

leonmaxx, Feb 27 '19 11:02

@leonmaxx you're right, V-Sync might be the issue here. Maybe I need to apply that modesetting flag too; I'll try that when I get the chance. Cheers for the tip!

As for bumblebee, that's probably right, but I can't use it here because I get an error. Since I'm not worried about power usage on my workstation, it's no big deal... :)

boberfly, Feb 27 '19 17:02

One thing I noticed: vulkaninfo shows that the device exists, but it segfaults when querying the available outputs. That makes sense, since we are loading a Vulkan instance from the EGL library in an environment it does not expect. Perhaps the modesetting flag nvidia-drm.modeset=1 is the key to enabling this?

boberfly, Feb 27 '19 18:02

@boberfly If you use kernel 4.17+, you can use the in-kernel PCIe power management. All you need to do is unload the nvidia kernel modules and set /sys/bus/pci/*device*/power/control to auto; the kernel will then put your card into a suspended state. It wakes up automatically when the nvidia kernel modules are loaded.
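The manual step amounts to one sysfs write. Here is a minimal Python sketch, assuming you already know the card's directory under /sys/bus/pci/devices/, have unloaded the nvidia modules, and run it as root. It writes the mode to power/control and reads back power/runtime_status (typically "active" or "suspended"). set_runtime_pm is a hypothetical helper name.

```python
from pathlib import Path


def set_runtime_pm(device_dir: Path, mode: str = "auto") -> str:
    """Set a PCI device's runtime PM control and return its runtime status.

    "auto" lets the kernel suspend the idle device; "on" keeps it powered.
    Writing to real sysfs requires root.
    """
    if mode not in ("auto", "on"):
        raise ValueError(f"unsupported power/control mode: {mode!r}")
    power = device_dir / "power"
    (power / "control").write_text(mode)
    status_file = power / "runtime_status"
    # The status may lag the write; the card suspends once it is idle.
    return status_file.read_text().strip() if status_file.exists() else "unknown"
```

Writing "on" instead of "auto" keeps the card powered, which is a quick way to rule out power management when debugging.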

leonmaxx, Feb 27 '19 18:02

If you're interested, I use this udev rule file, nvrtpm.rules, to apply power management automatically on boot:

ACTION=="add|change", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", ATTR{power/control}="auto"
ACTION=="add|change", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", ATTR{power/control}="auto"
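To check which devices these two rules would match before installing them, here is a small read-only Python sketch that applies the same filter (vendor 0x10de, class 0x030000 or 0x030200) to the vendor and class attributes under /sys/bus/pci/devices. The function name is hypothetical, and unlike the udev rules it is a one-off scan that does not react to add/change events.

```python
from pathlib import Path

NVIDIA_VENDOR = 0x10DE
# Class codes from the udev rules: VGA-compatible and 3D controllers.
DISPLAY_CLASSES = {0x030000, 0x030200}


def matching_devices(pci_root: Path = Path("/sys/bus/pci/devices")) -> list[str]:
    """Return names of PCI devices the nvrtpm.rules filter would match."""
    matches = []
    for dev in sorted(pci_root.iterdir()):
        try:
            vendor = int((dev / "vendor").read_text(), 16)
            pci_class = int((dev / "class").read_text(), 16)
        except (OSError, ValueError):
            continue  # skip entries without readable vendor/class attributes
        if vendor == NVIDIA_VENDOR and pci_class in DISPLAY_CLASSES:
            matches.append(dev.name)
    return matches
```

Running it with no arguments lists the PCI addresses of the Nvidia GPUs on the machine; an Nvidia HDMI audio function (class 0x0403xx) is deliberately not matched.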

leonmaxx, Feb 27 '19 18:02

Sadly, no kernel past 4.15 works on my machine: the amdgpu driver stopped working in 4.17 (it worked before), and from 4.18 onwards something weird with my hardware topology causes a kernel panic on startup. I need to file a bug about it but haven't had the chance or patience yet...

boberfly, Feb 27 '19 18:02

I can confirm it's working! I just launched WoT with the suggested .json changes using ENABLE_PRIMUS_LAYER=1 optirun --no-xorg wine WorldOfTanks.exe, and it works!
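The same launch can be wrapped in a script. This is a minimal Python sketch, assuming optirun (from bumblebee) and the target command are on PATH; it sets ENABLE_PRIMUS_LAYER=1 only in the child's environment and prefixes optirun --no-xorg when requested. build_launch and launch are hypothetical names.

```python
import os
import subprocess


def build_launch(cmd: list[str], use_optirun: bool = True) -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for running cmd with the primus_vk layer enabled."""
    env = dict(os.environ, ENABLE_PRIMUS_LAYER="1")
    argv = (["optirun", "--no-xorg"] if use_optirun else []) + list(cmd)
    return argv, env


def launch(cmd: list[str], use_optirun: bool = True) -> int:
    """Run the command and return its exit code."""
    argv, env = build_launch(cmd, use_optirun)
    return subprocess.run(argv, env=env).returncode
```

For example, launch(["wine", "WorldOfTanks.exe"]) mirrors the command line above; use_optirun=False covers the desktop dual-GPU case where no bumblebee wrapper is needed.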

leonmaxx, Feb 27 '19 20:02

If anyone wants to test, I added a primus-vk-headless package, which works without Xorg, to my Copr repository.

leonmaxx, Feb 27 '19 20:02

Good to hear, @leonmaxx! Is the performance decent, and do you have nvidia-drm.modeset=1 set? Is V-Sync on or off?

boberfly, Feb 27 '19 21:02

I have a patch set coming soon that adds two environment variables for choosing which GPU does what, specified as vendorID:deviceID hex numbers. At least in my case, I need this because I have two discrete GPUs.
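The thread does not give the variable names or exact syntax from that patch set, so the following Python sketch only illustrates the idea of picking a GPU by a vendorID:deviceID hex pair. GPU_RENDER_ID and the device IDs in the example are hypothetical; in a real Vulkan layer the IDs would come from the vendorID and deviceID fields of VkPhysicalDeviceProperties.

```python
import os
from typing import Optional


def parse_gpu_id(text: str) -> tuple[int, int]:
    """Parse 'vendor:device' hex (e.g. '10de:1234') into (vendor, device)."""
    vendor, sep, device = text.strip().partition(":")
    if not sep or not vendor or not device:
        raise ValueError(f"expected vendorID:deviceID, got {text!r}")
    return int(vendor, 16), int(device, 16)


def pick_gpu(devices: list[tuple[int, int, str]], var: str) -> Optional[str]:
    """Return the name of the first device matching env var `var`, if it is set."""
    value = os.environ.get(var)
    if value is None:
        return None
    wanted = parse_gpu_id(value)
    for vendor, device, name in devices:
        if (vendor, device) == wanted:
            return name
    return None
```

With one variable for the display GPU and one for the render GPU, a setup like the Vega plus Quadro machine above could name each card explicitly instead of relying on enumeration order (0x10de is Nvidia's PCI vendor ID and 0x1002 is AMD's).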

boberfly, Feb 27 '19 21:02

Performance is good: I get a stable 60 FPS with V-Sync on, and mouse latency seems to be better.
I do not have nvidia-drm.modeset=1 set. I load only the nvidia kernel module; the other modules, nvidia-drm and nvidia-modeset, are disabled using alias ... off.
My notebook has a GeForce 1050 without hardware outputs (PCI device class 0x302, 3D accelerator), which may be why it works without modeset.
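For reference, a PCI class code packs three bytes: base class, subclass and programming interface. The 0x302 above is the 0x030200 value from the udev rules without the programming-interface byte: base class 0x03 (display controller), subclass 0x02 (3D controller), whereas 0x030000 is a VGA-compatible controller. A small Python decoder, naming only the display subclasses:

```python
DISPLAY_SUBCLASSES = {
    0x00: "VGA compatible controller",
    0x01: "XGA controller",
    0x02: "3D controller",
    0x80: "Display controller (other)",
}


def decode_pci_class(code: int) -> tuple[int, int, int]:
    """Split a 24-bit PCI class code into (base class, subclass, prog-if)."""
    return (code >> 16) & 0xFF, (code >> 8) & 0xFF, code & 0xFF


def describe(code: int) -> str:
    """Name a display-class code; other base classes are reported numerically."""
    base, sub, _ = decode_pci_class(code)
    if base != 0x03:
        return f"non-display class 0x{base:02x}"
    return DISPLAY_SUBCLASSES.get(sub, f"display subclass 0x{sub:02x}")
```

Feeding it int(Path(".../class").read_text(), 16) for a device under /sys/bus/pci/devices shows whether a GPU has display outputs (VGA controller) or is render-only (3D controller).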

leonmaxx, Feb 27 '19 21:02

And I didn't notice any stuttering.

leonmaxx, Feb 27 '19 21:02

@leonmaxx hey, I got V-Sync working. It fixed the hitching, but unfortunately The-Forge runs very, very slowly.

Also, I just made a PR that lets you set environment variables choosing which GPU to use for display and which for rendering, if you want to test it, though I guess you don't need it on Optimus laptops...

boberfly, Mar 01 '19 07:03

This seems to work like a charm; in fact, it is the only way I can get wine/dxvk up and running on an (Optimus) Nvidia GTX 860M using proprietary drivers v418.56 on Debian. If I can contribute any test results, please let me know.

wirr00, May 10 '19 09:05

@wirr00 I also have the proprietary drivers v418.56 and am using primus_vk as described in the README (with wrapper and libGL.so)

Regarding the general idea of always using the Vulkan driver from libEGL.so: I have no idea what the real difference is between the various Vulkan drivers shipped by Nvidia. On my system I can find ICDs in libEGL_nvidia.so.0, libGL.so.1 and libGLX_nvidia.so.0. I cannot tell what the differences between those versions are, or which one has which advantages. I'd like to stick with libGL, as it seems to be the Vulkan ICD that is installed "normally". However, libGL seems to misbehave in that it requires the secondary X server from bumblebee, while libEGL_nvidia does not. Do any of you know of documentation or an explanation of what these libraries are meant for and how they differ?
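To see which library each installed Vulkan ICD manifest actually points at (for example libGLX_nvidia.so.0 versus libEGL_nvidia.so.0), here is a short Python sketch that reads the JSON manifests in a few common loader directories. The directory list is an assumption covering typical locations, including the per-user directory used earlier in this thread; the loader's full search order also depends on the XDG variables and on VK_ICD_FILENAMES.

```python
import json
from pathlib import Path

# Assumed search directories; the real loader order also honors XDG_* variables.
DEFAULT_DIRS = (
    Path("/etc/vulkan/icd.d"),
    Path("/usr/share/vulkan/icd.d"),
    Path.home() / ".local" / "share" / "vulkan" / "icd.d",
)


def list_icds(dirs=DEFAULT_DIRS) -> dict[str, str]:
    """Map each ICD manifest path to the library_path it names."""
    result = {}
    for d in dirs:
        if not d.is_dir():
            continue
        for manifest in sorted(d.glob("*.json")):
            try:
                data = json.loads(manifest.read_text())
                result[str(manifest)] = data["ICD"]["library_path"]
            except (OSError, ValueError, KeyError, TypeError):
                continue  # skip unreadable or malformed manifests
    return result
```

This only shows what the manifests reference; it does not reveal how the libraries behave differently, which is the open question above.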

felixdoerre, May 10 '19 13:05

Hi @felixdoerre, I'm not sure about documentation, but here is my assumption. Linux distros will eventually default to Wayland. GLX context creation was always tied to the X server, whereas on Wayland you would use EGL for context creation, which is probably why Nvidia ships two libraries for the two situations. I think the end goal is to use Xwayland for legacy GLX contexts once Wayland replaces the X server as the default, but Nvidia will keep shipping the GL/GLX library for years to come while stable distros like RHEL/CentOS use X by default...

boberfly, May 13 '19 18:05