jailmaker

AMD GPU passthrough support

Status: Open. gardenali opened this issue 10 months ago • 15 comments

It's great to have Intel and NVIDIA support, but I'm missing the AMD option.

Thank you!

gardenali, Mar 30 '24 12:03

I am also at the point where I need AMD GPU support to move my last service to jailmaker. Can someone explain what is needed to enable GPU support, or what was done for Intel and NVIDIA? I am open to implementing and evaluating the feature on my system.

lks-hrsch, Mar 30 '24 17:03

Does it work if you manually add --bind=/dev/dri?

e.g.: jlmkr create myjail --bind=/dev/dri

easyfab, Mar 30 '24 21:03

The Intel GPU setting does just that.

jeefberkey, Mar 31 '24 03:03

Jailmaker has Intel and NVIDIA GPU support because these drivers are provided by the TrueNAS SCALE host OS. I think adding support for a dedicated AMD GPU in jailmaker is not trivial, if it is possible at all without modifying the host OS. Since I have no dedicated GPU in my TrueNAS server, I can't investigate this. Feel free to investigate, though. @lks-hrsch, you could have a look at the Python code of jlmkr.py to see what the Intel and NVIDIA GPU passthrough options do.

Jip-Hop, Mar 31 '24 14:03

Isn't AMD support already in the TrueNAS host OS?

 lspci -k | grep amdgpu
         Kernel driver in use: amdgpu
         Kernel modules: amdgpu

For info, I tried with a 5700G APU: adding --bind=/dev/dri seems to give me access to the iGPU in jailmaker. I don't know if it works with a dGPU.

Edit: to complete:

 root@myjail:~# vainfo
 error: can't connect to X server!
 libva info: VA-API version 1.17.0
 libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/radeonsi_drv_video.so
 libva info: Found init function __vaDriverInit_1_17
 libva info: va_openDriver() returns 0
 vainfo: VA-API version: 1.17 (libva 2.12.0)
 vainfo: Driver version: Mesa Gallium driver 22.3.6 for AMD Radeon Graphics (renoir, LLVM 15.0.6, DRM 3.54, 6.6.16-production+truenas)
 vainfo: Supported profile and entrypoints
       VAProfileMPEG2Simple            : VAEntrypointVLD
       VAProfileMPEG2Main              : VAEntrypointVLD
       VAProfileVC1Simple              : VAEntrypointVLD
       VAProfileVC1Main                : VAEntrypointVLD
       VAProfileVC1Advanced            : VAEntrypointVLD
       VAProfileH264ConstrainedBaseline: VAEntrypointVLD
       VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
       VAProfileH264Main               : VAEntrypointVLD
       VAProfileH264Main               : VAEntrypointEncSlice
       VAProfileH264High               : VAEntrypointVLD
       VAProfileH264High               : VAEntrypointEncSlice
       VAProfileHEVCMain               : VAEntrypointVLD
       VAProfileHEVCMain               : VAEntrypointEncSlice
       VAProfileHEVCMain10             : VAEntrypointVLD
       VAProfileHEVCMain10             : VAEntrypointEncSlice
       VAProfileJPEGBaseline           : VAEntrypointVLD
       VAProfileVP9Profile0            : VAEntrypointVLD
       VAProfileVP9Profile2            : VAEntrypointVLD
       VAProfileNone                   : VAEntrypointVideoProc

easyfab, Mar 31 '24 15:03

Yes, I think there's a difference between AMD iGPUs and dGPUs, but I'd be happy to be proven wrong.

Jip-Hop, Mar 31 '24 15:03

I've labeled this issue as invalid and help wanted. I think iGPUs are already supported (Intel or AMD). You'd have to set gpu_passthrough_intel=1 in your config file for that. I realize now that this naming is confusing in this case...

Regarding AMD dedicated GPUs, as far as I know those aren't supported on the SCALE host system and therefore jailmaker can't support them either.

Since I don't have an AMD GPU, I could use help to confirm this issue is indeed invalid. Either way, I won't be working on a solution for this issue, and I recommend either switching to an NVIDIA GPU or implementing a solution (if possible) and providing a pull request.

Jip-Hop, Apr 06 '24 08:04

I apologize for not getting back to you sooner, but I can confirm that it already works for an AMD iGPU. I currently don't have a dGPU for testing either.

lks-hrsch, Apr 20 '24 13:04

Hey! I just want to add that passing through the AMD GPU does work, but you also need to bind /dev/kfd. That is, a command like

 ./jlmkr.py create --distro=ubuntu --release=jammy ubuntu --bind=/dev/dri --bind=/dev/kfd

works. After this, I just had some problems with permissions on these files. However, I can now run llama3 in a jail using an AMD GPU.

maeehart, Apr 22 '24 12:04
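Since the permission problems only showed up after the /dev/dri and /dev/kfd bind mounts were in place, a quick check inside the jail can tell a missing bind mount apart from a permission issue. The sketch below is a hypothetical helper (check_nodes is not part of jailmaker); it assumes Python 3 inside the jail, expands /dev/dri into the card* and renderD* nodes it contains, and reports each node as ok, missing, not a character device, or lacking read/write access for the current process.

```python
import os
import stat

# Paths from this thread: /dev/kfd is the AMD kernel compute (KFD) interface
# used by ROCm, and /dev/dri is a directory holding the card* and renderD* nodes.
DEFAULT_PATHS = ["/dev/kfd", "/dev/dri"]


def expand_paths(paths):
    """Yield each path, replacing a directory with the non-directory entries in it."""
    for path in paths:
        if os.path.isdir(path):
            for name in sorted(os.listdir(path)):
                full = os.path.join(path, name)
                if not os.path.isdir(full):  # skip helper dirs such as by-path
                    yield full
        else:
            yield path


def check_nodes(paths=DEFAULT_PATHS):
    """Map each device node to a status string for the current process."""
    report = {}
    for path in expand_paths(paths):
        if not os.path.exists(path):
            report[path] = "missing"
        elif not stat.S_ISCHR(os.stat(path).st_mode):
            report[path] = "not a character device"
        elif not os.access(path, os.R_OK | os.W_OK):
            report[path] = "no read/write access"
        else:
            report[path] = "ok"
    return report


if __name__ == "__main__":
    for node, status in check_nodes().items():
        print(f"{node}: {status}")
```

Run it as the user that will start the GPU workload. As root, os.access reports read/write access for any existing node, so the check is only informative for an unprivileged user.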

@maeehart, is that a dedicated AMD GPU you're using (which model)? If so, that's good news and we can close this ticket as completed.

Jip-Hop, Apr 22 '24 13:04

Yes, it is a 6900 XT, i.e., a dedicated AMD GPU. We could still make a PR for AMD support so that one could use the GPU just by adding a gpu_passthrough_amd flag.

maeehart, Apr 23 '24 03:04

Ah yes that's a good idea. Could you provide the PR?

Jip-Hop, Apr 23 '24 05:04

I can do it during the weekend. I will need to see if I can do something about the permissions.

maeehart, Apr 23 '24 11:04

You didn't specify what your permission issues are, but are they fixed if you add --property=DeviceAllow="/dev/kfd rw"?

I ask because I do something similar for Coral TPU passthrough, which looks like:

--bind='/dev/ttyUSB0'
--property=DeviceAllow="/dev/ttyUSB0 rwm"
--property=DeviceAllow="char-drm rwm"
--property=DeviceAllow=/dev/bus/usb

Though I haven't spent enough time in the land of nspawn to really work out if all of these are necessary/correct lol

Edit: This made me want to go look up what "rwm" is vs just "rw", and the "m" means:

"m" (mknod): Allows creating device nodes for the allowed device using mknod. Device nodes are special files in Unix-like operating systems that represent device interfaces. With this permission, the container can create the device node within its filesystem if it was not initially present, which is useful when containerized applications create device nodes on demand.

dalgibbard, May 09 '24 18:05
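The DeviceAllow= values above follow systemd's format: a device node specifier (a path under /dev/, or "char-"/"block-" followed by a device group name as listed in /proc/devices), optionally followed by a combination of r, w and m for read, write and mknod. The parse_device_allow helper below is hypothetical (not part of jailmaker or systemd) and only decodes such a value, taken after DeviceAllow= with the shell quotes already removed. It leaves an omitted access field as None rather than guessing what systemd does in that case, and it is a reading aid, not a recommendation to add these properties.

```python
# Access characters accepted in DeviceAllow=: read, write, mknod.
ACCESS_NAMES = {"r": "read", "w": "write", "m": "mknod"}


def parse_device_allow(value):
    """Split a DeviceAllow= value into (specifier, set of access names or None)."""
    parts = value.split()
    if not parts or len(parts) > 2:
        raise ValueError(f"expected 'SPECIFIER [ACCESS]', got {value!r}")
    specifier = parts[0]
    if not specifier.startswith(("/dev/", "char-", "block-")):
        raise ValueError(f"unsupported device specifier: {specifier!r}")
    if len(parts) == 1:
        # No access field: not interpreted here, left to systemd.
        return specifier, None
    access = parts[1]
    unknown = set(access) - set(ACCESS_NAMES)
    if unknown:
        raise ValueError(f"unknown access characters: {''.join(sorted(unknown))}")
    return specifier, {ACCESS_NAMES[c] for c in access}
```

For example, "/dev/ttyUSB0 rwm" decodes to read, write and mknod, while "/dev/kfd rw" leaves out mknod.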

Please have a look at the jlmkr.py code and search for DeviceAllow. I think adding this explicitly will actually cause issues instead of solving them.

Jip-Hop, May 09 '24 18:05

Any updates on the AMD dGPU support?

jere-co, Jun 17 '24 16:06

Quoting @maeehart: "Hey! I just want to add that that passing the AMD gpu does work, but one needs to also bind /dev/kfd. That is, the command like ..."

@maeehart, are you sure it was the AMD GPU being used (and not the GPU built into the CPU, since you also added --bind=/dev/dri)? I assume the AMD GPU should be usable without --bind=/dev/dri; at least this is the case for an NVIDIA GPU. Which commands did you run in the jail to test the AMD GPU?

I have an AMD RX 580 GPU in a test TrueNAS server but haven't been able to get it working in an Ubuntu jail yet. I tried debugging with mpv --hwdec=auto video_filename from this Arch resource.

Jip-Hop, Jun 25 '24 07:06
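One way to approach the "which GPU is it" question is to map each DRM card node to its PCI vendor ID through sysfs (0x1002 is AMD, 0x8086 Intel, 0x10de NVIDIA). The sketch below is hypothetical (drm_card_vendors is not part of jailmaker) and assumes the standard /sys/class/drm/cardN/device/vendor layout. It only shows which GPUs are visible, and inside a jail sysfs may list every host GPU even if only some /dev nodes are bound; it does not prove which GPU a program actually uses, which is better answered by a tool such as rocm-smi.

```python
import glob
import os

# Well-known PCI vendor IDs as they appear in sysfs.
PCI_VENDORS = {"0x1002": "AMD", "0x8086": "Intel", "0x10de": "NVIDIA"}


def drm_card_vendors(sysfs_root="/sys/class/drm"):
    """Map each DRM card name (card0, card1, ...) to a GPU vendor name."""
    vendors = {}
    for card_dir in sorted(glob.glob(os.path.join(sysfs_root, "card[0-9]*"))):
        name = os.path.basename(card_dir)
        if "-" in name:  # skip connector entries such as card0-HDMI-A-1
            continue
        vendor_file = os.path.join(card_dir, "device", "vendor")
        try:
            with open(vendor_file) as f:
                vendor_id = f.read().strip().lower()
        except OSError:  # no readable vendor file for this entry
            continue
        vendors[name] = PCI_VENDORS.get(vendor_id, f"unknown ({vendor_id})")
    return vendors


if __name__ == "__main__":
    for card, vendor in drm_card_vendors().items():
        print(f"/dev/dri/{card}: {vendor}")
```

On a system with an Intel iGPU and an AMD dGPU, two cards with different vendors would show up, which is exactly the ambiguity raised above.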

Hey! I am sure that it is the AMD GPU. I have now been running ollama in the jail and confirmed that the GPU is in use by watching the output of the rocm-smi command for some time. However, I have not had the time to redo the setup so that I could add a proper script to this repo, and I am sorry about that. I remember that I had to bind both /dev/dri and /dev/kfd and then change their permissions to allow writing to these files (chmod ...).

maeehart, Jun 25 '24 08:06

Instead of messing with permissions of /dev/kfd can't you run the process in your jail under the same user/group which already owns /dev/kfd?

Jip-Hop, Jun 25 '24 11:06
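Following that suggestion, the first step is to find out which user and group own /dev/kfd as seen from inside the jail. These nodes are commonly group-owned by a render group, but the host's GID may map to a different name, or to no name at all, in the jail's /etc/group, so the sketch below falls back to numeric IDs. node_owner and current_process_in_group are hypothetical helpers, not part of jailmaker.

```python
import grp
import os
import pwd


def node_owner(path="/dev/kfd"):
    """Return (user, group) owning path, as names where the jail knows them."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # UID has no entry in the jail's /etc/passwd
        user = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:  # GID has no entry in the jail's /etc/group
        group = str(st.st_gid)
    return user, group


def current_process_in_group(gid):
    """True if gid is the effective or a supplementary group of this process."""
    return gid == os.getegid() or gid in os.getgroups()


if __name__ == "__main__":
    path = "/dev/kfd"
    if os.path.exists(path):
        user, group = node_owner(path)
        member = current_process_in_group(os.stat(path).st_gid)
        print(f"{path} is owned by {user}:{group}; this process in group: {member}")
    else:
        print(f"{path} not found; is it bind-mounted into the jail?")
```

If the service user is not in the owning group, adding it to that group (by name or by GID) inside the jail avoids changing the node permissions with chmod.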

I think that that is a much better idea.

maeehart, Jun 25 '24 12:06

Reportedly AMD GPU passthrough works:

 ./jlmkr.py create --distro=ubuntu --release=jammy ubuntu --bind=/dev/dri --bind=/dev/kfd

Adding a dedicated AMD GPU passthrough config option, with a corresponding flag for the create command, seems like overkill when a single additional bind mount is enough (especially since AMD GPU passthrough reportedly relies on /dev/dri being mounted, which gpu_passthrough_intel=1 already takes care of).

Jip-Hop, Jul 09 '24 02:07

I suggest documenting this more widely, e.g. in the main README, or revisiting the decision not to include a flag (even if it's simple). I started investigating jailmaker to handle my k3s-to-Docker conversion on TrueNAS SCALE, and it wasn't clear whether AMD GPU passthrough was supported at all until I found this ticket and read the comments.

tyvsmith, Jul 14 '24 09:07

Updated the README!

Jip-Hop, Jul 15 '24 12:07

I can confirm that hardware transcoding works with an AMD 5700G.

krupinskika, Oct 29 '24 11:10