coreos-nvidia

Final installation path instructions and tips

Open dashesy opened this issue 7 years ago • 4 comments

> on CoreOS, /lib64/, /usr/lib64/ and co. all reside on a read-only filesystem. You might need to create a new directory elsewhere and its location listed in a file under /etc/ld.so.conf.d/

  1. There is no /etc/ld.so.conf.d in 1298.5.0; there is only /etc/ld.so.conf, which is a symlink to /usr/lib/ld.so.conf on the read-only filesystem.

(There was a similar problem with .bashrc; the solution there is to remove the link and replace it with an actual file.) I created /opt/bin and /opt/lib64, but ldconfig complained about libEGL.so.1, so I made it a symlink:

    sudo ln -sfT libEGL.so.375.20 /opt/lib64/libEGL.so.1

  2. What about the kernel modules?

    sudo mkdir -p /opt/lib64/modules/4.9.9-coreos-r1/kernel

But modprobe will not be able to find them, because that is a non-standard path (insmod should work). I wonder if it is simpler to remount the drive as rw and copy the kernel modules and everything else there, rather than trying to install in /opt and work around the issues.

> creating device nodes under /dev/

Do you mean after modprobe or perhaps insmod?



From [here](https://groups.google.com/forum/#!topic/coreos-user/s9Sy_gBl94o):

> Kernel modules can be installed under `/opt/lib/modules/$(uname -r)/` as long as you use `--dirname /opt` when calling modprobe.
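A minimal sketch of what that quote implies, assuming the modules have already been copied under the writable prefix. The function names are hypothetical, and the `depmod` step is my assumption (modprobe needs a `modules.dep` index under the same prefix); the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch: keep out-of-tree modules under a writable prefix and point the
# kmod tools at it. Function names are illustrative, not from the thread.

# Print the directory modprobe searches when given --dirname PREFIX.
module_dir() {
    prefix=$1
    kver=${2:-$(uname -r)}
    printf '%s/lib/modules/%s\n' "$prefix" "$kver"
}

# Print (do not run) the commands that would index and load a module.
print_load_cmds() {
    prefix=$1
    module=$2
    kver=${3:-$(uname -r)}
    # depmod builds modules.dep under PREFIX/lib/modules/KVER (assumed step)
    printf 'depmod --basedir %s %s\n' "$prefix" "$kver"
    # modprobe then resolves the module and its dependencies from there
    printf 'modprobe --dirname %s %s\n' "$prefix" "$module"
}
```

For example, `print_load_cmds /opt nvidia` prints the two commands for the running kernel; review them and run them with sudo if they look right.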

So now I only need to use udev; otherwise I will have to resort to [manually adding the nodes](https://gist.githubusercontent.com/tleyden/74f593a0beea300de08c/raw/95ed93c5751a989e58153db6f88c35515b7af120/nvidia_devices.sh) ([same](http://askubuntu.com/a/748905/103961)).

dashesy, Mar 07 '17 01:03

You can create /etc/ld.so.conf.d. It's not there by default and /etc is R/W.
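As a rough sketch of that suggestion: the helper below writes a one-line drop-in naming the library directory. The file name `nvidia.conf` and the function name are hypothetical, and whether `/etc/ld.so.conf` on a given image actually includes `ld.so.conf.d/*.conf` is an assumption worth checking with `cat /etc/ld.so.conf` first.

```shell
#!/bin/sh
# Sketch: register a library directory with the dynamic linker via a drop-in.
# "nvidia.conf" is an illustrative file name, not one mandated by the thread.

# Write a drop-in listing one library directory, one path per line.
write_ld_conf() {
    lib_dir=$1
    conf_dir=${2:-/etc/ld.so.conf.d}
    # /etc is writable on CoreOS, so the directory can simply be created
    mkdir -p "$conf_dir" || return 1
    printf '%s\n' "$lib_dir" > "$conf_dir/nvidia.conf"
}
```

Run as root, `write_ld_conf /opt/lib64` followed by `ldconfig` should rebuild the cache with the new directory, provided the include assumption holds.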

I've never seen problems with libEGL.

The kernel modules could go in a variety of places: /etc/modules, /opt/lib/modules or even a R/W overlay over /lib/modules. I use insmod directly. And yes, device nodes should be created after the modules are loaded.
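For the device-node step, a hedged sketch along the lines of the manual script linked earlier in the thread: the NVIDIA character devices use major 195, with `/dev/nvidiactl` on minor 255 and `/dev/nvidiaN` on minor N. The function name is hypothetical and the GPU count is passed in by hand; the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: print the mknod commands for the control node and COUNT GPU nodes.
# Run the printed commands as root only after the nvidia module is loaded.

nvidia_mknod_cmds() {
    count=$1
    # Control device: major 195, minor 255
    printf 'mknod -m 666 /dev/nvidiactl c 195 255\n'
    i=0
    while [ "$i" -lt "$count" ]; do
        # One node per GPU: minor number equals the GPU index
        printf 'mknod -m 666 /dev/nvidia%d c 195 %d\n' "$i" "$i"
        i=$((i + 1))
    done
}
```

`nvidia_mknod_cmds 1 | sudo sh` would create the nodes for a single-GPU machine.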

I'll try to add more details. I didn't add them before, because different people have different needs and toolchains.

therc, Mar 07 '17 16:03

I tried to emulate how Ubuntu loads the nvidia modules:

  1. Created a udev rules file:

    $ cat /etc/udev/rules.d/71-nvidia.rules

# Tag the device as master-of-seat so that logind is happy
# (see LP: #1365336)
SUBSYSTEM=="pci", ATTRS{vendor}=="0x10de", DRIVERS=="nvidia", TAG+="seat", TAG+="master-of-seat"

# Start and stop nvidia-persistenced on power on and power off
# respectively
ACTION=="add" DEVPATH=="/bus/acpi/drivers/NVIDIA ACPI Video Driver" SUBSYSTEM=="drivers" RUN+="/bin/systemctl start --no-block nvidia-persistenced.service"
ACTION=="remove" DEVPATH=="/bus/acpi/drivers/NVIDIA ACPI Video Driver" SUBSYSTEM=="drivers" RUN+="/bin/systemctl stop --no-block nvidia-persistenced"

# Start and stop nvidia-persistenced when loading and unloading
# the driver
ACTION=="add" DEVPATH=="/module/nvidia" SUBSYSTEM=="module" RUN+="/bin/systemctl start --no-block nvidia-persistenced.service"
ACTION=="remove" DEVPATH=="/module/nvidia" SUBSYSTEM=="module" RUN+="/bin/systemctl stop --no-block nvidia-persistenced"

# Load and unload nvidia-modeset module
ACTION=="add" DEVPATH=="/module/nvidia" SUBSYSTEM=="module" RUN+="/usr/sbin/insmod /opt/lib/modules/4.9.9-coreos-r1/video/nvidia-modeset.ko"
ACTION=="remove" DEVPATH=="/module/nvidia" SUBSYSTEM=="module" RUN+="/usr/sbin/rmmod nvidia-modeset"

# Load and unload nvidia-drm module
ACTION=="add" DEVPATH=="/module/nvidia" SUBSYSTEM=="module" RUN+="/usr/sbin/insmod /opt/lib/modules/4.9.9-coreos-r1/video/nvidia-drm.ko"
ACTION=="remove" DEVPATH=="/module/nvidia" SUBSYSTEM=="module" RUN+="/usr/sbin/rmmod nvidia-drm"

# Load and unload nvidia-uvm module
ACTION=="add" DEVPATH=="/module/nvidia" SUBSYSTEM=="module" RUN+="/usr/sbin/insmod /opt/lib/modules/4.9.9-coreos-r1/video/nvidia-uvm.ko"
ACTION=="remove" DEVPATH=="/module/nvidia" SUBSYSTEM=="module" RUN+="/usr/sbin/rmmod nvidia-uvm"

# This will create the nvidia device nodes
ACTION=="add" DEVPATH=="/module/nvidia" SUBSYSTEM=="module" RUN+="/opt/bin/nvidia-smi"

# Create the device node for the nvidia-uvm module
ACTION=="add" DEVPATH=="/module/nvidia_uvm" SUBSYSTEM=="module" RUN+="/opt/bin/create-uvm-dev-node.sh"

  2. With this extra script in /opt/bin:

    $ cat /opt/bin/create-uvm-dev-node.sh

#!/bin/sh

# Get the major device number for nvidia-uvm and create the node
major=$(grep nvidia-uvm /proc/devices | awk '{print $1}')
if [ -n "$major" ]; then
    mknod -m 666 /dev/nvidia-uvm c "$major" 0
fi

  3. Created the user:

    useradd --system --home '/' --shell '/sbin/nologin' -c 'NVIDIA Persistence Daemon' nvidia-persistenced
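One possible refinement of the major-number lookup in create-uvm-dev-node.sh above: `grep nvidia-uvm` matches any line containing that substring, so if another entry in /proc/devices ever shared the prefix, `$major` would hold more than one number. The sketch below matches the name column exactly; the function name and the optional file argument are hypothetical additions that make it testable against a sample file.

```shell
#!/bin/sh
# Sketch: print the major number registered under an exact name in a
# /proc/devices-style file. Prints nothing if the name is absent.

device_major() {
    name=$1
    devices_file=${2:-/proc/devices}
    # Column 1 is the major number, column 2 the registered name
    awk -v name="$name" '$2 == name { print $1; exit }' "$devices_file"
}
```

In the script, `major=$(device_major nvidia-uvm)` would then replace the grep/awk pipeline.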

Now I can just insmod the nvidia.ko and it loads everything needed:

sudo insmod /opt/lib64/modules/4.9.9-coreos-r1/video/nvidia.ko

I am thinking of adding another rule to do that and will update here if I get it working. For now I have a separate service that does the required insmod:

$ cat nvidia-start.service

[Unit]
Description=Load NVIDIA module

[Service]
ExecStart=/usr/sbin/insmod /opt/lib/modules/4.9.9-coreos-r1/video/nvidia.ko

[Install]
WantedBy=multi-user.target
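The unit above hardcodes 4.9.9-coreos-r1, so it stops matching once the kernel is updated. A hedged sketch of generating the same unit for whichever kernel is running follows. The `render_unit` name is hypothetical, and `Type=oneshot` with `RemainAfterExit=yes` are my additions (they make systemd treat the one-off insmod as a completed, still-active unit), not something the original unit used. The modules themselves still have to be rebuilt for the new kernel.

```shell
#!/bin/sh
# Sketch: print an nvidia-start.service for a given kernel release.
# Defaults to the running kernel and the /opt/lib/modules layout used above.

render_unit() {
    kver=${1:-$(uname -r)}
    module_root=${2:-/opt/lib/modules}
    cat <<EOF
[Unit]
Description=Load NVIDIA module

[Service]
# Assumed additions: run once and stay "active" after insmod exits
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/insmod $module_root/$kver/video/nvidia.ko

[Install]
WantedBy=multi-user.target
EOF
}
```

Something like `render_unit | sudo tee /etc/systemd/system/nvidia-start.service`, then `sudo systemctl daemon-reload` and `sudo systemctl enable nvidia-start.service`, would install it.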

dashesy, Mar 08 '17 00:03

I am using a CoreOS system and want to install nvidia-docker on it, which requires the nvidia drivers as a prerequisite. I have run build.sh and have the three archives. I am a total noob here; could you please help me with how to proceed with the setup? I have followed the steps that @dashesy mentioned in the last comment. Right now, my /opt/bin contains all the .so files from the libraries directory and files like nvidia-smi and nvidia-settings from the tools directory. /opt/lib/modules/4.9.16-coreos-r1/video contains the .ko files. But running the nvidia-smi command gives the following output:

NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system. Please also try adding directory that contains libnvidia-ml.so to your system PATH.

Please help. Thanks in advance.

DevipriyaSarkar, Jun 07 '17 09:06

@DevipriyaSarkar did you try PR #4 (and its nvidia_install and nvidia_docker_install)? Also, you need to make sure nvidia-uvm is not loaded when you call build.sh; otherwise the install script bails out with unfinished kernel modules (if that is the case, disable nvidia-start, reboot and call build.sh again). Also, if you already had nvidia-docker installed before running this, you may need to rebuild the volume to get the hardlinks into it (you can inspect what the volume contains in /var/lib/nvidia-docker/volumes).
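Separately, the nvidia-smi error usually means the dynamic linker cannot locate libnvidia-ml.so. Shared libraries placed in /opt/bin are not searched unless that directory is listed in the linker configuration or in `LD_LIBRARY_PATH`. A small sketch for checking what the linker cache currently resolves follows; the function name is hypothetical and it reads `ldconfig -p`-style output from stdin.

```shell
#!/bin/sh
# Sketch: print the resolved paths of cache entries whose library name
# starts with the given prefix. Expects `ldconfig -p` output on stdin.

lib_paths() {
    # Field 1 is the library name, the last field is the resolved path
    awk -v lib="$1" 'index($1, lib) == 1 { print $NF }'
}
```

`ldconfig -p | lib_paths libnvidia-ml.so` printing nothing would suggest the library directory is not registered yet.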

dashesy, Jun 07 '17 15:06