tar: *.ko: Cannot stat: No such file or directory
I am following your instructions, but during the ./build.sh step I get the following message:
$ ./build.sh 367.27 alpha 1097.0.0
++ basename -a 'pkg/run_files/1097.0.0/NVIDIA-Linux-x86_64-367.27/kernel/*.ko'
+ tar -C pkg/run_files/1097.0.0/NVIDIA-Linux-x86_64-367.27/kernel -cvj '*.ko'
tar: *.ko: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
Did I miss something?
I get something similar:
coreos coreos-nvidia # ./build.sh 275.09.07 stable 1185.5.0
Downloading CoreOS stable developer image 1185.5.0
Decompressing
Downloading NVIDIA Linux drivers version 275.09.07
/home/core/coreos-nvidia/pkg/run_files/1185.5.0 /home/core/coreos-nvidia
Creating directory NVIDIA-Linux-x86_64-275.09.07
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 275.09.07.............................................................................................................................................
/home/core/coreos-nvidia
Spawning container coreos_developer_container.bin.1185.5.0 on /home/core/coreos-nvidia/coreos_developer_container.bin.1185.5.0.
Press ^] three times within 1s to kill container.
+ VERSION=275.09.07
+ echo Building 275.09.07
Building 275.09.07
+ emerge-gitclone
>>> Cloning repository 'portage-stable' from 'https://github.com/coreos/portage-stable.git'...
>>> Starting git clone in /var/lib/portage/portage-stable
Cloning into '/var/lib/portage/portage-stable'...
remote: Counting objects: 65997, done.
remote: Compressing objects: 100% (69/69), done.
remote: Total 65997 (delta 31), reused 0 (delta 0), pack-reused 65927
Receiving objects: 100% (65997/65997), 41.55 MiB | 9.87 MiB/s, done.
Resolving deltas: 100% (27864/27864), done.
Checking connectivity... done.
>>> Git clone in /var/lib/portage/portage-stable successful
>>> Cloning repository 'coreos' from 'https://github.com/coreos/coreos-overlay.git'...
>>> Starting git clone in /var/lib/portage/coreos-overlay
Cloning into '/var/lib/portage/coreos-overlay'...
remote: Counting objects: 31678, done.
remote: Compressing objects: 100% (29/29), done.
remote: Total 31678 (delta 4), reused 0 (delta 0), pack-reused 31649
Receiving objects: 100% (31678/31678), 10.77 MiB | 7.41 MiB/s, done.
Resolving deltas: 100% (15268/15268), done.
Checking connectivity... done.
>>> Git clone in /var/lib/portage/coreos-overlay successful
Container coreos_developer_container.bin.1185.5.0 terminated by signal KILL.
+ ARTIFACT_DIR=pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07
+ VERSION=275.09.07
+ COMBINED_VERSION=1185.5.0-275.09.07
+ TOOLS='nvidia-debugdump nvidia-cuda-mps-control nvidia-xconfig nvidia-modprobe nvidia-smi nvidia-cuda-mps-server
nvidia-persistenced nvidia-settings'
++ basename -a pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libGL.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libOpenCL.so.1.0.0 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libXvMCNVIDIA.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libcuda.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libglx.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvcuvid.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvidia-cfg.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvidia-compiler.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvidia-glcore.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvidia-ml.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvidia-tls.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvidia-wfb.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libvdpau.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libvdpau_nvidia.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libvdpau_trace.so.275.09.07
+ tar -C pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07 -cvj libGL.so.275.09.07 libOpenCL.so.1.0.0 libXvMCNVIDIA.so.275.09.07 libcuda.so.275.09.07 libglx.so.275.09.07 libnvcuvid.so.275.09.07 libnvidia-cfg.so.275.09.07 libnvidia-compiler.so.275.09.07 libnvidia-glcore.so.275.09.07 libnvidia-ml.so.275.09.07 libnvidia-tls.so.275.09.07 libnvidia-wfb.so.275.09.07 libvdpau.so.275.09.07 libvdpau_nvidia.so.275.09.07 libvdpau_trace.so.275.09.07
libGL.so.275.09.07
libOpenCL.so.1.0.0
libXvMCNVIDIA.so.275.09.07
libcuda.so.275.09.07
libglx.so.275.09.07
libnvcuvid.so.275.09.07
libnvidia-cfg.so.275.09.07
libnvidia-compiler.so.275.09.07
libnvidia-glcore.so.275.09.07
libnvidia-ml.so.275.09.07
libnvidia-tls.so.275.09.07
libnvidia-wfb.so.275.09.07
libvdpau.so.275.09.07
libvdpau_nvidia.so.275.09.07
libvdpau_trace.so.275.09.07
+ tar -C pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07 -cvj nvidia-debugdump nvidia-cuda-mps-control nvidia-xconfig nvidia-modprobe nvidia-smi nvidia-cuda-mps-server nvidia-persistenced nvidia-settings
tar: nvidia-debugdump: Cannot stat: No such file or directory
tar: nvidia-cuda-mps-control: Cannot stat: No such file or directory
nvidia-xconfig
tar: nvidia-modprobe: Cannot stat: No such file or directory
nvidia-smi
tar: nvidia-cuda-mps-server: Cannot stat: No such file or directory
tar: nvidia-persistenced: Cannot stat: No such file or directory
nvidia-settings
tar: Exiting with failure status due to previous errors
++ basename -a 'pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/kernel/*.ko'
+ tar -C pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/kernel -cvj '*.ko'
tar: *.ko: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
Looks like we're exiting the container after emerge-gitclone in the /build.sh script.
Sorry, I missed this. I now have a filter to highlight these issues.
Which systemd version are you running on the machine building the driver? I saw similar problems with older releases. That's why the README says "tested on version 229, there might be issues with <= 225".
Another thing worth checking... are you running low on disk?
One measure I could take right away is to stop execution as soon as a step fails. There is no point in continuing; it only obscures the real issue.
When I last debugged this, I traced it to the main event loop in nspawn exiting in an unexpected fashion (the KILL signal came from nspawn itself). Upgrading systemd made the problem go away, so I didn't investigate further.
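For reference, the fail-fast idea could look roughly like the sketch below. It is not the actual change to build.sh: the function name archive_modules and its two arguments are made up for illustration. The point is to abort on the first error and to refuse to run tar when the *.ko pattern matches nothing, instead of passing the literal pattern along.

```shell
#!/bin/sh
# Sketch only: archive_modules and its arguments are hypothetical names,
# not taken from build.sh.
set -eu   # abort on the first failing command or unset variable

archive_modules() {
    kernel_dir=$1
    out=$2
    set --   # reuse the positional parameters as the list of module names
    for f in "$kernel_dir"/*.ko; do
        # An unmatched glob stays as the literal string '*.ko', so skip it.
        [ -e "$f" ] || continue
        set -- "$@" "${f##*/}"   # keep only the file name, like basename
    done
    if [ "$#" -eq 0 ]; then
        echo "no kernel modules found in $kernel_dir; an earlier build step probably failed" >&2
        return 1
    fi
    tar -C "$kernel_dir" -cjf "$out" "$@"
}

# Example call (paths are placeholders):
#   archive_modules path/to/kernel modules.tar.bz2
```

With a check like this, the run stops at the step that actually went wrong, and the message points at the missing modules rather than at tar.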
Which systemd version are you running on the machine building the driver? I saw similar problems with older releases. That's why the README says "tested on version 229, there might be issues with <= 225".
Version 231, running under 4.7.3-coreos-r2:
# systemctl --version
systemd 231
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT -GNUTLS -ACL +XZ -LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN
I do agree that this issue is a bug in systemd, and not your scripts.
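If it helps anyone else, a quick pre-check against the README note ("tested on version 229, there might be issues with <= 225") could be sketched like this. The function name systemd_major and the variable names are illustrative, and as this thread shows, a newer systemd does not guarantee the build will succeed.

```shell
#!/bin/sh
# Sketch only: systemd_major and MAX_PROBLEMATIC are illustrative names.
MAX_PROBLEMATIC=225   # the README mentions possible issues with <= 225

# Print the major version found on the first line of `systemctl --version`
# output, e.g. "systemd 231" gives 231. Prints nothing for unexpected input.
systemd_major() {
    printf '%s\n' "$1" | awk 'NR == 1 && $1 == "systemd" { print $2 + 0; exit }'
}

current=$(systemd_major "$(systemctl --version 2>/dev/null)")
if [ -z "$current" ]; then
    echo "could not determine the systemd version" >&2
elif [ "$current" -le "$MAX_PROBLEMATIC" ]; then
    echo "systemd $current is in the range the README warns about" >&2
fi
```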
Another thing worth checking... are you running low on disk?
No, I went to the trouble of expanding the image first with dd, then gdisk and e2size.
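For anyone who wants to rule out disk space quickly, here is a rough check. The function name free_kib and the 4 GiB threshold are assumptions for illustration; the thread does not say how much space the build needs, and the path to check is whichever filesystem the build writes to.

```shell
#!/bin/sh
# Sketch only: free_kib and REQUIRED_KIB are hypothetical; adjust the
# threshold to whatever the build actually needs.
REQUIRED_KIB=$((4 * 1024 * 1024))   # assumed 4 GiB, not a documented figure

# Print the available space, in KiB, on the filesystem that holds the path.
# -P forces one output line per filesystem, -k reports in 1024-byte blocks.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

avail=$(free_kib .)
if [ "$avail" -lt "$REQUIRED_KIB" ]; then
    echo "only ${avail} KiB free in $(pwd); the build may run out of space" >&2
fi
```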
One measure I could take right away is to stop execution as soon as a step fails. There is no point in continuing; it only obscures the real issue.
Good idea!
Speaking of -coreos-r2... I found and fixed a silly bug that affected 1185.2.0 and up. Can you retry with the latest commit?
I think I found the culprit. Running systemd-nspawn with --share-system increases the chances of a SIGKILL. Please try the latest version; I have just pushed a number of fixes. With the latest CoreOS security fix, you might need to add the --emerge-sources flag, since CoreOS engineers seem not to have uploaded portage binary packages for those versions. To compensate for the larger number of packages built now, the scripts build 4 of them at a time.
I tried the latest version, got the same error:
$ ./build.sh 367.57 stable 1298.5.0
tar: *.ko: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
# systemctl --version
systemd 231
1298.5.0 is never going to work with 367.57.
Newer versions of the driver (375, etc.) support Linux 4.9.9 (which is what the latest CoreOS uses), but 367.57 dates back to last October and does not support it (get_user_pages() has a different signature).
Why are you using it instead of a more recent version? Is it because you have old GRID cards and Nvidia docs tell you to use v367, as that's the last version to support them? E.g. from http://us.download.nvidia.com/XFree86/Linux-x86_64/378.13/README/supportedchips.html
"Below are the legacy GPUs that are no longer supported in the unified driver. These GPUs will continue to be maintained through the special legacy NVIDIA GPU driver releases."
If that's the case, you need to ask Nvidia to make one of the special legacy releases they have promised. An example here:
https://devtalk.nvidia.com/default/topic/997603/linux/newer-367-driver-for-grid-k520-/
Yes, I also looked at the logs, and the error was "error: too many arguments to function 'get_user_pages'".
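For anyone else landing here with the same tar message: it is only a symptom, and the real cause is usually a compile error further up. A small helper along these lines can pull it out of a saved build log. The name show_compile_errors is made up, and the log location depends on how you captured the output.

```shell
#!/bin/sh
# Sketch only: show_compile_errors is a hypothetical helper, and the log
# file is whatever file you redirected the build output to.

# Print up to five lines that contain a compiler error, with line numbers.
show_compile_errors() {
    grep -n 'error:' "$1" | head -n 5
}

# Example call (the file name is a placeholder):
#   show_compile_errors build.log
```

In the case above, this would surface the get_user_pages line, which points at a driver and kernel mismatch rather than at the packaging step.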
No particular reason! Will try v375 now, thanks.
$ ./build.sh 375.20 stable 1298.5.0
succeeded. Thanks.
I had another problem: when I used --keep and ran ./build.sh twice, emerge complained about not having enough space and did not build the modules. With a fresh run it worked. I did not try to reproduce it.
Otherwise this issue is resolved.
apt install systemd-container