coreos-nvidia

tar: *.ko: Cannot stat: No such file or directory

Open DroiSystem opened this issue 9 years ago • 13 comments

$ ./build.sh 367.27 alpha 1097.0.0

I am following your instructions, but during the ./build.sh step I get the following message:

++ basename -a 'pkg/run_files/1097.0.0/NVIDIA-Linux-x86_64-367.27/kernel/*.ko'
+ tar -C pkg/run_files/1097.0.0/NVIDIA-Linux-x86_64-367.27/kernel -cvj '*.ko'
tar: *.ko: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors

Did I miss something?

DroiSystem, Sep 29 '16
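The quoted '*.ko' in that trace is the telling detail: when a glob matches no files, bash passes the pattern through unchanged, so basename and tar receive the literal string *.ko. In other words, no kernel modules ended up in kernel/. The following self-contained bash demo uses a throwaway temporary directory (not the real build tree) to show the default behaviour next to shopt -s nullglob, which makes an unmatched glob expand to nothing:

```bash
#!/usr/bin/env bash
# Demo of how bash expands a glob that matches nothing.
# Uses a throwaway directory; it does not touch the coreos-nvidia build tree.

demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/kernel"   # empty, like a kernel/ dir where no modules were built

# Default behaviour: the unmatched pattern is passed through literally.
default_matches=( "$demo_dir"/kernel/*.ko )
echo "default: ${#default_matches[@]} word(s): ${default_matches[*]}"

# With nullglob, an unmatched pattern expands to zero words.
shopt -s nullglob
nullglob_matches=( "$demo_dir"/kernel/*.ko )
shopt -u nullglob
echo "nullglob: ${#nullglob_matches[@]} word(s)"

rm -rf "$demo_dir"
```

In a build script, counting the matches under nullglob turns a confusing "tar: *.ko: Cannot stat" into an explicit "no modules were built" condition.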

I get something similar:

coreos coreos-nvidia # ./build.sh 275.09.07 stable 1185.5.0 
Downloading CoreOS stable developer image 1185.5.0
Decompressing
Downloading NVIDIA Linux drivers version 275.09.07
/home/core/coreos-nvidia/pkg/run_files/1185.5.0 /home/core/coreos-nvidia
Creating directory NVIDIA-Linux-x86_64-275.09.07
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 275.09.07.............................................................................................................................................
/home/core/coreos-nvidia
Spawning container coreos_developer_container.bin.1185.5.0 on /home/core/coreos-nvidia/coreos_developer_container.bin.1185.5.0.
Press ^] three times within 1s to kill container.
+ VERSION=275.09.07
+ echo Building 275.09.07
Building 275.09.07
+ emerge-gitclone
>>> Cloning repository 'portage-stable' from 'https://github.com/coreos/portage-stable.git'...
>>> Starting git clone in /var/lib/portage/portage-stable
Cloning into '/var/lib/portage/portage-stable'...
remote: Counting objects: 65997, done.
remote: Compressing objects: 100% (69/69), done.
remote: Total 65997 (delta 31), reused 0 (delta 0), pack-reused 65927
Receiving objects: 100% (65997/65997), 41.55 MiB | 9.87 MiB/s, done.
Resolving deltas: 100% (27864/27864), done.
Checking connectivity... done.
>>> Git clone in /var/lib/portage/portage-stable successful
>>> Cloning repository 'coreos' from 'https://github.com/coreos/coreos-overlay.git'...
>>> Starting git clone in /var/lib/portage/coreos-overlay
Cloning into '/var/lib/portage/coreos-overlay'...
remote: Counting objects: 31678, done.
remote: Compressing objects: 100% (29/29), done.
remote: Total 31678 (delta 4), reused 0 (delta 0), pack-reused 31649
Receiving objects: 100% (31678/31678), 10.77 MiB | 7.41 MiB/s, done.
Resolving deltas: 100% (15268/15268), done.
Checking connectivity... done.
>>> Git clone in /var/lib/portage/coreos-overlay successful
Container coreos_developer_container.bin.1185.5.0 terminated by signal KILL.
+ ARTIFACT_DIR=pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07
+ VERSION=275.09.07
+ COMBINED_VERSION=1185.5.0-275.09.07
+ TOOLS='nvidia-debugdump nvidia-cuda-mps-control nvidia-xconfig nvidia-modprobe nvidia-smi nvidia-cuda-mps-server
nvidia-persistenced nvidia-settings'
++ basename -a pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libGL.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libOpenCL.so.1.0.0 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libXvMCNVIDIA.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libcuda.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libglx.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvcuvid.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvidia-cfg.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvidia-compiler.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvidia-glcore.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvidia-ml.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvidia-tls.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libnvidia-wfb.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libvdpau.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libvdpau_nvidia.so.275.09.07 pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/libvdpau_trace.so.275.09.07
+ tar -C pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07 -cvj libGL.so.275.09.07 libOpenCL.so.1.0.0 libXvMCNVIDIA.so.275.09.07 libcuda.so.275.09.07 libglx.so.275.09.07 libnvcuvid.so.275.09.07 libnvidia-cfg.so.275.09.07 libnvidia-compiler.so.275.09.07 libnvidia-glcore.so.275.09.07 libnvidia-ml.so.275.09.07 libnvidia-tls.so.275.09.07 libnvidia-wfb.so.275.09.07 libvdpau.so.275.09.07 libvdpau_nvidia.so.275.09.07 libvdpau_trace.so.275.09.07
libGL.so.275.09.07
libOpenCL.so.1.0.0
libXvMCNVIDIA.so.275.09.07
libcuda.so.275.09.07
libglx.so.275.09.07
libnvcuvid.so.275.09.07
libnvidia-cfg.so.275.09.07
libnvidia-compiler.so.275.09.07
libnvidia-glcore.so.275.09.07
libnvidia-ml.so.275.09.07
libnvidia-tls.so.275.09.07
libnvidia-wfb.so.275.09.07
libvdpau.so.275.09.07
libvdpau_nvidia.so.275.09.07
libvdpau_trace.so.275.09.07
+ tar -C pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07 -cvj nvidia-debugdump nvidia-cuda-mps-control nvidia-xconfig nvidia-modprobe nvidia-smi nvidia-cuda-mps-server nvidia-persistenced nvidia-settings
tar: nvidia-debugdump: Cannot stat: No such file or directory
tar: nvidia-cuda-mps-control: Cannot stat: No such file or directory
nvidia-xconfig
tar: nvidia-modprobe: Cannot stat: No such file or directory
nvidia-smi
tar: nvidia-cuda-mps-server: Cannot stat: No such file or directory
tar: nvidia-persistenced: Cannot stat: No such file or directory
nvidia-settings
tar: Exiting with failure status due to previous errors
++ basename -a 'pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/kernel/*.ko'
+ tar -C pkg/run_files/1185.5.0/NVIDIA-Linux-x86_64-275.09.07/kernel -cvj '*.ko'
tar: *.ko: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors

jpap, Dec 10 '16

Looks like the container exits (it is "terminated by signal KILL") right after emerge-gitclone in the /build.sh script.

jpap, Dec 10 '16

Sorry, I missed this. I now have a filter to highlight these issues.

Which systemd version are you running on the machine building the driver? I saw similar problems with older releases. That's why the README says "tested on version 229, there might be issues with <= 225".

Another thing worth checking... are you running low on disk?

therc, Dec 10 '16
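To answer the systemd question quickly, here is a small bash sketch that parses the first line of systemctl --version (which has the form "systemd 231", as posted later in this thread) and compares the major version with what the README mentions: tested on 229, possible issues with 225 and below. The function names and the three labels are hypothetical helpers, not part of coreos-nvidia.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers; not part of the coreos-nvidia scripts.

# Reads `systemctl --version` output on stdin and prints the major version.
systemd_major_version() {
  awk 'NR == 1 && $1 == "systemd" { print $2; exit }'
}

# classify_systemd MAJOR: prints a label based on the README note
# ("tested on version 229, there might be issues with <= 225").
classify_systemd() {
  local major=$1
  if (( major <= 225 )); then
    echo risky
  elif (( major < 229 )); then
    echo older-than-tested
  else
    echo tested-or-newer
  fi
}
```

Usage: classify_systemd "$(systemctl --version | systemd_major_version)". As the rest of this thread shows, "tested-or-newer" is not a guarantee: systemd 231 still hit the SIGKILL.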

One measure I could take right away is making the script stop as soon as a step fails. There is no point in continuing; it only obscures the real issue.

When I last debugged this, I traced it to the main event loop in nspawn exiting in an unexpected fashion (the KILL signal came from nspawn itself). Upgrading systemd made the problem go away, so I didn't investigate further.

therc, Dec 10 '16
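A minimal sketch of that fail-fast idea, assuming bash and the pkg/run_files/<coreos-version>/NVIDIA-Linux-x86_64-<driver-version>/kernel layout seen in the traces above. The function name package_kernel_modules and its output-file argument are hypothetical, not taken from the repository's build.sh. The point is to abort with a clear message when no .ko files exist, whether or not the container step reported a failure, instead of letting tar fail on a literal *.ko.

```bash
#!/usr/bin/env bash
# Hypothetical fail-fast packaging step; not the repository's actual build.sh.
set -eo pipefail

# package_kernel_modules ARTIFACT_DIR OUTPUT_TARBALL
# Archives ARTIFACT_DIR/kernel/*.ko with bzip2, or fails loudly if there are none.
package_kernel_modules() {
  local artifact_dir=$1 output=$2
  local kernel_dir="$artifact_dir/kernel"
  local modules=()

  # nullglob: an unmatched *.ko expands to nothing instead of the literal pattern.
  shopt -s nullglob
  modules=( "$kernel_dir"/*.ko )
  shopt -u nullglob

  if (( ${#modules[@]} == 0 )); then
    echo "error: no kernel modules in $kernel_dir; the module build step probably failed" >&2
    return 1
  fi

  # Archive only the base names, relative to the kernel directory.
  tar -C "$kernel_dir" -cjf "$output" "${modules[@]##*/}"
}
```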

> Which systemd version are you running on the machine building the driver? I saw similar problems with older releases. That's why the README says "tested on version 229, there might be issues with <= 225".

Version 231, running on kernel 4.7.3-coreos-r2:

# systemctl --version
systemd 231
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT -GNUTLS -ACL +XZ -LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN

I agree that this looks like a bug in systemd, not in your scripts.

> Another thing worth checking... are you running low on disk?

No, I went to the trouble of expanding the image first with dd, then gdisk and e2size.

> One measure I could take right away is making the script stop as soon as a step fails. There is no point in continuing; it only obscures the real issue.

Good idea!

jpap, Dec 17 '16

Speaking of -coreos-r2... I found and fixed a silly bug that affected 1185.2.0 and up. Can you retry with the latest commit?

therc, Dec 22 '16

I think I found the culprit. Running systemd-nspawn with --share-system increases the chances of a SIGKILL. Please try the latest version; I have just pushed a number of fixes. With the latest CoreOS security fix, you might need to add the --emerge-sources flag, since CoreOS engineers seem not to have uploaded portage binary packages for those versions. To compensate for the larger number of packages now built from source, the scripts build 4 of them at a time.

therc, Jan 24 '17

I tried the latest version, got the same error:

$ ./build.sh 367.57 stable 1298.5.0

tar: *.ko: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors

# systemctl --version
systemd 231

dashesy, Mar 06 '17

1298.5.0 is never going to work with 367.57.

Newer versions of the driver (375, etc.) support Linux 4.9.9 (which is what the latest CoreOS uses), but 367.57 dates back to last October and does not support it (get_user_pages() has a different signature).

Why are you using it instead of a more recent version? Is it because you have old GRID cards and Nvidia docs tell you to use v367, as that's the last version to support them? E.g. from http://us.download.nvidia.com/XFree86/Linux-x86_64/378.13/README/supportedchips.html

"Below are the legacy GPUs that are no longer supported in the unified driver. These GPUs will continue to be maintained through the special legacy NVIDIA GPU driver releases."

If that's the case, you need to ask Nvidia to make one of the special legacy releases they have promised. An example here:

https://devtalk.nvidia.com/default/topic/997603/linux/newer-367-driver-for-grid-k520-/

therc, Mar 06 '17
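The rule of thumb from this exchange (driver 367.57 fails against Linux 4.9.9 because get_user_pages() changed signature, while 375.x builds) can be turned into an early warning. A rough bash sketch, assuming GNU sort -V for version comparison. The 375 and 4.9 thresholds come only from this thread and are not an official NVIDIA or CoreOS compatibility table; the kernel version of the target CoreOS release has to be supplied by the caller.

```bash
#!/usr/bin/env bash
# Hypothetical heuristic from this thread only; not an official compatibility matrix.

# version_ge A B: succeeds if version A >= version B (uses GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_driver_kernel DRIVER_VERSION KERNEL_VERSION
# Warns (and fails) when a pre-375 driver is paired with a 4.9+ kernel.
check_driver_kernel() {
  local driver=$1 kernel=$2
  if version_ge "$kernel" 4.9 && ! version_ge "$driver" 375; then
    echo "warn: driver $driver likely predates kernel $kernel (get_user_pages() signature change)"
    return 1
  fi
  echo "ok"
}
```

For legacy GPUs that are stuck on the 367 branch, this only tells you early that the build will fail; the fix is still a legacy driver release from Nvidia.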

Yes, I also looked at the logs, and the error was "error: too many arguments to function 'get_user_pages'".

No particular reason! Will try v375 now, thanks.

dashesy, Mar 06 '17

$ ./build.sh 375.20 stable 1298.5.0

succeeded. Thanks.

dashesy, Mar 06 '17

I had another problem: when I used --keep and ran ./build twice, emerge complained about not having enough space and did not build the modules. A fresh run worked. I did not try to reproduce it.

Otherwise this issue is resolved.

dashesy, Apr 02 '17
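Running out of space on a second --keep run suggests checking free space before starting a build. A minimal sketch using POSIX df -P output; the function names are hypothetical, and the minimum you pass in is an assumption you have to pick, since the thread does not say how much space a build needs.

```bash
#!/usr/bin/env bash
# Hypothetical free-space preflight; the required amount is an assumption to tune.

# available_kib DIR: prints free space in KiB for the filesystem holding DIR.
available_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# require_free_space DIR MIN_KIB: fails with a message if DIR has less than MIN_KIB free.
require_free_space() {
  local dir=$1 min_kib=$2 avail
  avail=$(available_kib "$dir")
  if (( avail < min_kib )); then
    echo "error: only ${avail} KiB free in $dir, need at least ${min_kib} KiB" >&2
    return 1
  fi
}
```

Usage: require_free_space "$PWD" "$MIN_KIB" || exit 1, run before each build, with MIN_KIB chosen for your setup.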

apt install systemd-container

ckome, Jul 17 '18
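On Debian and Ubuntu, systemd-nspawn (which spawns the developer container, as the "Spawning container" lines in the traces show) ships in the systemd-container package. A small sketch that checks for required commands up front; the command list is an assumption based on the traces in this thread, not a list taken from build.sh.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check; the command list is an assumption, not from build.sh.

# require_commands CMD...: reports every missing command, fails if any are missing.
require_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: on Debian/Ubuntu, a missing systemd-nspawn is fixed by
#   apt install systemd-container
# require_commands systemd-nspawn tar bzip2
```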