Bug for 12.1

Open Haydnspass opened this issue 1 year ago • 1 comments

Hey,

Since I switched to CUDA 12.1, I sometimes get the following error on some runs. It may be unrelated to 12.1, but this is the first time I have noticed it.

Run Jimver/[email protected]
  with:
    cuda: 12.1.0
    sub-packages: []
    method: local
    linux-local-args: ["--toolkit", "--samples"]
    use-github-cache: true
  env:
    INPUT_RUN_POST: true
    CONDA: /usr/share/miniconda3
    CONDA_PKGS_DIR: /home/runner/conda_pkgs_dir
/usr/bin/tar --posix -cf cache.tzst --exclude cache.tzst -P -C /home/runner/work/SplinePSF/SplinePSF --files-from manifest.txt --use-compress-program zstdmt
Failed to save: Unable to reserve cache with key cuda_installer-linux-5.15.0-1040-azure-12.1.0, another job may be creating this cache. More details: Cache already exists. Scope: refs/heads/dev_fix_hopper_ci, Key: cuda_installer-linux-5.15.0-1040-azure-12.1.0, Version: 4bfd46a3233f39e7afb92a41e0e6a5d43d677cf4e3f9feca811a22308a54230c
/usr/bin/sudo /opt/hostedtoolcache/cuda_installer-linux/12.1.0/x64/cuda_installer-linux-5.15.0-1040-azure_12.1.0.run --silent --toolkit --samples
terminate called after throwing an instance of 'boost::filesystem::filesystem_error'
  what():  boost::filesystem::copy_file: No such file or directory: "./builds/cuda_cupti/extras/CUPTI/doc/Cupti/structCUpti__ActivityMemcpy3.html", "/usr/local/cuda-12.1/extras/CUPTI/doc/Cupti/structCUpti__ActivityMemcpy3.html"
Aborted (core dumped)
Warning: Error during installation: Error: The process '/usr/bin/sudo' failed with exit code 134
Starting artifact upload
For more detailed logs during the artifact upload process, enable step-debugging: https://docs.github.com/actions/monitoring-and-troubleshooting-workflows/enabling-debug-logging#enabling-step-debug-logging
Artifact name is valid!
Container for artifact "install-log" successfully created. Starting upload of file(s)
Total size of all the files uploaded is 10913 bytes
File upload process has finished. Finalizing the artifact upload
Artifact has been finalized. All files have been successfully uploaded!

The raw size of all the files that were specified for upload is 172032 bytes
The size of all the files that were uploaded is 10913 bytes. This takes into account any gzip compression used to reduce the upload size, time and storage

Note: The size of downloaded zips can differ significantly from the reported size. For more information see: https://github.com/actions/upload-artifact#zipped-artifact-downloads 

Error: Error: The process '/usr/bin/sudo' failed with exit code 134

Any idea what to conclude from that? The workflow for this run is here: https://github.com/Haydnspass/SplinePSF/blob/0ab69d9722f0abce3de11947a5bccf4946341ba9/.github/workflows/build_upload_test.yaml

Haydnspass, Jun 22 '23 11:06

You can resolve this by using the network method and installing only the packages you actually need. This also makes the installation a lot faster. For all available packages, see e.g. https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/
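
As a rough sketch of what that change could look like in the workflow step, reusing the input names that appear in the log above (cuda, method, sub-packages): the version tag and the package selection below are placeholders and assumptions, not values from this thread, and the sketch assumes the action accepts sub-packages as a JSON array string.

```yaml
# Sketch only: switch from the local installer to the network method
# and install a subset of packages instead of the full toolkit.
- uses: Jimver/cuda-toolkit@<version>   # placeholder: pin the release you actually use
  with:
    cuda: '12.1.0'
    method: 'network'
    # Assumed example selection; replace with the packages your build needs.
    sub-packages: '["nvcc", "cudart"]'
```

With the network method the linux-local-args input from the original step (--toolkit, --samples) should no longer be relevant, since those flags are passed to the local .run installer.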

LLukas22, Jul 08 '23 15:07