cuda
CUDA 11.3?
Hi all,
Thanks for your work packaging CUDA in an easy way for system76 machines!
PyTorch has moved up to CUDA 11.3 (see https://pytorch.org/get-started/locally/); does system76 expect to keep these releases up to date with NVIDIA releases, or should I install directly from NVIDIA if I need newer CUDA?
Thanks!
Better yet, why not 11.6 since that's what is included with system76-driver-nvidia by default anyway?
It's recommended to build CUDA software in a devcontainer with Docker or Podman.
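For anyone wondering what that looks like in practice: here is a minimal devcontainer.json sketch, assuming the VS Code Dev Containers extension and the NVIDIA Container Toolkit are already set up on the host. The image tag and container name here are illustrative, not an official recommendation:

```json
{
  "name": "cuda-dev",
  "image": "nvidia/cuda:11.3.1-devel-ubuntu20.04",
  "runArgs": ["--gpus", "all"]
}
```

Opening a folder containing this file with the Dev Containers extension drops you into a shell where nvcc and the GPU are available, without touching the host's CUDA packages.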
@mraxilus is that true? Maybe that's just on the latest 22.04... definitely wasn't the case for me before on 21.10.
I was mistakenly reading the CUDA version from nvidia-smi (which reports the newest CUDA the driver supports) instead of the toolkit version reported by nvcc --version. The latest in System76's packages is still 11.2.
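This mix-up is easy to make: nvidia-smi shows the highest CUDA version the installed driver supports, while nvcc --version shows the toolkit actually installed. A small sketch of pulling the toolkit version out of the nvcc banner (the toolkit_version helper and the sample banner text are mine, for illustration; version numbers in the sample are illustrative):

```python
import re
import shutil
import subprocess

def toolkit_version(banner: str):
    """Extract the toolkit release (e.g. '11.2') from `nvcc --version` output."""
    m = re.search(r"release (\d+\.\d+)", banner)
    return m.group(1) if m else None

# Sample banner in the format nvcc prints (illustrative version numbers):
SAMPLE = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Cuda compilation tools, release 11.2, V11.2.152"""

if shutil.which("nvcc"):
    out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
    print("toolkit:", toolkit_version(out))
else:
    print("toolkit:", toolkit_version(SAMPLE))
```

If the two numbers disagree, the driver is simply newer than the toolkit, which is normal; it's the nvcc number that matters for building against a PyTorch binary.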
It's recommended to build CUDA software in a devcontainer with Docker or Podman.
If that's so, then why provide the system76-cu* packages at all? I don't want to have to spin up Docker containers just to access my GPU in a script, or to test out features from a library with CUDA capabilities.
:wave: Thanks for supporting these convenient CUDA installs! Question: I'm encountering this same friction point. I go to install PyTorch, but the choices for prebuilt binaries are either CUDA 10.2 or 11.3. I can get 11.1 or 11.2 from System76, but not 11.3. I tried installing PyTorch from source, but that's a whole other issue.
I'd be open to a Docker or Podman route, but it's currently at odds with my development workflow and would add some more mental overhead to navigate. A CUDA 11.3 package would slot right into my existing workflow.
If anyone finds this and has a working solution for setting up CUDA 11.3 manually on Pop!_OS, can you share? I may try it myself and post a workaround if I find one...
Dev containers are the way to go
OK, my workaround is to fall back to CUDA 10.2. Both System76 and PyTorch ship binaries for 10.2, so it works out of the box. I tried it on my particular PyTorch application and it appears to work.
I suspect you're right that, in the long term, dev containers make portable and reproducible environments easier. For some reason dev containers still haven't taken off in scientific computing, or at least in my sub-community of it. Is there a migration guide available or planned? I found this NVIDIA website that seems streamlined. Is that the dev container workflow y'all would recommend?
If I get around to trying it out, I'd be open to writing one of those "support" guides that you have on your documentation. I adore that your docs are all open source! So cool.
@gully (and anyone else this helps)
to run pytorch in a dev container, I followed this tutorial:
https://blog.roboflow.com/nvidia-docker-vscode-pytorch/
but ended up needing to install nvidia-docker following a comment on this gist:
https://gist.github.com/kuang-da/2796a792ced96deaf466fdfb7651aa2e
specifically
https://gist.github.com/kuang-da/2796a792ced96deaf466fdfb7651aa2e?permalink_comment_id=4186634#gistcomment-4186634
- `sudo apt install nvidia-docker2`
- set the option `no-cgroups = true` in `/etc/nvidia-container-runtime/config.toml` (not `control.toml`, despite what the comment says)
- run with the flags that comment suggests; e.g. to test:
  `docker run --rm --gpus all --privileged -v /dev:/dev nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi`