CUDA 11.3?

Open Linux-cpp-lisp opened this issue 3 years ago • 9 comments

Hi all,

Thanks for your work packaging CUDA in an easy way for system76 machines!

PyTorch has moved up to CUDA 11.3 (see https://pytorch.org/get-started/locally/); does system76 expect to keep these releases up to date with NVIDIA releases, or should I install directly from NVIDIA if I need newer CUDA?

Thanks!

Linux-cpp-lisp avatar Feb 23 '22 16:02 Linux-cpp-lisp

Better yet, why not 11.6 since that's what is included with system76-driver-nvidia by default anyway?

mraxilus avatar May 12 '22 13:05 mraxilus

It's recommended to build CUDA software in a devcontainer with Docker or Podman.

mmstick avatar May 12 '22 14:05 mmstick

> Better yet, why not 11.6 since that's what is included with system76-driver-nvidia by default anyway?

@mraxilus is that true? Maybe that's just on the latest 22.04... definitely wasn't the case for me before on 21.10.

Linux-cpp-lisp avatar May 12 '22 20:05 Linux-cpp-lisp

> @mraxilus is that true? Maybe that's just on the latest 22.04... definitely wasn't the case for me before on 21.10.

I was mistakenly using my nvidia-smi CUDA version instead of that reported by nvcc --version. The latest on System76's packages is still 11.2.
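The two numbers differ because nvidia-smi reports the highest CUDA version the installed driver supports, while nvcc --version reports the toolkit release that is actually installed. A minimal Python sketch for comparing them, assuming both tools print their usual "CUDA Version: X.Y" and "release X.Y" strings (the helper names are made up for illustration):

```python
import re
import shutil
import subprocess


def parse_driver_cuda(smi_output):
    """Return the 'CUDA Version: X.Y' value from nvidia-smi output, or None."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    return match.group(1) if match else None


def parse_toolkit_cuda(nvcc_output):
    """Return the 'release X.Y' value from `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def run(cmd):
    """Run a command and return its stdout, or '' if the tool is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return ""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # Driver-side number: what the driver can support, not what is installed.
    print("driver supports up to CUDA:", parse_driver_cuda(run(["nvidia-smi"])))
    # Toolkit-side number: what nvcc will actually compile against.
    print("installed toolkit (nvcc):  ", parse_toolkit_cuda(run(["nvcc", "--version"])))
```

If either tool is missing, the corresponding line prints None rather than failing.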

mraxilus avatar May 13 '22 01:05 mraxilus

> It's recommended to build CUDA software in a devcontainer with Docker or Podman.

If that's so, then why provide the system76-cu* packages at all? I don't want to have to spin up docker containers just to access my GPU in a script, or test out features from a library with CUDA capabilities.

mraxilus avatar May 13 '22 01:05 mraxilus

:wave: thanks for supporting these convenient cuda installs! Question---

I'm encountering this same friction point. I go to install pytorch but choices for prebuilt binaries are either cuda 10.2 or 11.3. I can get 11.1 or 11.2 from system76, but not 11.3. I tried installing pytorch from source, but that's a whole other issue.

I'd be open to a docker or podman route, but it's currently at odds with my development workflow, and would add some more mental overhead to navigate. A cuda 11.3 fix would slot right in to my existing workflow.

If anyone finds this and has a worked solution of setting up cuda 11.3 manually on pop!_os can you share? I may try it and share if I find a workaround...

gully avatar Jun 03 '22 18:06 gully

Dev containers are the way to go

mmstick avatar Jun 03 '22 18:06 mmstick

Ok, my workaround is to default back to cuda 10.2. Both System76 and pytorch have binaries for 10.2, so it just works out-of-the-box. I tried it out on my particular pytorch application and it appears to have worked.

I suspect you're right that in the long term dev containers make it easier for portable and reproducible environments. For some reason dev containers still haven't taken off in scientific computing, or at least my sub-community of it. Is there a migration guide available or planned? I found this NVIDIA website that seems streamlined. Is that the dev container workflow y'all would recommend?

If I get around to trying it out, I'd be open to writing one of those "support" guides that you have on your documentation. I adore that your docs are all open source! So cool.

gully avatar Jun 03 '22 20:06 gully

@gully (and anyone else this helps) to run pytorch in a dev container, I followed this tutorial:
https://blog.roboflow.com/nvidia-docker-vscode-pytorch/
but ended up needing to install nvidia-docker following a comment on this gist:
https://gist.github.com/kuang-da/2796a792ced96deaf466fdfb7651aa2e specifically https://gist.github.com/kuang-da/2796a792ced96deaf466fdfb7651aa2e?permalink_comment_id=4186634#gistcomment-4186634

  • sudo apt install nvidia-docker2
  • set the option no-cgroups = true in /etc/nvidia-container-runtime/config.toml (not control.toml in spite of what the comment says)
  • run with flags as that comment suggests; e.g. to test, docker run --rm --gpus all --privileged -v /dev:/dev nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
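
The steps above can be sanity-checked with a short script. This is only a sketch: it assumes the option appears in the config file as a plain, uncommented "no-cgroups = true" line, and the function name is hypothetical. It reads the config file, and only runs the same docker test command if the option is set and docker is on PATH.

```python
import re
import shutil
import subprocess
from pathlib import Path

# Path taken from the second bullet above.
CONFIG_PATH = Path("/etc/nvidia-container-runtime/config.toml")

# Same smoke test as in the last bullet above.
SMOKE_TEST = [
    "docker", "run", "--rm", "--gpus", "all", "--privileged",
    "-v", "/dev:/dev", "nvidia/cuda:11.0.3-base-ubuntu20.04", "nvidia-smi",
]


def no_cgroups_enabled(config_text):
    """True if an uncommented `no-cgroups = true` line is present."""
    pattern = r"^\s*no-cgroups\s*=\s*true\s*(#.*)?$"
    return re.search(pattern, config_text, flags=re.MULTILINE) is not None


if __name__ == "__main__":
    if not CONFIG_PATH.exists():
        print(f"{CONFIG_PATH} not found; is nvidia-docker2 installed?")
    elif not no_cgroups_enabled(CONFIG_PATH.read_text()):
        print("no-cgroups is not set to true in", CONFIG_PATH)
    elif shutil.which("docker") is None:
        print("docker not found on PATH")
    else:
        # Prints the nvidia-smi table from inside the container on success.
        subprocess.run(SMOKE_TEST, check=False)
```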

NickleDave avatar Sep 12 '22 00:09 NickleDave