light-the-torch
Thank god for this, you saved me
I can't figure out how to say thank you except to file this and say you just saved me after two days of wasted, broken installs and dependencies on an A100.
Thank god for this
Well, you could thank me instead :stuck_out_tongue: Jokes aside, thanks for posting this. It means a lot. I'm glad that this project helped you out.
If you find the time, could you tell me how your envs were broken before? While working on #73, I figured that detecting broken envs would also be a good addition for this tool. In the future we might even have an ltt fix command to fix the env automatically. But all of this depends on correctly detecting broken envs and the reason why they are broken in the first place.
Plus, did you encounter any issues with light-the-torch? Was the usage clear just from the README?
Hah - it was great. My only request would be to add torchvision (and maybe torchaudio) as extra possible installs along with? I held my breath when I added it but it worked ok.
The envs were broken because I didn't know what I was doing on the installation. My nvidia-smi said one thing about CUDA (11.6), my nvcc -V said another, the PyTorch site had no supporting versions in their little "helper", and I couldn't figure out how to get the right +cu suffix on the install. So various libs that need torch tried to install it, each pulling different versions, and actually broke my installs of older versions that I had gotten working with CUDA. Then I also got repeated errors from CUDA/torch saying that my driver with sm_80 wasn't supported, etc. Shrug, hard to describe the chaos.
Thanks again!
My only request would be to add torchvision (and maybe torchaudio) as extra possible installs along with? I held my breath when I added it but it worked ok.
Most PyTorch distributions, including torchvision and torchaudio, are already supported. There is no public list, but I periodically monitor their indices to check whether we are missing something.
https://github.com/pmeier/light-the-torch/blob/eda21f3d1398e0551546f6fde0e79f309de0951d/light_the_torch/_patch.py#L38-L50
If something breaks, feel free to reach out.
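For reference, the check boils down to a set-membership test on the distribution name. Here is a minimal sketch of the idea; the set below is illustrative only, not the actual list maintained in `_patch.py`:

```python
# Illustrative sketch: light-the-torch keeps a hard-coded set of known
# PyTorch distribution names (see _patch.py for the real, longer list).
# The three names below are an assumption for illustration.
PYTORCH_DISTRIBUTIONS = {
    "torch",
    "torchaudio",
    "torchvision",
}


def is_pytorch_distribution(name: str) -> bool:
    """Return True if the project name is a known PyTorch distribution."""
    return name.lower() in PYTORCH_DISTRIBUTIONS


print(is_pytorch_distribution("torchvision"))  # True
print(is_pytorch_distribution("numpy"))        # False
```

Anything matching the set gets resolved against the PyTorch wheel indices; everything else falls through to regular pip behavior.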
My nvidia-smi said one thing about cuda (11.6), my nvcc -V said another
That is indeed very confusing, and I fell for it myself in the beginning. nvcc reports the version of the CUDA toolkit you have installed, while nvidia-smi reports the highest CUDA version that compiled CUDA code can use on your machine. So in your case, you can use everything with CUDA <= 11.6, which I believe covers most of the binaries PyTorch provides at the moment.
Plus, unless you actually want to compile CUDA code, e.g. when building PyTorch from source, you don't need the CUDA toolkit installed on your system at all. PyTorch ships everything it needs at runtime inside the wheels, which is why they are so large. You only need the driver installed.
TL;DR if you ever have to install PyTorch wheels manually again, trust nvidia-smi.
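To make the distinction concrete, the two version numbers can be pulled apart programmatically. This is a sketch with hard-coded sample outputs; in practice you would capture them from `nvidia-smi` and `nvcc -V` via subprocess, and the sample strings and regexes here are assumptions for illustration:

```python
import re

# Sample outputs, hard-coded for illustration. In practice, capture these
# with subprocess.run(["nvidia-smi"]) and subprocess.run(["nvcc", "-V"]).
NVIDIA_SMI_OUTPUT = (
    "| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03"
    "    CUDA Version: 11.6     |"
)
NVCC_OUTPUT = "Cuda compilation tools, release 11.3, V11.3.109"


def driver_cuda_version(smi_output: str) -> str:
    """Highest CUDA version the installed driver supports (trust this one)."""
    return re.search(r"CUDA Version: (\d+\.\d+)", smi_output).group(1)


def toolkit_cuda_version(nvcc_output: str) -> str:
    """Version of the locally installed CUDA toolkit (irrelevant for wheels)."""
    return re.search(r"release (\d+\.\d+)", nvcc_output).group(1)


print(driver_cuda_version(NVIDIA_SMI_OUTPUT))  # 11.6
print(toolkit_cuda_version(NVCC_OUTPUT))       # 11.3
```

The mismatch between the two numbers is exactly the situation described above: the toolkit (11.3) lags behind the driver (11.6), but for installing prebuilt wheels only the driver number matters.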