Issues using multiple GPUs
I have two GPUs: a 2080 Ti (22 GB) and a V100 (16 GB). Each card works when used alone, but when I try to use both at the same time, the following error message appears.
According to ChatGPT, it might be linked to missing NCCL support in the CUDA version installed in your environment. Make sure the CUDA version you have installed supports NCCL.
https://developer.nvidia.com/nccl
It looks like you've encountered an error while trying to run a distributed training process with PyTorch. The key message here is:
RuntimeError: Distributed package doesn't have NCCL built in
NCCL (NVIDIA Collective Communications Library) is a library that supports multi-GPU and multi-node collective communication primitives that are performance optimized for NVIDIA GPUs.
Here are some steps you could take to resolve the issue:
- Ensure NCCL is Installed: Make sure that NCCL is installed on your system. NCCL is usually bundled with the PyTorch binaries if you install PyTorch using Conda or Pip with CUDA support.
- Check PyTorch Installation: It might be necessary to reinstall PyTorch and ensure that you are using a version of PyTorch that is compatible with NCCL. You can use conda or pip for the installation and choose the version that includes CUDA support.
- Verify CUDA Version: Make sure that the CUDA version on your system is compatible with the version of NCCL and PyTorch you're using.
- Environment Variables: Check your environment variables related to NCCL and CUDA (NCCL_DEBUG=INFO can be used to get more detailed logs).
- Distributed Backend: When initializing distributed training in PyTorch with torch.distributed.init_process_group, make sure you're specifying backend='nccl' if you're using NVIDIA GPUs.
- Check GPU Availability: Make sure that the GPUs are available and not in use by another process. You can use the command nvidia-smi to check the status of the GPUs.
- Permissions: Ensure that you have the correct permissions to access the GPUs and the NCCL library.
- Update/Reinstall NCCL: If you have an outdated version of NCCL, updating to the latest version might solve the issue.
- Check PyTorch Forums/Documentation: If the error persists, check the PyTorch forums or the official documentation for similar issues or reach out for help with the specifics of your setup.
If after trying these steps the issue isn't resolved, please provide more details about your environment such as the versions of PyTorch, NCCL, and CUDA you are using, as well as the specific code snippet where you initialize the distributed process. That would help in diagnosing the problem more accurately.
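The environment details requested above could be gathered with something like the following. This is only a minimal sketch: the function name `collect_env_report` and the dictionary keys are made up for illustration, and it assumes nothing beyond the standard library plus an optional PyTorch install.

```python
import importlib.util
import platform


def collect_env_report() -> dict:
    """Gather details that are usually asked for when reporting this error."""
    report = {
        "os": platform.system(),
        "python": platform.python_version(),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if not report["torch_installed"]:
        return report

    import torch
    import torch.distributed as dist

    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None on CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    report["gpu_count"] = torch.cuda.device_count()
    report["distributed_available"] = dist.is_available()
    # Only ask about NCCL when the distributed package itself is built in.
    report["nccl_available"] = dist.is_available() and dist.is_nccl_available()
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the thread would cover the PyTorch and CUDA versions and whether NCCL is built in.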
I don't know how to check whether NCCL is installed. I asked ChatGPT, and it told me to run this code:
import torch
print(torch.cuda.nccl.version())
It shows an error:
Traceback (most recent call last):
File "
I checked, and the file "C:\AI\kohya_ss\venv\lib\site-packages\torch\cuda\nccl.py" does exist.
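The presence of nccl.py does not mean the PyTorch build was compiled with NCCL. A check that may be more reliable is `torch.distributed.is_nccl_available()`. The sketch below uses it to pick a backend, falling back to Gloo, which PyTorch does support on Windows. The helper names `choose_backend` and `detect_backend` are hypothetical, and nothing in this thread establishes whether kohya_ss exposes a way to select the backend.

```python
def choose_backend(nccl_available: bool, cuda_available: bool) -> str:
    """Return 'nccl' only when it can actually be used, otherwise 'gloo'."""
    if nccl_available and cuda_available:
        return "nccl"
    return "gloo"


def detect_backend() -> str:
    """Query the local PyTorch install (assumes torch is importable)."""
    import torch
    import torch.distributed as dist

    if not dist.is_available():
        raise RuntimeError("this PyTorch build has no distributed support")
    return choose_backend(dist.is_nccl_available(), torch.cuda.is_available())


if __name__ == "__main__":
    print("usable backend:", detect_backend())
```

On a Windows build without NCCL this would be expected to report gloo rather than raise the "doesn't have NCCL built in" error.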
https://developer.nvidia.com/nccl I visited this website and found that NCCL apparently cannot be installed on Windows; it is intended for Linux. There is no installer for Windows.
Yeah, it will never work natively on Windows. You can get Ubuntu 22 from the Microsoft Store if you want to set it up again under Linux.
Is this virtualized mode usable? Can Ubuntu in this mode access the GPU and Linux CUDA? Will performance be very poor, so that the GPU cannot be fully utilized?
Yes, it uses Hyper-V, so it is very much a VM. I installed Ubuntu 22 instead of Windows when I found this out and haven't looked back.
Also, I think Microsoft has CUDA working fine under their Linux VM system, but I imagine you're taking a performance hit, and you need Linux versions of all your software.
At that point you might as well install the real thing on bare metal, IMO.
I have installed Ubuntu from the Microsoft Store, but I get an error when I try to run it.
I have fixed this problem: I enabled WSL from the Windows features module and upgraded to WSL2, and now Ubuntu works. But there is a new problem: how do I use this to run the kohya training program? It can't load any Windows drives or files. Do I need to git clone kohya again inside this Ubuntu system? Also, this system doesn't show any NVIDIA GPU (I tried installing the NVIDIA Linux drivers and CUDA), but still nothing. How do I use this Ubuntu (WSL)?
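On the question of Windows files: in a default WSL setup, Windows drives are normally mounted under /mnt/<drive letter>, so C:\ shows up as /mnt/c. WSL also ships a `wslpath` command for this conversion. The sketch below does the same mapping in Python; the function name `windows_to_wsl_path` is made up for illustration, and it assumes the default automount location has not been changed.

```python
from pathlib import PureWindowsPath


def windows_to_wsl_path(win_path: str) -> str:
    """Map a drive-letter Windows path to its default WSL mount path."""
    path = PureWindowsPath(win_path)
    drive = path.drive  # e.g. "C:"
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError(f"expected a drive-letter path, got {win_path!r}")
    # parts[0] is the anchor (for example 'C:\\'); the rest are folder names.
    tail = "".join("/" + part for part in path.parts[1:])
    return "/mnt/" + drive[0].lower() + tail


if __name__ == "__main__":
    print(windows_to_wsl_path(r"C:\AI\kohya_ss"))
```

Since the existing venv under C:\AI\kohya_ss was created for Windows, a separate Linux install inside Ubuntu would probably still be needed; the mapped path is mainly useful for reaching datasets and models already stored on the Windows side.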
https://github.com/bmaltais/kohya_ss/issues/2364#issue-2255162976 It seems that WSL2 cannot correctly identify the GPU model. I am using the NVIDIA driver. Moreover, after installing kohya_ss directly and running it, the graphics card still does not seem to be recognized correctly. I don't know whether I need to install the CUDA driver separately.
Yes, you need to install CUDA as specified in the README under the prerequisites for Linux.
I have installed CUDA in Linux, but when I start the program it cannot find CUDA.
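A quick way to narrow down a "cannot find CUDA" problem is to check which pieces are visible from inside the Linux environment. The sketch below is only a rough diagnostic. The function name `cuda_visibility_report` is made up, and the paths /usr/lib/wsl/lib/libcuda.so (where the Windows driver is normally exposed inside WSL2) and /usr/local/cuda (the usual toolkit location) are assumptions about a typical setup, not something confirmed in this thread.

```python
import os
import shutil


def cuda_visibility_report(environ=None, which=shutil.which, exists=os.path.exists) -> dict:
    """Report which CUDA-related pieces are visible to the current shell."""
    env = os.environ if environ is None else environ
    return {
        "nvidia_smi_on_path": which("nvidia-smi") is not None,
        "nvcc_on_path": which("nvcc") is not None,
        # Assumed WSL2 location of the driver library provided by Windows.
        "wsl_libcuda_present": exists("/usr/lib/wsl/lib/libcuda.so"),
        # Assumed default install directory of the CUDA toolkit on Linux.
        "default_toolkit_dir_present": exists("/usr/local/cuda"),
        "CUDA_HOME": env.get("CUDA_HOME"),
    }


if __name__ == "__main__":
    for key, value in cuda_visibility_report().items():
        print(f"{key}: {value}")
```

If the toolkit directory exists but nvcc is not on PATH, the install may simply not be on the search path. Note also that NVIDIA's guidance for WSL2 is to rely on the Windows driver and not install a Linux display driver inside WSL, so an extra Linux driver there could itself be part of the problem.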
