Linux installation: driver issues with RTX 3090
Sorry if this is the wrong place, but I don't know where else to ask: the forum and Discord have not been helpful so far.
I have been trying to get DFL working on Ubuntu 20 for days now, but it doesn't recognize the RTX 3090 and, I assume, starts training on the CPU.
I installed everything according to the guide, so I suspect a driver/cudatoolkit incompatibility. Could that be it?
As far as I understand, Ubuntu installs the latest driver (460) and cudatoolkit 11.3. Am I right that the cudatoolkit in the conda env is the one that gets used, or does the system-wide 11.3 need to be removed? I also tried to roll the driver back to 455 or 450, since some people told me those work correctly, but I never managed it: after installing the driver and blacklisting nouveau, nvidia-smi reports "no devices were found".
I am getting desperate and am considering switching to the Windows installation, hoping it will work out of the box.
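On the driver/toolkit question: the kernel driver always comes from the system (conda's cudatoolkit package does not include a driver), so whichever toolkit ends up being used still has to be supported by that driver. The "CUDA Version" in the nvidia-smi header is the newest CUDA version the installed driver supports. The sketch below (all function names are hypothetical) parses that header and compares it with a toolkit version. It is deliberately conservative: CUDA 11.x minor-version compatibility can let a newer 11.x toolkit run on an older 11.x driver, so a False result is a warning sign, not a verdict. It also returns False when nvidia-smi finds no devices, as in the rollback attempt above.

```python
import re
import shutil
import subprocess


def driver_cuda_version(smi_text):
    """Return the newest CUDA version the driver supports as (major, minor),
    parsed from the nvidia-smi header, or None if the header is missing
    (for example when nvidia-smi prints "No devices were found")."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def toolkit_fits_driver(smi_text, toolkit):
    """Conservative check: True if the toolkit version string (e.g. "11.3"
    or just "11") is not newer than what the driver reports."""
    supported = driver_cuda_version(smi_text)
    if supported is None:
        return False
    parts = [int(p) for p in toolkit.split(".")[:2]]
    parts += [0] * (2 - len(parts))  # treat "11" as 11.0
    return tuple(parts) <= supported


def read_nvidia_smi():
    """Run nvidia-smi and return everything it printed, or "" if it is not installed."""
    if shutil.which("nvidia-smi") is None:
        return ""
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return result.stdout + result.stderr


if __name__ == "__main__":
    text = read_nvidia_smi()
    print("driver supports CUDA up to:", driver_cuda_version(text))
    print("cudatoolkit 11.3 fits this driver:", toolkit_fits_driver(text, "11.3"))
```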
I ran into the same problem with the latest version of DFL. Here is what helped me:
- Do not use Ubuntu. It often has problems and conflicts with the NVIDIA driver.
- I recommend an Arch-based distro such as Garuda or Manjaro; plain Arch also works for this.
- Install the nvidia-dkms-perfomance driver from the AUR.
- Create the conda environment:
conda create -n deepfacelab -c main python=3.7 cudnn=8.2.1 cudatoolkit=11
- Activate it and install the pip packages into the conda env:
conda activate deepfacelab
python -m pip install tqdm numpy h5py opencv-python ffmpeg-python scikit-image scipy colorama tensorflow-gpu pyqt5 tf2onnx
I hope this helps.
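After setting up the environment, a quick way to tell whether training will really run on the GPU (the original symptom was silent CPU training) is to ask TensorFlow which GPUs it can see from inside the activated env. A minimal sketch, assuming TensorFlow 2.1 or newer, which provides `tf.config.list_physical_devices`; the helper name `visible_gpus` is hypothetical:

```python
def visible_gpus():
    """Return the GPUs TensorFlow can see, or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # An empty list means TensorFlow found no usable GPU and will fall back to the CPU.
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif not gpus:
        print("No GPU visible: training would run on the CPU")
    else:
        for gpu in gpus:
            print("Found:", gpu.name)
```

If the list comes back empty, TensorFlow's startup log usually names the CUDA or cuDNN library it failed to load, which points at the mismatched component.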
Thanks a lot for your comment. I've switched to Windows, where it works like a breeze. What is the main difference between Ubuntu and Arch here that makes Arch have fewer problems with NVIDIA drivers?
Ubuntu itself has a lot of bugs. Part of the problem is also the kernel, which in Ubuntu is outdated and works poorly with the latest versions of the NVIDIA driver (and not only that driver). On Garuda Linux, which is based on Arch, DeepFaceLab runs almost 1.5 times faster than on Windows. I use the 5.14 Zen kernel and the latest version of the nvidia-dkms-perfomance driver; for my Tesla K80 this is the best option. Also, these scripts are outdated: if you follow my instructions, the DeepFaceLab build from April/May will work.
PS, for the future: driver problems are not specific to these scripts, so it is better to contact the support of your Linux distribution. The Arch, Garuda and Manjaro communities are excellent and help beginners, unlike those of corporate distributions.
Thanks a lot! I will try your guide at some point when I have time to experiment. Do you have updated .sh scripts, since this repo uses older ones? I guess not too much has changed on the API/shell side of DFL?
@nagadit is the author; we are waiting for updates from him. But nothing prevents you from using an older DeepFaceLab build.
BEST WAY:
- Install Lambda Stack:
LAMBDA_REPO=$(mktemp) &&
wget -O${LAMBDA_REPO} https://lambdalabs.com/static/misc/lambda-stack-repo.deb &&
sudo dpkg -i ${LAMBDA_REPO} && rm -f ${LAMBDA_REPO} &&
sudo apt-get update && sudo apt-get install -y lambda-stack-cuda
sudo reboot
- Change requirements-cuda.txt to:
tqdm
numpy==1.19.3
numexpr
h5py==3.1.0
opencv-python==4.1.0.25
ffmpeg-python==0.1.17
scikit-image==0.14.2
scipy==1.4.1
colorama
tensorflow-gpu==2.5.0
pyqt5
tf2onnx==1.8.4
- python -m pip install -r ./requirements-cuda.txt
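Lambda Stack also installs its own system-wide Python packages (TensorFlow among them), so it may be worth confirming which versions the interpreter actually picks up after the pip install. A minimal sketch, assuming Python 3.8+ for `importlib.metadata` (on the Python 3.7 envs used elsewhere in this thread, the `importlib_metadata` backport offers the same functions); `parse_pins`, `installed_version` and `mismatches` are hypothetical helpers that only understand plain `name` and `name==version` entries:

```python
def parse_pins(text):
    """Map each requirement name to its pinned version (None if unpinned).
    Handles only whitespace-separated 'name' and 'name==version' entries."""
    pins = {}
    for token in text.split():
        name, _, pinned = token.partition("==")
        pins[name] = pinned or None
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def mismatches(pins, lookup=installed_version):
    """Return {name: (pinned, installed)} for every pin the environment does not satisfy."""
    problems = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found is None or (pinned is not None and found != pinned):
            problems[name] = (pinned, found)
    return problems


if __name__ == "__main__":
    with open("requirements-cuda.txt") as handle:
        pins = parse_pins(handle.read())
    for name, (pinned, found) in mismatches(pins).items():
        print(f"{name}: wanted {pinned or 'any'}, found {found or 'nothing'}")
```

The `lookup` parameter exists only so the comparison can be exercised without a real environment; by default it queries the installed distributions.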
On SUSE Linux, with this environment:
conda create -n deepfacelab -c main -c "nvidia/label/cuda-11.8.0" -c conda-forge python=3.7 cudnn=8 cuda-toolkit=11.8
and zabique's requirements, it works on an RTX 4090.