
Install tensorflow-gpu with conda

Open liganega opened this issue 2 years ago • 7 comments

The following sentence from "INSTALL.md" is somewhat outdated:

but the good news is that they will be installed automatically when you install the tensorflow-gpu package from Anaconda.

This is because the official command below installs TF version 2.4, which is not compatible with the Jupyter notebooks for homl3.

conda install -c anaconda tensorflow-gpu

Cf. https://anaconda.org/anaconda/tensorflow-gpu
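
For reference, one can also query the channel directly from the command line to see which versions it actually provides (this just queries the channel, assuming conda is installed):

conda search -c anaconda tensorflow-gpu  # lists the tensorflow-gpu versions available on the anaconda channel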

Is there any other easy way to use tensorflow-gpu?

liganega avatar Jul 03 '22 05:07 liganega

Hi @liganega,

According to the official step-by-step installation guide with pip and conda, it should be sufficient to install tensorflow (and not tensorflow-gpu) to get GPU support. Since the current homl3 environment requires tensorflow~=2.8.0, the conda-forge channel has tensorflow 2.8.1, and the required CUDA and cuDNN versions are the same for TensorFlow 2.5-2.9, the easiest way should be as follows:

First, make sure you have already installed the Nvidia driver (nvidia-smi must work). Next, you can edit environment.yml by deleting line 41 with - tensorflow~=2.8.0 # Deep Learning library in the pip dependencies section and adding the following lines to the conda dependencies (line 5 and onwards):

dependencies:  
  - cudatoolkit=11.2
  - cudnn=8.1.0
  - tensorflow=2.8

and follow the instructions from INSTALL.md (e.g. conda env create -f environment.yml, etc.).
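
Once the environment is created and activated, a quick sanity check (just the standard TensorFlow device query) should list at least one GPU:

# should print a non-empty list such as [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"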

Please let me know if it works, because I'll only be able to check it on my Ubuntu machine with a GPU in two days or so.

vi3itor avatar Jul 04 '22 14:07 vi3itor

Hi @vi3itor,

thank you very much for your quick answer.

I followed your instructions and the installation worked without any problems, except that some packages were reinstalled with different versions when the pip dependencies were handled, e.g., TF 2.8.2 was installed after 2.8 had been removed.

However, GPU support does not work, although importing TensorFlow works, and I am wondering why.

First, I am using WSL 2 (Ubuntu 20.04) on Windows 11, and the command nvidia-smi shows the following:

(screenshot of nvidia-smi output)

The CUDA version reported there is 11.6. Could that be the reason why GPU support doesn't work? Is it compatible with cudatoolkit=11.2 and cudnn=8.1.0?

I am quite sure that GPU support should work, because it worked when TF 2.4 was installed with

conda install -c anaconda tensorflow-gpu

liganega avatar Jul 04 '22 16:07 liganega

Hi @liganega,

Sorry, I assumed that you were on Linux. I have a Windows 10 machine with a Titan Xp to test the setup, so I'll list the required steps next. The official guide by Nvidia describes how to set up WSL support, but TensorFlow's guide recommends doing it under the native Windows prompt. Here are the necessary steps:

  1. Make sure that you have the prerequisites installed and up to date:
  • Microsoft Visual C++ Redistributable. You can download the installer from here.
  • Nvidia Game Ready driver or (as in your case) RTX Quadro display driver, which can be downloaded here.
  2. Download and install Miniconda using the following link. Don't add it to the PATH. Once it is installed, run Anaconda Powershell Prompt (miniconda3) from the Start menu.

  3. Finally, edit environment.yml or download environment-gpu.yml from my repo. Here is the diff:

@@ -1,9 +1,11 @@
 name: homl3
 channels:
   - conda-forge
-  - defaults
+  # - defaults
 dependencies:
   - box2d-py  # used only in chapter 18, exercise 8
+  - cudatoolkit=11.2  # tensorflow dependencies for the GPU support
+  - cudnn=8.1.0  # tensorflow dependencies for the GPU support
   - ftfy=5.5  # used only in chapter 16 by the transformers library
   - graphviz  # used only in chapter 6 for dot files
   - python-graphviz  # used only in chapter 6 for dot files
@@ -37,7 +39,7 @@ dependencies:
     - tensorflow-addons~=0.16.1  # used in chapters 11 & 16 (for AdamW & seq2seq)
     - tensorflow-datasets~=4.5.2  # datasets repository, ready to use
     - tensorflow-hub~=0.12.0  # trained ML models repository, ready to use
-    - tensorflow-serving-api~=2.8.0  # or tensorflow-serving-api-gpu if gpu
+    - tensorflow-serving-api~=2.8.0  # used only in chapter 19
     - tensorflow~=2.8.0  # Deep Learning library
     - transformers~=4.16.2  # Natural Language Processing lib for TF or PyTorch
     - urlextract~=1.5.0  # optionally used in chapter 3, exercise 4
  4. Now you can proceed with the commands from INSTALL.md in Anaconda Powershell Prompt (miniconda3), but instead of python3 -m use python -m:
conda update -y -n base conda
conda env create -f environment.yml  # or environment-gpu.yml if you downloaded the file from my repo
conda activate homl3
python -m ipykernel install --user --name=python3
jupyter notebook

That's it. I tested GPU support on my Windows machine and everything works. Let me know if you have any problems and I'll prepare the PR.
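
If it doesn't work on your side, a few sanity checks inside the activated homl3 environment (assuming the steps above completed without errors) can help narrow down the problem:

conda list cudatoolkit          # should show 11.2.*
conda list cudnn                # should show 8.1.*
python -m pip show tensorflow   # should show a 2.8.x version installed via pip
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"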

vi3itor avatar Jul 05 '22 04:07 vi3itor

By the way, I forgot to mention that installing tensorflow 2.8.1 via the conda-forge channel (conda install -c conda-forge tensorflow) won't work on Windows, because there are no corresponding binaries, only Linux and macOS ones. That's why it must be installed via pip.

vi3itor avatar Jul 05 '22 04:07 vi3itor

Hi @vi3itor,

first I have not tried yet what you proposed. There are two reasons.

  • I was not sure how to understand the connection between Anaconda Powershell Prompt (Windows 11) and WSL (Ubuntu).
  • As I mentioned already, TF + GPU worked when tf-gpu 2.4 was installed using conda install -c anaconda tensorflow-gpu. So I thought I should keep trying inside WSL (Ubuntu 20.04), and your explanation gave me some intuition on how to proceed.

Here are the steps I took.

  1. Installing the newest nvidia driver.
  2. Following Option 1 from CUDA Support for WSL 2. But the following command should be run before installing the downloaded package:
    sudo cp /var/cuda-repo-wsl-ubuntu-11-7-local/cuda-B81839D3-keyring.gpg /usr/share/keyrings/cuda-archive-keyring.gpg
    
  3. Creating the homl3 conda environment using your environment-gpu.yml with two changes:
    • cudatoolkit: commented out, but it was installed automatically by conda anyway (I forgot the version).
    • cudnn: using 8.4.1 (the latest).

However, TF GPU support didn't work, and I got the following message:

Could not load dynamic library 'libcudnn.so.8'

But then I found an answer at

https://stackoverflow.com/questions/66977227/could-not-load-dynamic-library-libcudnn-so-8-when-running-tensorflow-on-ubun

It suggests the following steps for repair:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt-get update
sudo apt-get install libcudnn8
sudo apt-get install libcudnn8-dev

Then TF GPU support suddenly worked! To be honest, I don't know why it works; there are probably some unnecessary steps. Perhaps you know more about why it helped.
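
One way to check where the library is actually picked up from now (just a standard linker-cache query):

ldconfig -p | grep libcudnn.so.8   # lists the cached locations of the cuDNN library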

One last thing: a warning about NUMA support pops up when checking for GPU support with the command tf.config.list_physical_devices("GPU").

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:922] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.

But people say it can be safely ignored. I just hope it is really so.

liganega avatar Jul 05 '22 15:07 liganega

Hi @liganega,

I'm glad you figured out how to get it running inside the WSL 2 Ubuntu distro.

My instructions above are the easiest I can think of for Windows users (and they closely follow the official TensorFlow instructions) to avoid all the hassle of installing the correct version of the CUDA toolkit (without overwriting the special WSL Nvidia driver), updating LD_LIBRARY_PATH, etc. Notice that you added the CUDA repository for ubuntu2004 when installing cuDNN:

sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"

while, when following the guide from Nvidia earlier, you had to use the special wsl-ubuntu repo to avoid reinstalling the Nvidia driver:

sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/ /"

When updating packages in the future with sudo apt-get upgrade, it might actually reinstall CUDA from the ubuntu2004 repo and overwrite the driver, so the setup might stop working.

I decided to try setting up the conda environment inside the WSL Ubuntu distro to see if I could get it working without manually installing CUDA and cuDNN. Here are the steps for a freshly downloaded Ubuntu distro:

# download and install Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh
# restart bash to apply changes
source .bashrc
# update conda
conda update -y -n base conda
# checkout the repository with environment-gpu.yml
git clone https://github.com/vi3itor/handson-ml3.git && cd handson-ml3 && git checkout -b windows-gpu && git pull origin windows-gpu
# create homl3 environment
conda env create -f environment-gpu.yml
conda activate homl3
python3 -m ipykernel install --user --name=python3

When I tried python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" I got multiple errors, such as

W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
...
Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU.

But conda had successfully installed both cudatoolkit and cudnn, and the files can be found with sudo find / -name 'libcudnn.so.8'. So I had to update LD_LIBRARY_PATH (it was empty):

export LD_LIBRARY_PATH=/home/$USER/miniconda3/envs/homl3/lib/:$LD_LIBRARY_PATH
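
To avoid re-exporting this in every new shell, one option (a sketch, assuming conda 4.8 or later; note that it overrides any existing value of the variable) is to store it in the environment itself:

# make the variable part of the homl3 environment so it is set on every `conda activate homl3`
conda env config vars set LD_LIBRARY_PATH=/home/$USER/miniconda3/envs/homl3/lib/ -n homl3
conda deactivate && conda activate homl3   # re-activate so the variable takes effect
conda env config vars list -n homl3        # should now list LD_LIBRARY_PATH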

Now python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" works without errors and finds the GPU, but it still produces the same NUMA warnings as you got for your setup:

...
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:922] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

More importantly, when I start jupyter notebook and run notebooks with heavy GPU usage, such as 16_nlp_with_rnns_and_attention.ipynb, I again get NUMA warnings and can verify in the Task Manager that the GPU is not used. I tried the solution mentioned here to explicitly select the video card in the PhysX settings of the NVIDIA Control Panel, but it didn't help either.

As you can see, WSL support is still experimental, and it might be much easier to run the notebooks natively on Windows. I'll edit the instructions above to remove the requirement to install WSL 2, because it's not needed in this case. I'd also like to ask you to try those steps to see if they work well on Windows 11 with an RTX video card. You can also compare how much time it takes to run 16_nlp_with_rnns_and_attention.ipynb under your WSL setup and under Anaconda Powershell Prompt (miniconda3).
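
For a quicker comparison than a full notebook, here is a rough timing sketch (just an illustration; the matrix size, repeat count and absolute numbers are arbitrary, but the GPU run should be clearly faster than the CPU one if the GPU is actually used):

python3 - <<'EOF'
import time
import tensorflow as tf

def bench(device, n=4000, repeats=10):
    # time a few large matrix multiplications on the given device
    with tf.device(device):
        a = tf.random.normal((n, n))
        b = tf.random.normal((n, n))
        tf.matmul(a, b)                  # warm-up
        start = time.perf_counter()
        for _ in range(repeats):
            c = tf.matmul(a, b)
        c.numpy()                        # force the computation to finish
        return time.perf_counter() - start

print("GPU:", bench("/GPU:0"))
print("CPU:", bench("/CPU:0"))
EOF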

vi3itor avatar Jul 06 '22 05:07 vi3itor

Dear @vi3itor,

I really appreciate your dedication, thank you so much for that.

Your instructions on how to create a conda environment with TensorFlow GPU support on Windows 10/11 worked perfectly. It is the easiest way I know of. Let me summarize again:

  1. Microsoft Visual C++ Redistributable. You can download the installer from here.
  2. Nvidia Game Ready driver or (as in my case) RTX Quadro display driver, which can be downloaded here.
  3. Download and install Miniconda using the following link. Don't add it to the PATH. Once it is installed run Anaconda Powershell Prompt (miniconda3) from the Start menu.
  4. Finally, edit environment.yml or download environment-gpu.yml from my repo.
  5. Use Anaconda Powershell Prompt (miniconda3) to execute the following commands:
    1. conda update -y -n base conda
    2. conda env create -f environment.yml # or environment-gpu.yml if you downloaded the file from my repo
    3. conda activate homl3
    4. python -m ipykernel install --user --name=python3
    5. jupyter notebook

PS: I didn't struggle with WSL any further. Your solution is just perfect for me.

liganega avatar Jul 07 '22 16:07 liganega

Thanks for this great discussion, @liganega and @vi3itor , I really appreciate it. I'm currently working on improving the installation instructions. I don't use Windows, so this thread will definitely help me out. Thanks again!

ageron avatar Sep 25 '22 22:09 ageron

Thank you too!

I would also like to know an easy way on top of WSL under Windows 10/11.

Recently I tried that route again, following the CUDA on WSL User Guide, but failed:

  • CUDA toolkit installation works.
  • TensorFlow installation works too.
  • But GPU support does not quite work, with many warnings etc.
  • Probably dependencies are the main problem, but I'm not sure.

Using Docker images could be a solution, but I don't know yet.
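
For reference, the Docker route would roughly look like this (a sketch, assuming Docker with the NVIDIA container runtime / WSL 2 GPU support is set up; the image tag is only an example, and the book's extra dependencies would still need to be installed inside the container):

# run a GPU-enabled TensorFlow image with Jupyter and expose the notebook port
docker run --gpus all -it --rm -p 8888:8888 tensorflow/tensorflow:2.8.0-gpu-jupyter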

liganega avatar Sep 26 '22 06:09 liganega