Specify additional steps to utilize GPU for Linux users
@MarkDaoust @markmcd
@haifeng-jin , @MarkDaoust, @8bitmp3 I await any suggestions or revisions if needed. Do we have any updates?
As I remember, the current recommended way to install TF is to use pip. I do not have further info on this. @MarkDaoust may comment on this.
@haifeng-jin it seems practically impossible for someone with a PC and a CUDA-enabled GPU to perform deep learning experiments with TensorFlow 2.16.1 and utilize their GPU locally without manually performing some extra steps that are not included (as of today) in the official TensorFlow documentation of the standard installation procedure for Linux users with GPUs, at least as a temporary fix!
It turns out that when you pip install tensorflow[and-cuda], all required NVIDIA libraries are installed as well. You just need to manually configure the environment variables appropriately in order to utilize them and run TensorFlow with GPU support.
Can we instead add these to the install guide?
configure manually the environment variables as appropriate
@mihaimaruseac shouldn't we explain/specify how to manually configure the environment variables as appropriate?
Why is conda mentioned in this patch? It makes the install guide more convoluted and seems unnecessary to me.
@Tachi107 I agree. Should I proceed to erase everything related to conda, referred to as option 1, and just keep one suggested option (create a venv virtual environment)? Perhaps that would be better and more straightforward?
Note that I'm not a tensorflow maintainer, just a casual user who happened to stumble upon this patch. But yeah, if I were you I would just show how to setup the venv. Conda users should already know how to do that with their non-default setup :)
@Tachi107 thank you. It seems very reasonable to simplify the guide like that. However, for now I will keep it as is and await the maintainers' comments as well.
@haifeng-jin , @MarkDaoust, @8bitmp3 I await any suggestions or revisions if needed. Do we have any updates?
There is no need to use conda; a standard venv works fine. In 2.15, TensorFlow knew to go look for the NVIDIA binaries installed with pip. With TF 2.16, you can help it by placing the binaries on LD_LIBRARY_PATH, as suggested in this PR, or by creating symlinks from the TF package to the pip-installed nvidia packages. E.g.,
python -m venv my-venv
source my-venv/bin/activate
python -m pip install tensorflow[and-cuda]
pushd $(dirname $(python -c 'print(__import__("tensorflow").__file__)'))
ln -svf ../nvidia/*/lib/*.so* .
popd
This produces output like:
'./libcublasLt.so.12' -> '../nvidia/cublas/lib/libcublasLt.so.12'
'./libcublas.so.12' -> '../nvidia/cublas/lib/libcublas.so.12'
'./libnvblas.so.12' -> '../nvidia/cublas/lib/libnvblas.so.12'
'./libcheckpoint.so' -> '../nvidia/cuda_cupti/lib/libcheckpoint.so'
'./libcupti.so.12' -> '../nvidia/cuda_cupti/lib/libcupti.so.12'
'./libnvperf_host.so' -> '../nvidia/cuda_cupti/lib/libnvperf_host.so'
'./libnvperf_target.so' -> '../nvidia/cuda_cupti/lib/libnvperf_target.so'
'./libpcsamplingutil.so' -> '../nvidia/cuda_cupti/lib/libpcsamplingutil.so'
'./libnvrtc-builtins.so.12.3' -> '../nvidia/cuda_nvrtc/lib/libnvrtc-builtins.so.12.3'
'./libnvrtc.so.12' -> '../nvidia/cuda_nvrtc/lib/libnvrtc.so.12'
'./libcudart.so.12' -> '../nvidia/cuda_runtime/lib/libcudart.so.12'
'./libcudnn_adv_infer.so.8' -> '../nvidia/cudnn/lib/libcudnn_adv_infer.so.8'
'./libcudnn_adv_train.so.8' -> '../nvidia/cudnn/lib/libcudnn_adv_train.so.8'
'./libcudnn_cnn_infer.so.8' -> '../nvidia/cudnn/lib/libcudnn_cnn_infer.so.8'
'./libcudnn_cnn_train.so.8' -> '../nvidia/cudnn/lib/libcudnn_cnn_train.so.8'
'./libcudnn_ops_infer.so.8' -> '../nvidia/cudnn/lib/libcudnn_ops_infer.so.8'
'./libcudnn_ops_train.so.8' -> '../nvidia/cudnn/lib/libcudnn_ops_train.so.8'
'./libcudnn.so.8' -> '../nvidia/cudnn/lib/libcudnn.so.8'
'./libcufft.so.11' -> '../nvidia/cufft/lib/libcufft.so.11'
'./libcufftw.so.11' -> '../nvidia/cufft/lib/libcufftw.so.11'
'./libcurand.so.10' -> '../nvidia/curand/lib/libcurand.so.10'
'./libcusolverMg.so.11' -> '../nvidia/cusolver/lib/libcusolverMg.so.11'
'./libcusolver.so.11' -> '../nvidia/cusolver/lib/libcusolver.so.11'
'./libcusparse.so.12' -> '../nvidia/cusparse/lib/libcusparse.so.12'
'./libnccl.so.2' -> '../nvidia/nccl/lib/libnccl.so.2'
'./libnvJitLink.so.12' -> '../nvidia/nvjitlink/lib/libnvJitLink.so.12'
This is essentially what we do from the R interface in tensorflow::install_tensorflow() and keras3::install_keras().
@t-kalinowski thank you very much for your valuable advice. I revised the PR accordingly.
@sgkouzias if you also create a symlink at my-venv/bin/ptxas -> my-venv/lib/python.../site-packages/.../bin/ptxas, then you could probably get away without needing to require users to modify the default activate and deactivate scripts.
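That suggestion could look roughly like the following. This is a hedged sketch: the exact site-packages layout and the location of the bundled ptxas are assumptions, hence the find, and the VENV variable name is hypothetical.

```shell
# Sketch: link the pip-installed ptxas into the venv's bin directory so it
# shadows any (possibly buggy) system-wide ptxas found earlier on PATH.
VENV="${VIRTUAL_ENV:-my-venv}"
PTXAS=$(find "$VENV/lib" -path '*nvidia*' -name ptxas -print -quit 2>/dev/null) || true
if [ -n "$PTXAS" ]; then
  ln -sf "$PTXAS" "$VENV/bin/ptxas"
fi
```

Because the venv's bin directory is prepended to PATH on activation, the linked ptxas would win without touching the activate and deactivate scripts.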
@t-kalinowski thank you so much for your advice. The instructions have been fully revised as per your comments. Modifications to the default activate and deactivate scripts are not required of users. The instructions should resemble more or less what you do in the R interface.
@8bitmp3, @haifeng-jin, @MarkDaoust even TensorFlow version 2.17.0.rc0 requires additional steps to utilize the GPU for Linux users. The suggested instructions of this pull request offer a tested solution. I await your comments.
@learning-to-play, @SeeForTwo, @8bitmp3, @haifeng-jin, @MarkDaoust, @markmcd
Unfortunately the latest release, namely TensorFlow 2.16.2, does not fix the ptxas bug. When running a training script I get the error:
ptxas returned an error during compilation of ptx to sass: 'INTERNAL: ptxas 12.3.103 has a bug that we think can affect XLA. Please use a different version.' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided. Aborted (core dumped)
So it seems that TensorFlow 2.16.2 fails to work with GPUs as well!
Notes:
- Successful installation was verified by running:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
- The solution included in the submitted pull request pending review helped to get rid of the ptxas bug and ultimately got TensorFlow 2.16.2 to work with my GPU:
ln -sf $(find $(dirname $(dirname $(python -c "import nvidia.cuda_nvcc; print(nvidia.cuda_nvcc.__file__)"))/*/bin/) -name ptxas -print -quit) $VIRTUAL_ENV/bin/ptxas
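For readability, the one-liner above does roughly the following. Same logic, but unpacked, with hypothetical intermediate variable names and defensive guards added:

```shell
# 1. Locate the pip-installed nvidia.cuda_nvcc package.
NVCC_PKG=$(python3 -c "import nvidia.cuda_nvcc; print(nvidia.cuda_nvcc.__file__)" 2>/dev/null) || true
if [ -n "$NVCC_PKG" ] && [ -n "$VIRTUAL_ENV" ]; then
  # 2. Go two directories up (to site-packages/nvidia) and find the bundled ptxas.
  NVCC_DIR=$(dirname "$(dirname "$NVCC_PKG")")
  PTXAS=$(find "$NVCC_DIR" -name ptxas -print -quit)
  # 3. Symlink it into the venv's bin so it takes precedence over the system ptxas.
  [ -n "$PTXAS" ] && ln -sf "$PTXAS" "$VIRTUAL_ENV/bin/ptxas"
fi
```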
Thank you for the contribution, @sgkouzias :)
Given that the [and-cuda] installation now does detect pip-installed CUDA components again, please add a disclaimer specifying that the symbolic links are only necessary in case the intended way doesn't work, i.e. the components aren't being detected and/or conflict with an existing system CUDA installation (like ptxas for you).
@belitskiy, @learning-to-play I revised the instructions as advised and will be awaiting your feedback. It is my honor to contribute to the TensorFlow community.
Thanks for all your work everyone (especially @sgkouzias)!
I just tweaked the order so that this new GPU debugging step is after the step where you test the GPU.
I think this is still right so I'm merging it. But LMK if I misunderstood anything.
Thank you @MarkDaoust 🙏 it is my honour. I noticed you mentioned merging, but it seems the pull request still needs a formal review due to branch protection rules. Could you please take a quick look and approve it when you have a chance? Many thanks again!
Really, it has everything it needs; we're just waiting for the internal merge. It should be through soon.