Fails to build with cuDNN version 9
Check duplicate issues.
- [X] Checked for duplicates
Description
Building with cuDNN 9.0 or later results in the following errors:
/build/root/src/root-6.30.06/tmva/tmva/src/DNN/Architectures/Cudnn/RecurrentPropagation.cu(500): error: identifier "cudnnRNNForwardTraining" is undefined
cudnnStatus_t status = cudnnRNNForwardTraining(
^
detected during instantiation of "void TMVA::DNN::TCudnn<AFloat>::RNNForward(const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::RNNDescriptors_t &, TMVA::DNN::TCudnn<AFloat>::RNNWorkspace_t &, bool) [with AFloat=Float_t]" at line 43 of /build/root/src/root-6.30.06/tmva/tmva/src/DNN/Architectures/Cudnn.cu
/build/root/src/root-6.30.06/tmva/tmva/src/DNN/Architectures/Cudnn/RecurrentPropagation.cu(513): error: identifier "cudnnRNNForwardInference" is undefined
cudnnStatus_t status = cudnnRNNForwardInference(
^
detected during instantiation of "void TMVA::DNN::TCudnn<AFloat>::RNNForward(const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::RNNDescriptors_t &, TMVA::DNN::TCudnn<AFloat>::RNNWorkspace_t &, bool) [with AFloat=Float_t]" at line 43 of /build/root/src/root-6.30.06/tmva/tmva/src/DNN/Architectures/Cudnn.cu
/build/root/src/root-6.30.06/tmva/tmva/src/DNN/Architectures/Cudnn/RecurrentPropagation.cu(545): error: identifier "cudnnRNNBackwardData" is undefined
cudnnStatus_t status = cudnnRNNBackwardData(
^
detected during instantiation of "void TMVA::DNN::TCudnn<AFloat>::RNNBackward(const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::RNNDescriptors_t &, TMVA::DNN::TCudnn<AFloat>::RNNWorkspace_t &) [with AFloat=Float_t]" at line 43 of /build/root/src/root-6.30.06/tmva/tmva/src/DNN/Architectures/Cudnn.cu
/build/root/src/root-6.30.06/tmva/tmva/src/DNN/Architectures/Cudnn/RecurrentPropagation.cu(571): error: identifier "cudnnRNNBackwardWeights" is undefined
status = cudnnRNNBackwardWeights(cudnnHandle, rnnDesc, seqLength, desc.xDesc.data(), x.GetDataPointer(),
^
detected during instantiation of "void TMVA::DNN::TCudnn<AFloat>::RNNBackward(const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::RNNDescriptors_t &, TMVA::DNN::TCudnn<AFloat>::RNNWorkspace_t &) [with AFloat=Float_t]" at line 43 of /build/root/src/root-6.30.06/tmva/tmva/src/DNN/Architectures/Cudnn.cu
/build/root/src/root-6.30.06/tmva/tmva/src/DNN/Architectures/Cudnn/RecurrentPropagation.cu(500): error: identifier "cudnnRNNForwardTraining" is undefined
cudnnStatus_t status = cudnnRNNForwardTraining(
^
detected during instantiation of "void TMVA::DNN::TCudnn<AFloat>::RNNForward(const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::RNNDescriptors_t &, TMVA::DNN::TCudnn<AFloat>::RNNWorkspace_t &, bool) [with AFloat=Double_t]" at line 44 of /build/root/src/root-6.30.06/tmva/tmva/src/DNN/Architectures/Cudnn.cu
/build/root/src/root-6.30.06/tmva/tmva/src/DNN/Architectures/Cudnn/RecurrentPropagation.cu(513): error: identifier "cudnnRNNForwardInference" is undefined
cudnnStatus_t status = cudnnRNNForwardInference(
^
detected during instantiation of "void TMVA::DNN::TCudnn<AFloat>::RNNForward(const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::RNNDescriptors_t &, TMVA::DNN::TCudnn<AFloat>::RNNWorkspace_t &, bool) [with AFloat=Double_t]" at line 44 of /build/root/src/root-6.30.06/tmva/tmva/src/DNN/Architectures/Cudnn.cu
/build/root/src/root-6.30.06/tmva/tmva/src/DNN/Architectures/Cudnn/RecurrentPropagation.cu(545): error: identifier "cudnnRNNBackwardData" is undefined
cudnnStatus_t status = cudnnRNNBackwardData(
^
detected during instantiation of "void TMVA::DNN::TCudnn<AFloat>::RNNBackward(const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::RNNDescriptors_t &, TMVA::DNN::TCudnn<AFloat>::RNNWorkspace_t &) [with AFloat=Double_t]" at line 44 of /build/root/src/root-6.30.06/tmva/tmva/src/DNN/Architectures/Cudnn.cu
/build/root/src/root-6.30.06/tmva/tmva/src/DNN/Architectures/Cudnn/RecurrentPropagation.cu(571): error: identifier "cudnnRNNBackwardWeights" is undefined
status = cudnnRNNBackwardWeights(cudnnHandle, rnnDesc, seqLength, desc.xDesc.data(), x.GetDataPointer(),
^
detected during instantiation of "void TMVA::DNN::TCudnn<AFloat>::RNNBackward(const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, TMVA::DNN::TCudnn<AFloat>::Tensor_t &, const TMVA::DNN::TCudnn<AFloat>::RNNDescriptors_t &, TMVA::DNN::TCudnn<AFloat>::RNNWorkspace_t &) [with AFloat=Double_t]" at line 44 of /build/root/src/root-6.30.06/tmva/tmva/src/DNN/Architectures/Cudnn.cu
8 errors detected in the compilation of "/build/root/src/root-6.30.06/tmva/tmva/src/DNN/Architectures/Cudnn.cu".
The missing functions were deprecated in cuDNN 8.0 and removed in cuDNN 9.0.
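For orientation, here is a minimal sketch of how a build could detect the affected cuDNN versions and which cuDNN 8+ entry points supersede the four removed functions. The helper names (`LegacyRnnApiAvailable`, `LegacyRnnApiAvailableInThisBuild`, `ReplacementFor`) and the `SKETCH_CUDNN_MAJOR` macro are hypothetical and not part of ROOT or cuDNN. The sketch also assumes that `cudnn_version.h` (shipped with cuDNN 8 and later) is on the include path when cuDNN is installed. The replacement functions take different argument lists, so this is not a drop-in rename.

```cpp
#include <cassert>
#include <cstring>

// cudnn_version.h defines CUDNN_MAJOR; include it only if it can be found,
// so that this sketch also compiles on machines without cuDNN.
#if defined(__has_include)
#if __has_include(<cudnn_version.h>)
#include <cudnn_version.h>
#endif
#endif

#ifdef CUDNN_MAJOR
#define SKETCH_CUDNN_MAJOR CUDNN_MAJOR
#else
#define SKETCH_CUDNN_MAJOR 0 // assumption: 0 means "cuDNN headers not found"
#endif

// Hypothetical helper: true if the legacy RNN functions are still declared
// for the given cuDNN major version (they were removed in 9.0).
inline bool LegacyRnnApiAvailable(int cudnnMajor)
{
   assert(cudnnMajor >= 0);
   return cudnnMajor > 0 && cudnnMajor < 9;
}

// Hypothetical helper: the same check for the headers seen by this build.
inline bool LegacyRnnApiAvailableInThisBuild()
{
   return LegacyRnnApiAvailable(SKETCH_CUDNN_MAJOR);
}

// Hypothetical helper: map a removed function name to the cuDNN 8+ function
// that supersedes it; returns nullptr for names not in the table.
inline const char *ReplacementFor(const char *legacyName)
{
   struct Entry {
      const char *legacy;
      const char *replacement;
   };
   static const Entry table[] = {
      {"cudnnRNNForwardTraining", "cudnnRNNForward"},
      {"cudnnRNNForwardInference", "cudnnRNNForward"},
      {"cudnnRNNBackwardData", "cudnnRNNBackwardData_v8"},
      {"cudnnRNNBackwardWeights", "cudnnRNNBackwardWeights_v8"},
   };
   for (const Entry &e : table) {
      if (std::strcmp(e.legacy, legacyName) == 0)
         return e.replacement;
   }
   return nullptr;
}
```

In a real fix, the same `CUDNN_MAJOR` comparison would presumably guard the calls in `RecurrentPropagation.cu` directly rather than go through helpers like these.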
Reproducer
Build from source with cuDNN 9.0 or newer.
ROOT version
6.30.06
Installation method
build from source
Operating system
Arch Linux
Additional context
No response
Hi @dpiparo @lmoneta
Can this still be considered for 6.32? It would be nice for the LCG stacks if we could move to the latest cuDNN with CUDA 12.4.
Hi all! To assess the situation, I tried to build ROOT with cuDNN 9.0 myself, and it is actually a huge interface change!
I wouldn't recommend that anyone attempt this migration without the help of CI tests, which we don't have for anything CUDA-related.
Just for reference, the previous migration to cuDNN 8.0 wasn't done by a core ROOT developer; it was generously contributed by the Arch package maintainer @kgizdov in 2020:
https://github.com/root-project/root/pull/6058
Of the 3350 lines of code in tmva/tmva/src/DNN/Architectures/Cudnn, a significant fraction had to be changed.
Therefore, we need to have a discussion: should cudnn even be enabled in any build of ROOT?
I have a few more data points, besides the observation that it's only packagers that seem to care about cudnn=ON:
- All questions about "cudnn" on the forum are about build problems, not actual usage: https://root-forum.cern.ch/search?q=cudnn
- On indico, it also doesn't seem like it's used much: https://indico.cern.ch/search/?q=cudnn&sort=mostrecent
- There is only one presentation about this work (a summer student talk)
For 3350 lines of code in ROOT where we don't know if they are used, the support burden is very high.
IMHO, you, @andresailer and @lahwaacz should consider going for cudnn=OFF, and we should only continue to invest in this ROOT component once an actual user complains about its absence either here on GitHub or on the forum.
@lmoneta and @dpiparo, what is your opinion?
Hi @guitargeek ,
There are also these proceedings that discuss cuDNN and TMVA: https://www.epj-conferences.org/articles/epjconf/pdf/2020/21/epjconf_chep2020_06019.pdf
@guitargeek : I will soon open a PR adding this migration.
master done, 6.32 PR submitted, tests running: https://github.com/root-project/root/pull/15636 @andresailer