RNNT apt-get update Fails in Docker Build
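I'm trying to run the RNNT training benchmark, but the Docker build fails with a GPG error: apt cannot verify the signature of the NVIDIA CUDA repository because the public key is not available, so it treats the repository as unsigned. Is NVIDIA still supporting Ubuntu 18.04 with this repository?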
training/rnn_speech_recognition/pytorch$ bash scripts/docker/build.sh
.
.
.
Reading package lists...
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is not signed.
The command '/bin/sh -c apt-get update && apt-get install -y libsndfile1 sox git cmake jq && apt-get install -y --no-install-recommends numactl && rm -rf /var/lib/apt/lists/*' returned a non-zero code: 100
training/rnn_speech_recognition/pytorch$
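To make the failure easier to pin down, here is a small sketch that pulls the missing key IDs out of apt's output. The helper name `missing_keys` is hypothetical and not part of the benchmark scripts. The commented-out commands at the end are an assumption based on NVIDIA's public announcement of a CUDA repository signing-key rotation; they are not necessarily what any fix in this thread does.

```shell
#!/bin/sh
# Hypothetical helper: read apt output on stdin and print each key ID
# that apt reports as missing (the hex string after NO_PUBKEY), once.
missing_keys() {
  grep -oE 'NO_PUBKEY [0-9A-F]+' | awk '{print $2}' | sort -u
}

# Usage inside the image (assumes apt-get is available there):
#   apt-get update 2>&1 | missing_keys
#
# Possible workaround (assumption, taken from NVIDIA's key rotation notice,
# untested against this Dockerfile): drop the old key and fetch the new one
# before running apt-get update.
#   apt-key del 7fa2af80
#   apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
```

For the log above, the helper would print A4B469963BF863CC, whose last eight hex digits match the 3bf863cc key file name.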
Updating the base image from pytorch/pytorch:1.7.0-cuda11.0-cudnn8-devel to pytorch/pytorch:1.12.0-cuda11.3-cudnn8-devel results in a later error:
training/rnn_speech_recognition/pytorch$ bash scripts/docker/build.sh
.
.
.
creating build/temp.linux-x86_64-3.7/src
gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/usr/local/cuda/include -fPIC -I/workspace/deps/warp-transducer/include -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/include/THC -I/opt/conda/include/python3.7m -c src/binding.cpp -o build/temp.linux-x86_64-3.7/src/binding.o -fPIC -std=c++14 -DWARPRNNT_ENABLE_GPU -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -DTORCH_EXTENSION_NAME=warp_rnnt -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
src/binding.cpp:8:14: fatal error: THC.h: No such file or directory
#include "THC.h"
^~~~~~~
compilation terminated.
setup.py:11: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if LooseVersion(torch.__version__) >= LooseVersion("1.5.0"):
/opt/conda/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
setuptools.SetuptoolsDeprecationWarning,
/opt/conda/lib/python3.7/site-packages/setuptools/command/easy_install.py:147: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
EasyInstallDeprecationWarning,
/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py:411: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
error: command '/usr/bin/gcc' failed with exit code 1
The command '/bin/sh -c COMMIT_SHA=f546575109111c455354861a0567c8aa794208a2 && git clone https://github.com/HawkAaron/warp-transducer deps/warp-transducer && cd deps/warp-transducer && git checkout $COMMIT_SHA && sed -i 's/set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_30,code=sm_30 -O2")/#set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_30,code=sm_30 -O2")/g' CMakeLists.txt && sed -i 's/set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_75,code=sm_75")/set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_80,code=sm_80")/g' CMakeLists.txt && mkdir build && cd build && cmake .. && make VERBOSE=1 && export CUDA_HOME="/usr/local/cuda" && export WARP_RNNT_PATH=`pwd` && export CUDA_TOOLKIT_ROOT_DIR=$CUDA_HOME && export D_LIBRARY_PATH="$CUDA_HOME/extras/CUPTI/lib64:$LD_LIBRARY_PATH" && export LIBRARY_PATH=$CUDA_HOME/lib64:$LIBRARY_PATH && export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH && export CFLAGS="-I$CUDA_HOME/include $CFLAGS" && cd ../pytorch_binding && python3 setup.py install && rm -rf ../tests test ../tensorflow_binding && cd ../../..' returned a non-zero code: 1
training/rnn_speech_recognition/pytorch$
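The second log suggests that the newer base image no longer ships THC.h in the include directories passed to gcc, which would explain why the warp-transducer binding stops compiling. A minimal sketch for checking this inside the container follows; `find_header` is a hypothetical helper, and the directories in the usage line are simply copied from the -I flags in the log.

```shell
#!/bin/sh
# Hypothetical helper: print the first path where a header exists under the
# given include directories; return non-zero if it is found in none of them.
find_header() {
  header=$1
  shift
  for dir in "$@"; do
    if [ -f "$dir/$header" ]; then
      printf '%s\n' "$dir/$header"
      return 0
    fi
  done
  return 1
}

# Usage inside the image (paths taken from the gcc command line above):
#   find_header THC.h \
#     /opt/conda/lib/python3.7/site-packages/torch/include/THC \
#     /opt/conda/lib/python3.7/site-packages/torch/include/TH \
#     /usr/local/cuda/include || echo "THC.h not found"
```

If the header is reported as not found in the 1.12.0 image but found in the 1.7.0 image, the base image bump alone cannot work without also changing src/binding.cpp or the pinned warp-transducer commit.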
Thanks for bringing this up. Here is a PR with a fix: https://github.com/mlcommons/training/pull/586
@coppock while we're figuring out CLA issues in PR 586, could you try those changes out locally to see if you can get past this error?
I just now tested #586, and it allows the build to complete.
Closing based on the comment above.