TensorFlowASR
Tensorflow or CUDA BUG
my OS: Ubuntu 18.04
my env: conda
did I customize the code: No
I did everything as the README file says.
conda create -y -n tfasr tensorflow-gpu python=3.8 # tensorflow if using CPU
conda activate tfasr
pip install -U tensorflow-gpu # upgrade to latest version of tensorflow
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
pip install .
Everything was fine, but when I try to use TensorFlow (or start training) I see this error:
python3 -c 'import tensorflow as tf; print(tf.__version__)'
2021-03-12 13:40:05.653403: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-03-12 13:40:05.653427: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/shenasa/.local/lib/python3.8/site-packages/tensorflow/__init__.py", line 435, in <module>
    _ll.load_library(_main_dir)
  File "/home/shenasa/.local/lib/python3.8/site-packages/tensorflow/python/framework/load_library.py", line 153, in load_library
    py_tf.TF_LoadLibrary(lib)
tensorflow.python.framework.errors_impl.NotFoundError: /home/shenasa/anaconda3/envs/tfasr2/lib/python3.8/site-packages/tensorflow/core/kernels/libtfkernel_sobol_op.so: undefined symbol: _ZN10tensorflow15TensorShapeBaseINS_11TensorShapeEEC1EN4absl14lts_2020_02_254SpanIKlEE
When I downgrade TensorFlow with pip install -U tensorflow==2.3.0, the error disappears. Does TensorFlow 2.3.0 work fine with the code in this repo?
What is causing this problem?
@masoudMZB I think TensorFlow 2.4 requires CUDA 11 to be installed. If you have CUDA < 11, such as 10, you should use TF 2.3.x (the latest is 2.3.2). And yes, this code works with tensorflow >= 2.3.2.
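To make the version pairing above concrete, here is a minimal lookup sketch. The two rows come from TensorFlow's published tested build configurations for Linux GPU wheels (2.3 with CUDA 10.1 and cuDNN 7.6, 2.4 with CUDA 11.0 and cuDNN 8.0). The table and the function name required_gpu_libs are illustrative only and are not part of TensorFlow or TensorFlowASR.

```python
# Tested build configurations for the two release lines discussed in this thread.
# The table is deliberately incomplete; other releases return None.
TESTED_GPU_BUILDS = {
    (2, 3): {"cuda": "10.1", "cudnn": "7.6"},
    (2, 4): {"cuda": "11.0", "cudnn": "8.0"},
}


def required_gpu_libs(tf_version):
    """Return the CUDA/cuDNN versions a TensorFlow release line was built against.

    Hypothetical helper: returns None for release lines not in the table.
    """
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return TESTED_GPU_BUILDS.get((major, minor))


for version in ("2.3.2", "2.4.1"):
    print(version, required_gpu_libs(version))
```

If the CUDA version on the machine does not match the row for the installed TensorFlow release, pinning TensorFlow to the matching release line is usually simpler than changing the CUDA installation.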
Thanks, the problem is solved. For those who use conda, you can install CUDA 11 with this command (the current version is 11, so you don't need to specify the version):
conda install -c anaconda cudatoolkit
You may see other errors, as I do now, but the CUDA problem is solved.
@usimarit I have installed cudatoolkit (the version is 11 in the current conda documentation), but now I face this error:
2021-03-15 08:26:03.714638: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-15 08:26:04.602684: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-15 08:26:04.603267: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-03-15 08:26:04.879374: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-15 08:26:04.880078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-03-15 08:26:04.880109: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-15 08:26:04.882556: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-03-15 08:26:04.882593: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-03-15 08:26:04.883583: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-03-15 08:26:04.883809: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-03-15 08:26:04.886273: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-03-15 08:26:04.886813: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-03-14 13:04:28.928853: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2021-03-14 13:04:28.928865: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-03-14 13:04:28.929166: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-03-14 13:04:28.929928: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-14 13:04:28.929954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-14 13:04:28.929961: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
Use RNNT loss in TensorFlow
Reading /home/shenasa/masoud_parpanchi/cv-corpus-6.1-2020-12-11/fa/clips/train_mr_deleted_row.csv ...
Traceback (most recent call last):
  File "examples/deepspeech2/train_keras_ds2.py", line 91, in <module>
    train_data_loader = train_dataset.create(global_batch_size)
  File "/home/shenasa/anaconda3/envs/tfasr2/lib/python3.8/site-packages/tensorflow_asr/datasets/asr_dataset.py", line 330, in create
    self.read_entries()
  File "/home/shenasa/anaconda3/envs/tfasr2/lib/python3.8/site-packages/tensorflow_asr/datasets/asr_dataset.py", line 110, in read_entries
    self.entries[i][-1] = " ".join([str(x) for x in self.text_featurizer.extract(line[-1]).numpy()])
  File "/home/shenasa/anaconda3/envs/tfasr2/lib/python3.8/site-packages/tensorflow_asr/featurizers/text_featurizers.py", line 152, in extract
    indices = [self.tokens2indices[token] for token in text]
  File "/home/shenasa/anaconda3/envs/tfasr2/lib/python3.8/site-packages/tensorflow_asr/featurizers/text_featurizers.py", line 152, in <listcomp>
    indices = [self.tokens2indices[token] for token in text]
I think the key point is this line
Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
But 1) why is this happening? I installed the correct conda cudatoolkit. 2) Also, what is that text_featurizers error?
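One way to check which of these libraries the dynamic loader can actually find in the active environment is to try opening each one by soname. The following is only a rough sketch using ctypes: the helper names can_load and probe are made up for illustration, the soname list is copied from the log messages in this thread, and the lookup approximates what TensorFlow's own loader does rather than reproducing it exactly.

```python
import ctypes

# Sonames taken from the dso_loader messages in the logs above.
CUDA11_SONAMES = ["libcudart.so.11.0", "libcublas.so.11", "libcudnn.so.8"]


def can_load(soname):
    """Return True if the dynamic loader can open the library, False otherwise."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False


def probe(sonames):
    """Map each soname to whether it could be opened in this process."""
    return {name: can_load(name) for name in sonames}


for name, found in probe(CUDA11_SONAMES).items():
    print(f"{name}: {'found' if found else 'missing'}")
```

Running this inside the activated conda environment shows whether a missing library such as libcudnn.so.8 is absent altogether or only absent from the loader's search path.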
@masoudMZB Could you show me the full error (until the end of the traceback)? I only see a part of it.
@usimarit that was the full error.
I couldn't solve the problem yet.
May I know the conda environment configuration you used to train the model (CUDA toolkit version and other libs you installed)? I did everything as the README says.
Did you install cuDNN 8 and CUDA 11 manually, or did conda handle it for you? Because there is no cuDNN 8 in conda yet,
and also no TensorFlow 2.3 in conda (I mean some libs are not available).
@masoudMZB Could you show me the content of the config.yml you're using? If you are using characters, make sure decoder_config.vocabulary points to the .characters file (or just set it to null if you want to use English characters).
Sorry, I don't have a GPU machine right now. The previous one had the CUDA driver preinstalled, and I only ran conda install tensorflow-gpu, which automatically installs CUDA and cudatoolkit. So maybe you can check whether your machine has the CUDA driver installed.
@usimarit OK, I'll share it now.
But to clarify, here are my steps:
- create conda env
conda create -y -n tfasr python=3.8
- activate conda env
conda activate tfasr
- install proper cudnn for tensorflow 2.3.2
conda install -c anaconda cudnn (this will handle cuda 10)
- install tensorflow 2.3.2 with pip
pip install tensorflow==2.3.2
- clone repo
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
- install dependencies
pip install .
- run
./scripts/install_ctc_decoders.sh (I tested running the commands manually too.)
I think some changes in requirements.txt are needed, because it seems some dependency errors are happening.
And this is my config.yml (sorry for the indentation):
speech_config:
  sample_rate: 16000
  frame_ms: 25
  stride_ms: 10
  num_feature_bins: 80
  feature_type: spectrogram
  preemphasis: 0.97
  normalize_signal: True
  normalize_feature: True
  normalize_per_feature: False
decoder_config:
  vocabulary: /home/shenasa/masoud_parpanchi/tfasr2/persian.characters
  blank_at_zero: False
  beam_width: 500
  lm_config:
    model_path: /home/shenasa/masoud_parpanchi/DeepSpeech/data/lm/kenlm.scorer
    alpha: 0.8816546701416587
    beta: 3.412841019426007
model_config:
  name: deepspeech2
  conv_type: conv2d
  conv_kernels: [[11, 41], [11, 21], [11, 11]]
  conv_strides: [[2, 2], [1, 2], [1, 2]]
  conv_filters: [32, 32, 96]
  conv_dropout: 0.1
  rnn_nlayers: 5
  rnn_type: lstm
  rnn_units: 512
  rnn_bidirectional: True
  rnn_rowconv: 0
  rnn_dropout: 0.1
  fc_nlayers: 0
  fc_units: 1024
learning_config:
  train_dataset_config:
    use_tf: False
    data_paths:
      - /home/shenasa/masoud_parpanchi/cv-corpus-6.1-2020-12-11/fa/clips/train_mr_deleted_row.csv
    shuffle: True
    cache: True
    buffer_size: 100
    drop_remainder: True
    stage: train
  eval_dataset_config:
    use_tf: False
    data_paths:
      - /home/shenasa/masoud_parpanchi/cv-corpus-6.1-2020-12-11/fa/clips/dev_mr_deleted_row.csv
    shuffle: False
    cache: True
    buffer_size: 100
    drop_remainder: True
    stage: eval
  test_dataset_config:
    use_tf: False
    data_paths:
      - /home/shenasa/masoud_parpanchi/cv-corpus-6.1-2020-12-11/fa/clips/test_mr_deleted_row.csv
    shuffle: False
    cache: True
    buffer_size: 100
    drop_remainder: True
    stage: test
  optimizer_config:
    class_name: adam
    config:
      learning_rate: 0.0001
  running_config:
    batch_size: 4
    num_epochs: 40
    accumulation_steps: 8
    outdir: /home/shenasa/masoud_parpanchi/TensorFlowASR/data/output_dir
    log_interval_steps: 400
    save_interval_steps: 400
    eval_interval_steps: 800
    checkpoint:
      filepath: /home/shenasa/masoud_parpanchi/TensorFlowASR/data/chkpnt/{epoch:02d}.h5
      save_best_only: True
      save_weights_only: False
      save_freq: epoch
    states_dir: /home/shenasa/masoud_parpanchi/TensorFlowASR/data/states
    tensorboard:
      log_dir: /home/shenasa/masoud_parpanchi/TensorFlowASR/data/tensorboard
      histogram_freq: 1
      write_graph: True
      write_images: True
      update_freq: epoch
      profile_batch: 2
and lang.characters file is something like this:
ا آ ب پ ت ث ج چ ح خ د ذ ر ز ژ س ش ص ض ط ظ ع غ ف ق ک گ ل م ن و ه ی ء '
and this is the latest error I faced :
2021-03-18 11:33:08.668512: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-03-18 11:33:08.668534: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-03-18 11:33:09.509949: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-18 11:33:09.510512: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-03-18 11:33:09.781620: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-18 11:33:09.782146: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-03-18 11:33:09.782254: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-03-18 11:33:09.783591: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-03-18 11:33:09.783886: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-03-18 11:33:09.785012: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-03-18 11:33:09.785229: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-03-18 11:33:09.786321: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-03-18 11:33:09.786395: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-03-18 11:33:09.786452: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2021-03-18 11:33:09.786460: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-03-18 11:33:09.786765: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-03-18 11:33:09.787729: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-18 11:33:09.787747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-18 11:33:09.787753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
Use RNNT loss in TensorFlow
Reading /home/shenasa/masoud_parpanchi/cv-corpus-6.1-2020-12-11/fa/clips/train_mr_deleted_row.csv ...
Traceback (most recent call last):
  File "examples/deepspeech2/train_keras_ds2.py", line 91, in <module>
    train_data_loader = train_dataset.create(global_batch_size)
  File "/home/shenasa/anaconda3/envs/tfasr22/lib/python3.8/site-packages/tensorflow_asr/datasets/asr_dataset.py", line 330, in create
    self.read_entries()
  File "/home/shenasa/anaconda3/envs/tfasr22/lib/python3.8/site-packages/tensorflow_asr/datasets/asr_dataset.py", line 110, in read_entries
    self.entries[i][-1] = " ".join([str(x) for x in self.text_featurizer.extract(line[-1]).numpy()])
  File "/home/shenasa/anaconda3/envs/tfasr22/lib/python3.8/site-packages/tensorflow_asr/featurizers/text_featurizers.py", line 152, in extract
    indices = [self.tokens2indices[token] for token in text]
  File "/home/shenasa/anaconda3/envs/tfasr22/lib/python3.8/site-packages/tensorflow_asr/featurizers/text_featurizers.py", line 152, in <listcomp>
    indices = [self.tokens2indices[token] for token in text]
KeyError: 'f'
@masoudMZB
The error says that your dataset contains the character 'f' but your .characters file does not. Please make sure your vocabulary file (aka .characters) includes all of the characters in your dataset.
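A quick way to find every such character before training is to scan the transcripts against the vocabulary. This is a small standalone sketch, not code from TensorFlowASR: the function name missing_characters and the toy vocabulary and transcripts are invented for illustration, and loading the real .characters file and transcript column is left to the reader.

```python
from collections import Counter


def missing_characters(transcripts, vocabulary):
    """Count characters that occur in the transcripts but not in the vocabulary."""
    vocab = set(vocabulary)
    missing = Counter()
    for text in transcripts:
        for char in text:
            if char not in vocab:
                missing[char] += 1
    return missing


# Toy data (hypothetical): a tiny vocabulary and two transcripts,
# the second of which contains a stray Latin 'f'.
vocabulary = ["ا", "ب", "ت", " "]
transcripts = ["اب تا", "باf"]

print(missing_characters(transcripts, vocabulary))  # Counter({'f': 1})
```

Any character reported here either needs to be added to the vocabulary file or removed from the dataset, which is what the KeyError is pointing at.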
Hi @usimarit, I'm getting
path_trie.h:10:10: fatal error: fst/fstlib.h: No such file or directory
while installing the CTC decoder.
@ahmedalbahnasawy
Can you remove the directory externals/ctc_decoders by running rm -rf ./externals/ctc_decoders, then rerun the script ./scripts/install_ctc_decoders.sh?
@usimarit thanks for the help. That was the problem, and removing the unwanted character fixed it.
But I have another problem now.
Everything else is fine, but the code cannot detect the GPU, and I see an error:
2021-03-21 09:21:44.729940: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
which points to CUDA 11 :| But I installed TensorFlow 2.3.2, CUDA 10.1, and cuDNN 7.6 with these commands:
pip install tensorflow-gpu==2.3.2
conda install -c anaconda cudnn (as the conda docs say, this command installs cuDNN 7.6.5, and CUDA 10 is installed as a dependency)
If you read the full logs below, you can see:
Successfully opened dynamic library libcusolver.so.10
How can I force TensorFlow to use CUDA 10?
The full logs:
python examples/deepspeech2/train_ds2.py
2021-03-21 08:16:37.411998: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-03-21 08:16:37.412020: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-03-21 08:16:38.256320: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-21 08:16:38.256878: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-03-21 08:16:38.510628: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-21 08:16:38.511245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-03-21 08:16:38.511333: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-03-21 08:16:38.511402: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-03-21 08:16:38.511466: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-03-21 08:16:38.512523: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-03-21 08:16:38.512706: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-03-21 08:16:38.514012: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-03-21 08:16:38.514064: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-03-21 08:16:38.514122: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2021-03-21 08:16:38.514133: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-03-21 08:16:38.514411: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-03-21 08:16:38.514910: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-21 08:16:38.514930: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-21 08:16:38.514935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
after this, the dataset loading logs are written.
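Logs like the ones above are easier to read once reduced to the list of libraries that failed to load and the list that loaded. The sketch below does that with two regular expressions written against the message wording seen in this thread; the function name summarize_dso_log is hypothetical, and the sample lines are copied from the log above.

```python
import re

# Patterns match the dso_loader message wording that appears in this thread.
LOAD_FAILED = re.compile(r"Could not load dynamic library '([^']+)'")
LOAD_OK = re.compile(r"Successfully opened dynamic library (\S+)")


def summarize_dso_log(log_text):
    """Return the sorted, de-duplicated sonames that failed to load and that loaded."""
    return {
        "missing": sorted(set(LOAD_FAILED.findall(log_text))),
        "loaded": sorted(set(LOAD_OK.findall(log_text))),
    }


sample = """\
2021-03-21 08:16:38.511333: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-03-21 08:16:38.512523: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-03-21 08:16:38.514122: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
"""

print(summarize_dso_log(sample))
```

In the full log, every missing soname ends in .so.11 or .so.8, which is the CUDA 11 and cuDNN 8 set, so the TensorFlow binary being imported appears to be a CUDA 11 build rather than the 2.3.2 wheel.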
@masoudMZB the best option is to run conda install tensorflow-gpu, which installs all the dependencies that work with TensorFlow GPU into the conda env. Then you can upgrade or downgrade the TensorFlow version by running pip install tensorflow-gpu==x.y.z
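Before pinning a version with pip, it can help to confirm which TensorFlow distributions are installed in the active environment, since having both tensorflow and tensorflow-gpu at different versions is one possible (unconfirmed in this thread) reason for a log that asks for CUDA 11 libraries after installing 2.3.2. A minimal sketch using the standard library's importlib.metadata (Python 3.8+); the function name installed_versions is made up here.

```python
from importlib import metadata


def installed_versions(names=("tensorflow", "tensorflow-gpu")):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions())
```

If both names report a version and the two differ, uninstalling both and reinstalling a single pinned package is a reasonable next step to try.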
I’ll close the issue here due to inactivity. Feel free to reopen if you have further questions.