
What versions of CUDA and PyTorch?

Open jinggaizi opened this issue 3 years ago • 2 comments

I run the aishell example (transducer) with torch1.4+cuda10.1 or torch1.5+cuda10.1, and I get the following errors.

torch1.4+cuda10.1:

```
/opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/ATen/native/cudnn/RNN.cpp:1266: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
(the same UserWarning is printed five times)
```
```
 40%|██████████████████████████████████████████████████████████████████████████████▋ | 144000/360293 [18:43<32:16, 111.68it/s]
Traceback (most recent call last):
  File "/search/speech/jingbojun/exp/neural_sp/examples/aishell/s5/../../../neural_sp/bin/asr/train.py", line 534, in <module>
    save_path = pr.runcall(main)
  File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/cProfile.py", line 121, in runcall
    return func(*args, **kw)
  File "/search/speech/jingbojun/exp/neural_sp/examples/aishell/s5/../../../neural_sp/bin/asr/train.py", line 379, in main
    loss, observation = model(batch_dev, task=task, is_eval=True)
  File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in replica 0 on device 0.

Original Traceback (most recent call last):
  File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/search/odin/jingbojun/exp/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 261, in forward
    loss, observation = self._forward(batch, task)
  File "/search/odin/jingbojun/exp/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 274, in _forward
    eout_dict = self.encode(batch['xs'], 'all')
  File "/search/odin/jingbojun/exp/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 395, in encode
    xs = pad_list([np2tensor(x, self.device).float() for x in xs], 0.)
  File "/search/odin/jingbojun/exp/neural_sp/neural_sp/models/torch_utils.py", line 68, in pad_list
    max_time = max(x.size(0) for x in xs)
ValueError: max() arg is an empty sequence
```
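For context on that last frame: `max()` over an empty sequence raises `ValueError`, which is what happens when a `DataParallel` replica receives an empty list of utterances. A minimal illustration, using a hypothetical `max_len` helper rather than neural_sp's actual `pad_list`:

```python
# Hypothetical helper mimicking the failing line in pad_list:
# max() over an empty generator raises ValueError.
def max_len(xs):
    return max(x for x in xs)

try:
    max_len([])
except ValueError as e:
    print(e)  # max() arg is an empty sequence

# A guarded variant (also hypothetical) uses max()'s `default`
# keyword to return a fallback instead of raising.
def max_len_safe(xs, default=0):
    return max((x for x in xs), default=default)

print(max_len_safe([]))         # 0
print(max_len_safe([3, 7, 5]))  # 7
```

In practice this usually means one GPU got zero samples, e.g. when the batch size is smaller than the number of devices `DataParallel` splits across.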

torch1.5+cuda10.1:

```
Removed 0 empty utterances
  0%|          | 0/360293 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/search/speech/jingbojun/exp/neural_sp/examples/aishell/s5/../../../neural_sp/bin/asr/train.py", line 534, in <module>
    save_path = pr.runcall(main)
  File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/cProfile.py", line 121, in runcall
    return func(*args, **kw)
  File "/search/speech/jingbojun/exp/neural_sp/examples/aishell/s5/../../../neural_sp/bin/asr/train.py", line 338, in main
    teacher=teacher, teacher_lm=teacher_lm)
  File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 0 on device 0.

Original Traceback (most recent call last):
  File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/search/odin/jingbojun/exp/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 264, in forward
    loss, observation = self._forward(batch, task, teacher, teacher_lm)
  File "/search/odin/jingbojun/exp/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 274, in _forward
    eout_dict = self.encode(batch['xs'], 'all')
  File "/search/odin/jingbojun/exp/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 395, in encode
    xs = pad_list([np2tensor(x, self.device).float() for x in xs], 0.)
  File "/search/odin/jingbojun/exp/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 395, in <listcomp>
    xs = pad_list([np2tensor(x, self.device).float() for x in xs], 0.)
  File "/search/odin/jingbojun/exp/neural_sp/neural_sp/models/base.py", line 55, in device
    return next(self.parameters()).device
StopIteration
```
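For context on that last frame: `next()` on an exhausted iterator raises `StopIteration`, which is what happens here when `self.parameters()` yields nothing inside a `DataParallel` replica (replicas in newer torch versions do not expose parameters the same way as the original module). A minimal illustration with hypothetical helpers, not neural_sp's actual `device` property:

```python
# Hypothetical helper mimicking the failing line in base.py:
# next() on an empty iterator raises StopIteration.
def first_param(params):
    return next(iter(params))

try:
    first_param([])
except StopIteration:
    print("no parameters visible in this replica")

# A defensive variant (also hypothetical) passes a default to next(),
# falling back to a fixed device instead of raising.
def first_param_or(params, default="cpu"):
    return next(iter(params), default)

print(first_param_or([]))          # cpu
print(first_param_or(["cuda:0"]))  # cuda:0
```

A common workaround in the codebase itself would be to cache the device once at construction time instead of querying `parameters()` on every forward call.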

jinggaizi · Feb 25 '21 08:02

@hirofumi0810 Hi, which versions of torch and CUDA are you running the example with?

jinggaizi · Feb 25 '21 08:02

It works with torch1.4 when using the LAS conf.

jinggaizi · Feb 25 '21 08:02