
Librispeech assert x.size(0) == lengths.max().item()

Open dfmordol opened this issue 2 years ago • 4 comments

I successfully trained a librispeech-100 pruned_transducer_stateless2 model, but when I tried to train on my own data I got an exception.

2022-08-30 13:45:32,285 INFO [train.py:1026] features shape: torch.Size([3, 1939, 80])
2022-08-30 13:45:32,286 INFO [train.py:1030] num tokens: 99
Traceback (most recent call last):
  File "./pruned_transducer_stateless2/train.py", line 1094, in <module>
    main()
  File "./pruned_transducer_stateless2/train.py", line 1087, in main
    run(rank=0, world_size=1, args=args)
  File "./pruned_transducer_stateless2/train.py", line 942, in run
    scan_pessimistic_batches_for_oom(
  File "./pruned_transducer_stateless2/train.py", line 1051, in scan_pessimistic_batches_for_oom
    loss, _ = compute_loss(
  File "./pruned_transducer_stateless2/train.py", line 595, in compute_loss
    simple_loss, pruned_loss = model(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/workspace/icefall/egs/librispeech/ASR/pruned_transducer_stateless2/model.py", line 125, in forward
    encoder_out, x_lens = self.encoder(x, x_lens, warmup=warmup)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/workspace/icefall/egs/librispeech/ASR/pruned_transducer_stateless2/conformer.py", line 159, in forward
    assert x.size(0) == lengths.max().item()
AssertionError

What could be causing this?

dfmordol · Aug 30 '22 13:08
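For context: in this recipe the Conformer subsamples the input before the failing check, so x has shape (T, N, C) and lengths holds the per-utterance frame counts after the same subsampling; lengths.max() is therefore expected to equal x.size(0). Below is a minimal sketch of that invariant, assuming the usual 4x subsampling formula ((n - 1) // 2 - 1) // 2 used in icefall (verify against your checkout):

    import torch

    def subsampled_lengths(feature_lens: torch.Tensor) -> torch.Tensor:
        # Frame counts after the 4x Conv2dSubsampling (formula assumed; check your checkout).
        return ((feature_lens - 1) // 2 - 1) // 2

    # Toy batch: 3 utterances padded to the longest one, as in the log above.
    feature_lens = torch.tensor([1939, 1700, 400], dtype=torch.int32)
    features = torch.zeros(3, int(feature_lens.max()), 80)  # (N, T, C), padded

    lengths = subsampled_lengths(feature_lens)
    T_sub = ((features.size(1) - 1) // 2 - 1) // 2  # time dim of x after subsampling

    # Mirrors the assertion at conformer.py line 159: the padded time axis after
    # subsampling must equal the largest subsampled length.
    assert T_sub == lengths.max().item()

The assertion can only fail when the padded feature tensor and the supplied frame counts disagree, i.e. when supervisions["num_frames"] does not match the features actually being fed to the model.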

k2 version: 1.18
Build type: Release
Git SHA1: 79d9c12b1571f02e6440b955eacf6bd15555cd3c
Git date: Wed Aug 17 05:19:41 2022
Cuda used to build k2: 11.0
cuDNN used to build k2: 8.0.4
Python version used to build k2: 3.8
OS used to build k2:
CMake version: 3.18.0
GCC version: 7.5.0
CMAKE_CUDA_FLAGS: -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w --expt-extended-lambda -gencode arch=compute_35,code=sm_35 -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w --expt-extended-lambda -gencode arch=compute_50,code=sm_50 -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w --expt-extended-lambda -gencode arch=compute_60,code=sm_60 -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w --expt-extended-lambda -gencode arch=compute_61,code=sm_61 -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w --expt-extended-lambda -gencode arch=compute_70,code=sm_70 -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w --expt-extended-lambda -gencode arch=compute_75,code=sm_75 -D_GLIBCXX_USE_CXX11_ABI=0 --compiler-options -Wall --compiler-options -Wno-strict-overflow --compiler-options -Wno-unknown-pragmas
CMAKE_CXX_FLAGS: -D_GLIBCXX_USE_CXX11_ABI=0 -Wno-unused-variable -Wno-strict-overflow
PyTorch version used to build k2: 1.7.1
PyTorch is using Cuda: 11.0
NVTX enabled: True
With CUDA: True
Disable debug: True
Sync kernels : False
Disable checks: False
Max cpu memory allocate: 214748364800 bytes (or 200.0 GB)
k2 abort: False
__file__: /opt/conda/lib/python3.8/site-packages/k2-1.18.dev20220818+cuda11.0.torch1.7.1-py3.8-linux-x86_64.egg/k2/version/version.py
_k2.__file__: /opt/conda/lib/python3.8/site-packages/k2-1.18.dev20220818+cuda11.0.torch1.7.1-py3.8-linux-x86_64.egg/_k2.cpython-38-x86_64-linux-gnu.so

dfmordol · Aug 30 '22 13:08

File "/workspace/icefall/egs/librispeech/ASR/pruned_transducer_stateless2/conformer.py", line 159, in forward assert x.size(0) == lengths.max().item()

What is the output of x.shape and lengths?

csukuangfj · Aug 30 '22 15:08

Shape: torch.Size([484, 3, 512])
Lengths: tensor([ 41, 427, 97], dtype=torch.int32)

dfmordol · Aug 31 '22 07:08
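Those numbers line up with the log above: the features were padded to 1939 frames, and ((1939 - 1) // 2 - 1) // 2 = 484, which is exactly x.size(0). But lengths.max() is only 427, which corresponds to roughly 1711 original frames, well short of the 1939 frames actually stored in the feature tensor. A gap like that usually means the num_frames recorded in the manifests no longer match the extracted features. The arithmetic, spelled out (subsampling formula assumed as above):

    padded_frames = 1939                       # from the "features shape" log line
    x_T = ((padded_frames - 1) // 2 - 1) // 2  # = 484, matches x.size(0)

    max_len = 427                              # lengths.max() reported above
    approx_frames = max_len * 4 + 3            # ~1711 original frames, not 1939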

Could you change https://github.com/k2-fsa/icefall/blob/e18fa78c3a010fac4e6d3e83bdcff28197df04dc/egs/librispeech/ASR/pruned_transducer_stateless2/train.py#L594

to

 feature_lens = supervisions["num_frames"].to(device) 
 print(feature_lens)
 print(feature.shape)

and show the output?

csukuangfj · Aug 31 '22 12:08
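For anyone landing here with the same assertion error: a rough way to look for cuts whose manifest metadata disagrees with the stored features or supervisions is a quick pass over the CutSet. This is only a sketch; the manifest path and the 50 ms tolerance are assumptions, adapt them to your setup.

    from lhotse import CutSet

    cuts = CutSet.from_file("data/fbank/cuts_train.jsonl.gz")  # path is an assumption

    for cut in cuts:
        # Features attached to the cut vs. the duration the cut claims to have.
        if cut.has_features and abs(cut.num_frames * cut.frame_shift - cut.duration) > 0.05:
            print(f"{cut.id}: num_frames={cut.num_frames} vs duration={cut.duration:.2f}s")
        for sup in cut.supervisions:
            # Supervisions running past the end of their cut are another common culprit.
            if sup.end > cut.duration + 0.05:
                print(f"{cut.id}: supervision ends at {sup.end:.2f}s, cut lasts {cut.duration:.2f}s")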