Check failed: axis < NumAxes() in ragged.cu
I trained a Conformer CTC model with Icefall and was trying to use it for decoding, but I am getting the following error:
[F] /export/c07/draj/mini_scale_2022/k2/k2/csrc/ragged.cu:116:k2::Array1<int>& k2::RaggedShape::RowIds(int32_t) Check failed: axis < NumAxes() (1 vs. -472640871)
Environment details
Output from k2.version:
k2 version: 1.13
Build type: Debug
Git SHA1: 854b792368214a2adb4e89cd83f6bc09ddbbcdae
Git date: Sat Feb 19 22:22:31 2022
Cuda used to build k2: 10.2
cuDNN used to build k2: 7.6.5
Python version used to build k2: 3.8
OS used to build k2: Debian GNU/Linux 9.13 (stretch)
CMake version: 3.22.1
GCC version: 6.3.0
CMAKE_CUDA_FLAGS: --compiler-options -rdynamic --compiler-options -lineinfo --expt-extended-lambda -gencode arch=compute_61,code=sm_61 -D_GLIBCXX_USE_CXX11_ABI=0 --compiler-options -Wall --compiler-options -Wno-unknown-pragmas --compiler-options -Wno-strict-overflow
CMAKE_CXX_FLAGS: -D_GLIBCXX_USE_CXX11_ABI=0 -Wno-strict-overflow
PyTorch version used to build k2: 1.8.1
PyTorch is using Cuda: 10.2
NVTX enabled: True
With CUDA: True
Disable debug: False
Sync kernels : False
Disable checks: False
Output from torch.utils.collect_env:
PyTorch version: 1.8.1
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 9.13 (stretch) (x86_64)
GCC version: (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Clang version: 3.8.1-24 (tags/RELEASE_381/final)
CMake version: version 3.22.1
Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.2.89
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti
Nvidia driver version: 440.33.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] k2==1.13.dev20220222+cuda10.2.torch1.8.1
[pip3] numpy==1.20.3
[pip3] pytorch-ranger==0.1.1
[pip3] pytorch-wpe==0.0.1
[pip3] torch==1.8.1
[pip3] torch-complex==0.2.1
[pip3] torch-optimizer==0.3.0
[pip3] torchaudio==0.8.0a0+e4e171a
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 h8f6ccaa_8 nvidia
[conda] k2 1.13.dev20220222+cuda10.2.torch1.8.1 pypi_0 pypi
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py38h7f8727e_0
[conda] mkl_fft 1.3.1 py38hd3c417c_0
[conda] mkl_random 1.2.2 py38h51133e4_0
[conda] numpy 1.20.3 pypi_0 pypi
[conda] pytorch 1.8.1 py3.8_cuda10.2_cudnn7.6.5_0 pytorch
[conda] pytorch-ranger 0.1.1 pypi_0 pypi
[conda] pytorch-wpe 0.0.1 pypi_0 pypi
[conda] torch 1.8.1 pypi_0 pypi
[conda] torch-complex 0.2.1 pypi_0 pypi
[conda] torch-optimizer 0.3.0 pypi_0 pypi
[conda] torchaudio 0.8.1 py38 pytorch
Stack trace of error
Here is the stack trace from gdb:
(gdb) run conformer_ctc/decode.py --epoch 12 --avg 3 --method ctc-decoding --max-duration 20 --num-paths 5
Starting program: /home/draj/anaconda3/envs/scale/bin/python conformer_ctc/decode.py --epoch 12 --avg 3 --method ctc-decoding --max-duration 20 --num-paths 5
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[... GDB thread creation/exit messages omitted ...]
2022-03-16 11:34:13,444 INFO [decode.py:541] Decoding started
2022-03-16 11:34:13,444 INFO [decode.py:542] {'subsampling_factor': 4, 'vgg_frontend': False, 'use_feat_batchnorm': True, 'feature_dim': 80, 'nhead': 8, 'attention_dim': 512, 'num_decoder_layers': 6, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'env_info': {'k2-version': '1.13', 'k2-build-type': 'Debug', 'k2-with-cuda': True, 'k2-git-sha1': '854b792368214a2adb4e89cd83f6bc09ddbbcdae', 'k2-git-date': 'Sat Feb 19 22:22:31 2022', 'lhotse-version': '1.0.0.dev+git.449ce44.clean', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.8', 'icefall-git-branch': 'spgi', 'icefall-git-sha1': '0c27ba4-dirty', 'icefall-git-date': 'Tue Mar 8 15:01:58 2022', 'icefall-path': '/export/c07/draj/mini_scale_2022/icefall', 'k2-path': '/export/c07/draj/mini_scale_2022/k2/k2/python/k2/__init__.py', 'lhotse-path': '/export/c07/draj/mini_scale_2022/lhotse/lhotse/__init__.py', 'hostname': 'c23', 'IP address': '127.0.0.1'}, 'epoch': 12, 'avg': 3, 'method': 'ctc-decoding', 'num_paths': 5, 'nbest_scale': 0.5, 'exp_dir': PosixPath('conformer_ctc/exp'), 'lang_dir': PosixPath('data/lang_bpe_5000'), 'lm_dir': PosixPath('data/lm'), 'manifest_dir': PosixPath('data/manifests'), 'enable_musan': True, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'max_duration': 20, 'num_buckets': 30, 'on_the_fly_feats': False, 'shuffle': True, 'num_workers': 8, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80}
2022-03-16 11:34:13,623 INFO [lexicon.py:176] Loading pre-compiled data/lang_bpe_5000/Linv.pt
2022-03-16 11:34:14,012 INFO [decode.py:552] device: cuda:0
[New Thread 0x2aab3c8d1700 (LWP 102919)]
[New Thread 0x2aab3c6d0700 (LWP 102920)]
2022-03-16 11:35:38,582 INFO [decode.py:650] averaging ['conformer_ctc/exp/epoch-10.pt', 'conformer_ctc/exp/epoch-11.pt', 'conformer_ctc/exp/epoch-12.pt']
2022-03-16 11:43:07,298 INFO [decode.py:657] Number of model parameters: 116147120
2022-03-16 11:43:07,298 INFO [asr_datamodule.py:295] About to get SPGISpeech dev cuts
2022-03-16 11:43:07,634 INFO [asr_datamodule.py:300] About to get SPGISpeech val cuts
[... GDB thread creation messages omitted ...]
[F] /export/c07/draj/mini_scale_2022/k2/k2/csrc/ragged.cu:116:k2::Array1<int>& k2::RaggedShape::RowIds(int32_t) Check failed: axis < NumAxes() (1 vs. -472640871)
[ Stack-Trace: ]
/export/c07/draj/mini_scale_2022/k2/build_debug/lib/libk2_log.so(k2::internal::GetStackTrace()+0x46) [0x2aab34ac37c8]
/export/c07/draj/mini_scale_2022/k2/build_debug/lib/libk2context.so(k2::internal::Logger::~Logger()+0x35) [0x2aab3370a307]
/export/c07/draj/mini_scale_2022/k2/build_debug/lib/libk2context.so(k2::RaggedShape::RowIds(int)+0x29e) [0x2aab33890fd8]
/export/c07/draj/mini_scale_2022/k2/build_debug/lib/libk2context.so(k2::MultiGraphDenseIntersectPruned::PropagateBackward(int, k2::MultiGraphDenseIntersectPruned::FrameInfo*, k2::MultiGraphDenseIntersectPruned::FrameInfo*, k2::Array1<char>*, k2::Array1<char>*)+0x777) [0x2aab338695cd]
/export/c07/draj/mini_scale_2022/k2/build_debug/lib/libk2context.so(k2::MultiGraphDenseIntersectPruned::PruneTimeRange(int, int)+0x58a) [0x2aab3386a67a]
/export/c07/draj/mini_scale_2022/k2/build_debug/lib/libk2context.so(k2::MultiGraphDenseIntersectPruned::BackwardPass()+0x18a) [0x2aab338656ca]
/export/c07/draj/mini_scale_2022/k2/build_debug/lib/libk2context.so(k2::MultiGraphDenseIntersectPruned::BackwardPassStatic(k2::MultiGraphDenseIntersectPruned*)+0x4b) [0x2aab338657d7]
/export/c07/draj/mini_scale_2022/k2/build_debug/lib/libk2context.so(k2::MultiGraphDenseIntersectPruned::Intersect()::{lambda()#1}::operator()() const+0x1b) [0x2aab33864f45]
/export/c07/draj/mini_scale_2022/k2/build_debug/lib/libk2context.so(std::_Function_handler<void (), k2::MultiGraphDenseIntersectPruned::Intersect()::{lambda()#1}>::_M_invoke(std::_Any_data const&)+0x20) [0x2aab3387d9ca]
/export/c07/draj/mini_scale_2022/k2/build_debug/lib/libk2context.so(std::function<void ()>::operator()() const+0x32) [0x2aab339a03c6]
/export/c07/draj/mini_scale_2022/k2/build_debug/lib/libk2context.so(k2::ThreadPool::ProcessTasks()+0x11e) [0x2aab3399f67e]
/export/c07/draj/mini_scale_2022/k2/build_debug/lib/libk2context.so(+0x4481a3) [0x2aab3399f1a3]
/export/c07/draj/mini_scale_2022/k2/build_debug/lib/libk2context.so(+0x448f4a) [0x2aab3399ff4a]
/export/c07/draj/mini_scale_2022/k2/build_debug/lib/libk2context.so(+0x448ee7) [0x2aab3399fee7]
/export/c07/draj/mini_scale_2022/k2/build_debug/lib/libk2context.so(+0x448ec6) [0x2aab3399fec6]
/home/draj/anaconda3/envs/scale/lib/python3.8/site-packages/torch/lib/../../../.././libstdc++.so.6(+0xc9039) [0x2aaac4ae5039]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4) [0x2aaaab3e34a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x2aaaab6e1d0f]
terminate called after throwing an instance of 'std::runtime_error'
what():
Some bad things happened. Please read the above error messages and stack
trace. If you are using Python, the following command may be helpful:
gdb --args python /path/to/your/code.py
(You can use `gdb` to debug the code. Please consider compiling
a debug version of k2.).
If you are unable to fix it, please open an issue at:
https://github.com/k2-fsa/k2/issues/new
Thread 44 "python" received signal SIGABRT, Aborted.
[Switching to Thread 0x2aab3b4c7700 (LWP 104269)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
I trained the model with 5000 BPE tokens, which I later realized is quite large. Because of this, I have to use a very small --max-duration during decoding. Could the error be caused by this?
Since you are using CTC decoding with a vocab size of 5000, you may want to change https://github.com/k2-fsa/icefall/blob/518ec6414a676ec0ce583d4e728ea010efc7e2aa/egs/librispeech/ASR/conformer_ctc/decode.py#L571 from
H = k2.ctc_topo(
max_token=max_token_id,
modified=False,
device=device,
)
to
H = k2.ctc_topo(
max_token=max_token_id,
modified=True,
device=device,
)
Otherwise, the resulting H is very large.
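To see why the standard topology blows up at this vocab size, here is a rough back-of-the-envelope sketch. `approx_num_arcs` is a hypothetical helper (not part of k2), assuming the standard CTC topology has transitions between every pair of token states (quadratic in vocab size) while the modified topology routes every token through one shared state (linear in vocab size); the exact counts in k2 may differ.

```python
def approx_num_arcs(vocab_size: int, modified: bool) -> int:
    """Rough estimate of the arc count of a CTC topology FSA.

    Assumption (not k2's exact construction): the standard topology
    keeps one state per token and connects every pair of token states,
    so arcs grow quadratically; the modified topology uses a single
    shared state with a couple of arcs per token, so arcs grow linearly.
    """
    if modified:
        # one blank self-loop plus two arcs per token through the shared state
        return 2 * vocab_size + 1
    # token-to-token transitions plus per-token self-loops and blank arcs
    return vocab_size * vocab_size + 2 * vocab_size

# With 5000 BPE tokens the standard topology is roughly 2500x larger,
# which makes intersection during decoding far more memory-hungry.
standard = approx_num_arcs(5000, modified=False)
compact = approx_num_arcs(5000, modified=True)
```

Under these assumptions, `modified=True` keeps H at tens of thousands of arcs instead of tens of millions, which is why it is the recommended setting for large BPE vocabularies.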
Thanks @csukuangfj. It works now with the modified CTC topology.