
decode.cpp fatal error at vocab size assertion

Open andrster1 opened this issue 5 years ago • 6 comments

Hello, I've been passing my arguments to decoder.decode as shown in test.py, using softmaxed probabilities with shape [batch_size, timesteps, classes] and the classes set up just like in the example. In my case I have 1232 classes, including blank. The call fails with:

[ctcdecode/src/ctc_beam_search_decoder.cpp:32] FATAL: "(probs_seq[i].size()) == (vocabulary.size())" check failed. The shape of probs_seq does not match with the shape of the vocabulary

My PyTorch version is 1.1.0, and for the C++ libraries I have gcc 5.
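The assertion compares the size of the last dimension of the probabilities with the length of the label list given to the decoder, so it can help to verify both in Python before calling decode. Below is a minimal sketch of such a check; check_vocab_matches is a hypothetical helper, not part of ctcdecode, and it only assumes that the probabilities have the [batch_size, timesteps, classes] layout described above.

```python
def check_vocab_matches(probs_shape, labels):
    """Fail early, in Python, if the class dimension and the label list differ.

    probs_shape: shape of the probability tensor, e.g. tuple(probs.shape).
    labels: the label list passed to the decoder (blank included).
    Hypothetical helper for illustration; not part of ctcdecode.
    """
    if len(probs_shape) != 3:
        raise ValueError(
            "expected [batch_size, timesteps, classes], got shape %r" % (probs_shape,)
        )
    num_classes = probs_shape[-1]
    if num_classes != len(labels):
        raise ValueError(
            "probabilities have %d classes but %d labels were given"
            % (num_classes, len(labels))
        )
    return num_classes
```

With a PyTorch tensor this would be called as check_vocab_matches(tuple(probs.shape), labels). A mismatch reported here points to the label list or the model's output layer rather than to the decoder itself.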

andrster1 avatar Aug 30 '19 14:08 andrster1

Did you solve this? I'm having the exact same issue.

jpdevicente avatar Oct 09 '19 19:10 jpdevicente

Did you solve this? I'm having the exact same issue.

Lip136 avatar Nov 28 '19 06:11 Lip136

Did you solve this? I am having the exact same issue, probably with the exact same dataset.

I3orn2FLY avatar Feb 05 '20 08:02 I3orn2FLY

Did you solve this? I'm having the exact same issue.

CSLujunyu avatar Apr 21 '21 02:04 CSLujunyu

Any updates?

doneforaiur avatar Oct 25 '21 10:10 doneforaiur

Workaround: you could comment out the check at lines 62-64 of /ctcdecode/src/ctc_beam_search_decoder.cpp and then reinstall ctcdecode with python install ..

I don't know the implications of removing the check, but since my vocabulary has 32 entries and the probabilities have 34 classes, it's not a huge concern for me. I would suggest this temporary workaround only if your circumstances are similar to mine.
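A possible alternative that leaves the C++ check in place is to pad the label list so its length equals the number of classes. The sketch below is only an illustration: pad_labels and the placeholder format are hypothetical, and it assumes that the extra classes sit at the end of the class dimension and that the decoder accepts the placeholder strings as labels. Neither assumption is confirmed in this thread.

```python
def pad_labels(labels, num_classes, placeholder="<extra_{}>"):
    """Return a copy of labels extended with placeholders up to num_classes.

    Hypothetical helper for illustration; not part of ctcdecode.
    Assumes the unlabeled classes are the trailing ones.
    """
    missing = num_classes - len(labels)
    if missing < 0:
        raise ValueError(
            "%d labels given but the model only has %d classes"
            % (len(labels), num_classes)
        )
    # Distinct placeholders keep every class index mapped to a unique label.
    return list(labels) + [placeholder.format(i) for i in range(missing)]
```

For the case above this would be pad_labels(labels, 34), which turns a 32-entry vocabulary into 34 entries. Any placeholder that shows up in decoded output would then need to be filtered out afterwards.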

Note: ctcdecode works fine with some Hugging Face models but not necessarily with all of them. I can't get it to work with my custom wav2vec2 model, but it works fine with m3hrdadfi/wav2vec2-large-xlsr-turkish.

doneforaiur avatar Nov 05 '21 07:11 doneforaiur