wav2letter: Not recognising the first short word

I am using the CTC-Transformer architecture for English and Hindi speech.

For example, it is missing:

I
Show
Add
Get
What
How
Hi I

Dataset: about 35k short English/Hindi sentences, using 5k word pieces.

Configuration:
--am=/w2l-libri-local/sota-livai/am_transformer_seq2seq_livai-10july12pm/120/001_model_last.bin
--tokensdir=/w2l-libri-local/sota-livai/am
--tokens=livai-train-all-unigram-5000.tokens
--lexicon=/w2l-libri-local/sota-livai/am/livai-train+dev-unigram-5000-nbest10.lexicon
--lm=/w2l-libri-local/sota-livai/decoder/ngram_models/grocery_3gram-lm.binary
--datadir=/w2l-libri-local/sota-livai/lists
--test=test-grocery-s2r-clean-en-2k.lst
--uselexicon=true
--sclite=sclite_livai_decode_transformer_ctc_ngram_other-dev29-75iter_model_dev-other-11july
--decodertype=wrd
--lmtype=kenlm
--silscore=0
--beamsize=500
--beamsizetoken=100
--beamthreshold=100
--nthread_decoder=4
--smearing=max
--show
--showletters
--lmweight=0.6
--wordscore=1.4710204244326

Jul 11 '20 20:07 omprakashsonie

Do I understand correctly that it is missing words in upper case? Or are all your words lower-cased?

Jul 12 '20 06:07 tlikhomanenko

Thanks a lot, Tatiana. It is in Hindi; here are a few examples:

For "I": I is written अाई (the first word, missing from the prediction).
|T|: अाई नीड टू बाय मेक अप कलर किक काजल पेंसिल
|P|: नीड टू बाय मी काजल पेंसिल
|t|: अ ा ई _ न ी ड _ ट ू _ ब ा य _ म े क _ अ प _ क ल र _ क ि क _ क ा ज ल _ प े ं स ि ल
|p|: न ी ड _ ट ू _ ब ा य _ म ी _ क ा ज ल _ प े ं स ि ल
(In English: reference "I need to buy make up color kick kajal pencil", prediction "need to buy me kajal pencil".)

|T|: अाई वान्ट टू रिमूव दी आइटम नंबर सेकंड फ्रॉम माई कार्ट
|P|: वांट टू रिमूव दी आइटम नंबर सेकंड फ्रॉम माई कार्ट
|t|: अ ा ई _ व ा न ् ट _ ट ू _ र ि म ू व _ द ी _ आ इ ट म _ न ं ब र _ स े क ं ड _ फ ् र ॉ म _ म ा ई _ क ा र ् ट
|p|: व ा ं ट _ ट ू _ र ि म ू व _ द ी _ आ इ ट म _ न ं ब र _ स े क ं ड _ फ ् र ॉ म _ म ा ई _ क ा र ् ट
(In English: reference "I want to remove the item number second from my cart", prediction "want to remove the item number second from my cart".)

For "Add": Add is written ऐड (the first word, missing from the prediction).

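|T|: ऐड क्रीमी चीज चिप्स इंटू माई कार्ट
|P|: क्रीमी चीज चिप्स इंटू टू माई कार्ट
|t|: ऐ ड _ क ् र ी म ी _ च ी ज _ च ि प ् स _ इ ं ट ू _ म ा ई _ क ा र ् ट
|p|: क ् र ी म ी _ च ी ज _ च ि प ् स _ इ ं ट ू _ ट ू _ म ा ई _ क ा र ् ट
(In English: reference "add creamy cheese chips into my cart", prediction "creamy cheese chips into to my cart".)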

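For "Show": show is written शो (the first word, missing from the prediction).
|T|: शो मी ऑल बेबी सोप
|P|: मी मी ऑल बेबी सोप
|t|: श ो _ म ी _ ऑ ल _ ब े ब ी _ स ो प
|p|: म ी _ ऑ ल _ ब े ब ी _ स ो प
(In English: reference "show me all baby soap", prediction "me me all baby soap".)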

Kindly let me know if you need more inputs.

Jul 12 '20 06:07 omprakashsonie
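To measure how often the first word goes missing across the whole test set, rather than in a few hand-picked samples, one option is to scan the decoder output printed with --show. Below is a minimal Python sketch; it assumes each sample's reference and prediction appear on separate lines starting with |T|: and |P|: (as in the examples above), and the function name first_word_stats is hypothetical. It counts samples whose first predicted word differs from the first reference word and, among those, samples whose prediction starts at the second reference word, which points to a deletion rather than a substitution.

```python
def first_word_stats(lines):
    """Count first-word errors in decoder output containing |T|: and |P|: lines."""
    stats = {"samples": 0, "first_word_wrong": 0, "starts_at_second_word": 0}
    ref = None
    for line in lines:
        line = line.strip()
        if line.startswith("|T|:"):
            ref = line[len("|T|:"):].split()  # reference words
        elif line.startswith("|P|:") and ref:
            hyp = line[len("|P|:"):].split()  # predicted words
            stats["samples"] += 1
            if not hyp or hyp[0] != ref[0]:
                stats["first_word_wrong"] += 1
                # Prediction begins with the second reference word, so the
                # first word was most likely deleted rather than substituted.
                if len(ref) > 1 and hyp and hyp[0] == ref[1]:
                    stats["starts_at_second_word"] += 1
            ref = None  # each |T|: line pairs with the next |P|: line only
    return stats
```

The function accepts any iterable of lines, so an open log file (opened with encoding="utf-8") can be passed directly. Matching is exact string comparison: in the second "I" example above, the spelling variant वान्ट/वांट means that sample counts only as a first-word error, not as a clear deletion.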

Could you post the head of your tokens set and lexicon?

Jul 12 '20 17:07 tlikhomanenko

FWIW, I've found wav2letter models perform better on short inputs if you train them on short inputs. My high-performing English models have been trained on hundreds of thousands of single-word clips.
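To check how well short commands are represented in the training data, one can count utterances by transcript length in the training list. A minimal Python sketch, assuming the wav2letter list-file layout of one sample per line, whitespace-separated as id, audio path, duration, then the transcription; the helper names and the two-word cutoff are illustrative choices, not part of wav2letter.

```python
from collections import Counter

def utterance_length_histogram(list_lines):
    """Map transcript word count -> number of utterances in a list file."""
    counts = Counter()
    for line in list_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank lines and entries without a transcript
        # Fields 0-2 are id, audio path and duration; the rest is the transcript.
        counts[len(fields) - 3] += 1
    return counts

def short_fraction(counts, max_words=2):
    """Fraction of utterances whose transcript has at most max_words words."""
    total = sum(counts.values())
    short = sum(n for length, n in counts.items() if length <= max_words)
    return short / total if total else 0.0
```

For example, a list with one five-word, one one-word and one two-word transcript gives a short fraction of 2/3 with the default cutoff.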

If you're dropping words from longer clips, that's often the language model's fault, or your training data alignment might be poor. Try both Test and Decoder, and maybe a ZeroLM decode (--lm='').

Jul 12 '20 19:07 lunixbochs
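As a rough illustration of why an n-gram LM can make the decoder skip a sentence-initial word, consider a simplified hypothesis score of the form AM log-score + lmweight * LM log-probability + wordscore * number of words. This is a simplification: wav2letter's lexicon decoder also applies other terms, such as the silence score, which are left out here. Only lmweight=0.6 and wordscore=1.4710204244326 come from the configuration above; the acoustic and LM log-scores are made-up numbers chosen to show the effect.

```python
def hypothesis_score(am_logprob, lm_logprob, n_words, lmweight, wordscore):
    """Simplified beam score: AM + lmweight * LM + wordscore * number of words."""
    return am_logprob + lmweight * lm_logprob + wordscore * n_words

WORDSCORE = 1.4710204244326  # --wordscore from the configuration above

# Made-up log-scores: keeping the first word fits the audio better (higher AM
# score), but the n-gram LM gives that sentence start a lower probability.
KEEP_FIRST = dict(am_logprob=-10.0, lm_logprob=-20.0, n_words=6)
DROP_FIRST = dict(am_logprob=-12.0, lm_logprob=-12.0, n_words=5)

for lmweight in (0.6, 0.0):  # 0.6 as configured; 0.0 removes the LM term
    keep = hypothesis_score(**KEEP_FIRST, lmweight=lmweight, wordscore=WORDSCORE)
    drop = hypothesis_score(**DROP_FIRST, lmweight=lmweight, wordscore=WORDSCORE)
    winner = "keeps" if keep > drop else "drops"
    print(f"lmweight={lmweight}: best hypothesis {winner} the first word "
          f"({keep:.2f} vs {drop:.2f})")
```

With these invented numbers, the configured lmweight of 0.6 makes the hypothesis without the first word win (-13.17 vs -11.84), while removing the LM term, similar in spirit to the ZeroLM decode suggested above, keeps it (-1.17 vs -4.64).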

Yep, so first you can check Viterbi decoding alone with Test to see whether the acoustic model itself has this problem. One thing: you are using the lexicon-based decoder, but I guess it should be lexicon-free, so the lexicon could be what causes the problem with generating the first word. Do you have this problem only with short words, or with the first word of every sample?

Jul 12 '20 20:07 tlikhomanenko
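For a CTC model, the acoustic-model-only check essentially amounts to best-path (greedy) decoding: take the highest-scoring token in each frame, merge consecutive repeats, then drop blanks. The Python sketch below illustrates that rule on a toy emission matrix; the token list, the scores and the choice of index 0 as the blank are assumptions for the example, not taken from the model in this thread. If the first word already disappears at this stage, the problem lies in the acoustic model rather than in the lexicon or the LM.

```python
from typing import List, Sequence

BLANK = 0  # index of the CTC blank token (assumed for this example)

def ctc_best_path(emissions: Sequence[Sequence[float]], tokens: List[str]) -> List[str]:
    """Greedy CTC decoding: per-frame argmax, merge repeats, then drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in emissions]
    pieces = []
    prev = None
    for idx in best:
        # A token is emitted only when it differs from the previous frame's
        # token; a blank between two identical tokens therefore keeps both.
        if idx != prev and idx != BLANK:
            pieces.append(tokens[idx])
        prev = idx
    return pieces

# Toy example: 6 frames over a 4-token vocabulary (index 0 is the blank).
tokens = ["<blank>", "_शो", "_मी", "_ऑल"]
emissions = [
    [0.1, 0.8, 0.05, 0.05],   # _शो
    [0.1, 0.7, 0.1, 0.1],     # _शो again (repeat, merged)
    [0.9, 0.05, 0.03, 0.02],  # blank
    [0.1, 0.1, 0.7, 0.1],     # _मी
    [0.1, 0.1, 0.1, 0.7],     # _ऑल
    [0.6, 0.2, 0.1, 0.1],     # blank
]
pieces = ctc_best_path(emissions, tokens)
print("".join(pieces).replace("_", " ").strip())  # शो मी ऑल
```

If the frames covering a short first word had the blank as their argmax, that word would vanish from this output entirely, which is the symptom worth looking for in the Test output.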

Hi Tatiana, it does not happen for all short words, and it does not happen for the first word of every sample. Here are some correct predictions:

|T|: दस किलो मधुर शुगर
|P|: दस किलो मधुर शुगर
|t|: द स _ क ि ल ो _ म ध ु र _ श ु ग र
|p|: द स _ क ि ल ो _ म ध ु र _ श ु ग र
WER: 0%
(In English: "ten kilo Madhur sugar".)

|T|: बीस के जी का आटा
|P|: बीस के जी का आटा
|t|: ब ी स _ क े _ ज ी _ क ा _ आ ट ा
|p|: ब ी स _ क े _ ज ी _ क ा _ आ ट ा
WER: 0%
(In English: "twenty kg of atta (flour)".)

Here are the tokens and lexicon:

head train-all-unigram-5000.tokens
_टू
_अ
ाई
_के
_है
_ऐड
_का
_ऑफ
_मी
_कार्ट

tail train-all-unigram-5000.tokens
ाऊ
ृ
ः
ऋ
ऱ
ऍ
औ
ऐ
ऑ
ॉ

head train-unigram-5000-nbest10.lexicon
अ _अ
अ _ अ
अंकल _अ ंकल
अंकल _अ ंक ल
अंकल _अ ं क ल
अंकल _ अ ंकल
अंकल _ अ ंक ल
अंकल _ अ ं क ल
अंकित _अ ंक ित
अंकित _अ ं क ित

Jul 13 '20 15:07 omprakashsonie
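Because a lexicon-based decoder can only output words that are in the lexicon, and only through spellings made of tokens the acoustic model knows, it may be worth checking both files against the test transcripts. A minimal Python sketch, assuming one token per line in the tokens file and one entry per line in the lexicon (the word first, then its space- or tab-separated tokens), as in the heads above; the function names are hypothetical.

```python
def load_tokens(lines):
    """Return the set of tokens, one per non-empty line."""
    return {line.strip() for line in lines if line.strip()}

def load_lexicon(lines):
    """Map each word to its list of spellings (each spelling is a token list)."""
    lexicon = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed entries
        lexicon.setdefault(parts[0], []).append(parts[1:])
    return lexicon

def check_lexicon(lexicon, tokens, test_words):
    """Return (test words absent from the lexicon, spellings with unknown tokens)."""
    missing_words = sorted(w for w in set(test_words) if w not in lexicon)
    bad_spellings = [
        (word, spelling)
        for word, spellings in lexicon.items()
        for spelling in spellings
        if any(tok not in tokens for tok in spelling)
    ]
    return missing_words, bad_spellings
```

The loaders accept any iterable of lines, so open files (opened with encoding="utf-8") can be passed directly; the test words can be taken from the transcription fields of the test list. A word such as अाई that is missing from the lexicon, or whose only spellings use unknown tokens, can never be produced by the lexicon-based decoder.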

Thanks a lot, Ryan. I will try your suggestions.

Jul 13 '20 15:07 omprakashsonie
