End-to-end-ASR-Pytorch: I met a problem when running preprocess_libri.py

FileNotFoundError: [Errno 2] No such file or directory: 'spm_train': 'spm_train'

I don't know how to deal with this. No such file exists in the current directory.

Jun 06 '19 02:06 qycxzhangyi

Same here.

Jun 11 '19 02:06 zxybazh

Did you find a solution?

Jun 12 '19 07:06 qycxzhangyi

@qycxzhangyi @zxybazh I believe this is because sentencepiece needs to be compiled from C++ source using bazel rather than installed as a pip package. That is unfortunate, because I wasn't able to install it correctly (see this issue).

Have you found a way to install and run it correctly?

Jun 18 '19 12:06 Kabur

preprocess_libri.py now works after installing an older commit of sentencepiece.

Jun 19 '19 14:06 Kabur

Hello, when I run the code, I get this error too. I have installed the sentencepiece package (pip install sentencepiece), but it doesn't work. Does it work for you? Or have you found anything else useful?

Jun 20 '19 21:06 Mohammadelc

As I wrote, you need to compile and install sentencepiece from source, not with pip. See my comment above; that worked for me.
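The original FileNotFoundError means the spm_train executable could not be found when the script tried to launch it. A minimal sketch for checking whether it is visible on PATH before running preprocess_libri.py; the helper name find_tool is hypothetical and not part of the repository:

```python
import shutil


def find_tool(name="spm_train"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


path = find_tool()
if path is None:
    # Not on PATH: the thread's fix is to build sentencepiece from source
    # and install the resulting command-line tools.
    print("spm_train not found on PATH")
else:
    print("spm_train found at", path)
```

If this reports that the tool is missing after a source build, the install location is probably not on PATH.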

Jun 20 '19 22:06 Kabur

@Kabur Thank you, it works now.

Jun 21 '19 16:06 Mohammadelc

@Kabur I followed the method you mentioned above, but I hit a new problem:

FileNotFoundError: [Errno 2] No such file or directory: './libri_fbank80_subword5000/bpe/bpe.vocab'

The bpe.vocab file was not generated. Do you know how to solve this problem?

Jun 24 '19 02:06 qycxzhangyi

Add --model_type=bpe to the spm_train call.

Jul 17 '19 09:07 sergeevii123

Can you please be more explicit about where I should put the --model_type=bpe flag?

Edit: OK, I found the spm_train call. I also created a pull request to fix this: #34.
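For anyone else looking for where the flag goes, here is a rough sketch of an spm_train invocation with --model_type=bpe. It is not the actual code from preprocess_libri.py or from the pull request; the function names, the input file, and the vocabulary size of 5000 (guessed from the directory name) are assumptions. spm_train writes <model_prefix>.model and <model_prefix>.vocab, so a prefix ending in bpe/bpe would produce the bpe.vocab file the script looks for.

```python
import subprocess


def build_spm_train_cmd(text_file, model_prefix, vocab_size=5000):
    """Build the spm_train argument list (hypothetical helper)."""
    return [
        "spm_train",
        f"--input={text_file}",            # plain-text training corpus
        f"--model_prefix={model_prefix}",  # outputs <prefix>.model / <prefix>.vocab
        f"--vocab_size={vocab_size}",      # assumed value, not taken from the script
        "--model_type=bpe",                # the flag discussed in this thread
    ]


def train_bpe(text_file, model_prefix, vocab_size=5000):
    """Run spm_train; raises CalledProcessError if the tool exits non-zero."""
    subprocess.run(build_spm_train_cmd(text_file, model_prefix, vocab_size), check=True)


print(" ".join(build_spm_train_cmd("text.txt", "./libri_fbank80_subword5000/bpe/bpe")))
```

Passing the arguments as a list avoids shell quoting issues, and check=True makes a failed training run stop the script instead of surfacing later as a missing file.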

Jul 30 '19 09:07 DerekChia

I added --model_type=bpe, but I still get the same error:

FileNotFoundError: [Errno 2] No such file or directory: './libri_fbank80_subword5000/bpe/bpe.vocab'

Aug 02 '19 10:08 JamiePlur

Did you take a look at my pull request? If you made the same change as in my pull request, you should not have this problem.

Aug 03 '19 07:08 DerekChia

I ran into this error, but it works now that I have compiled sentencepiece from source. Don't do a pip install sentencepiece and this should work; otherwise you get an error.

(pytorch_p36) $~/code/End-to-end-ASR-Pytorch/data$ python3 preprocess_libri.py --data_path ~/data/speech/LibriSpeech/

Pretrain BPE for subword unit.
Data sets :
         0 : train-clean-100
         1 : train-clean-360
         2 : train-other-500
         3 : dev-clean
         4 : dev-other
         5 : test-clean
         6 : test-other
Please enter the index for training sets for BPE (seperate w/ space): 0 3 5
Traceback (most recent call last):
  File "preprocess_libri.py", line 73, in <module>
    bpe_tr = [sets[int(t)] for t in bpe_tr.split(' ')]
  File "preprocess_libri.py", line 73, in <listcomp>
    bpe_tr = [sets[int(t)] for t in bpe_tr.split(' ')]
ValueError: invalid literal for int() with base 10: ''

Aug 10 '19 11:08 dendisuhubdy

Oh yes, I forgot to mention that I also compiled from source.

Aug 10 '19 15:08 DerekChia

OK, so for this:

Traceback (most recent call last):
  File "preprocess_libri.py", line 73, in <module>
    bpe_tr = [sets[int(t)] for t in bpe_tr.split(' ')]
  File "preprocess_libri.py", line 73, in <listcomp>
    bpe_tr = [sets[int(t)] for t in bpe_tr.split(' ')]
ValueError: invalid literal for int() with base 10: ''

It will happen regardless if you haven't finished preprocessing, so it's not caused by an error in the script.
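Whatever triggers it, the message itself says that int() received an empty string, which str.split(' ') produces for empty input or for doubled or trailing spaces. A small sketch of a more tolerant parse, assuming a hypothetical helper parse_indices that is not part of the repository:

```python
def parse_indices(raw, sets):
    """Map a whitespace-separated string of indices to entries of `sets`."""
    # str.split() with no argument drops the empty strings caused by
    # repeated, leading, or trailing whitespace; split(' ') keeps them.
    return [sets[int(tok)] for tok in raw.split()]


sets = ["train-clean-100", "train-clean-360", "train-other-500",
        "dev-clean", "dev-other", "test-clean", "test-other"]

print("0 3 5 ".split(" "))            # ['0', '3', '5', '']
print(parse_indices("0 3 5 ", sets))  # ['train-clean-100', 'dev-clean', 'test-clean']
```

Note that completely empty input then yields an empty list rather than an exception, so a caller may still want to check for that case.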

Aug 10 '19 16:08 dendisuhubdy
