End-to-end-ASR-Pytorch

I ran into a problem when running preprocess_libri.py

Status: Open • qycxzhangyi opened this issue 5 years ago • 15 comments

FileNotFoundError: [Errno 2] No such file or directory: 'spm_train': 'spm_train'

I don't know how to deal with this. There is no such file in the current directory.

qycxzhangyi, Jun 06 '19 02:06

Same here.

zxybazh, Jun 11 '19 02:06

Same here.

Have you found a solution?

qycxzhangyi, Jun 12 '19 07:06

@qycxzhangyi @zxybazh I believe this is because sentencepiece has to be compiled from its C++ source with bazel rather than installed as a pip package. Unfortunately, I wasn't able to install it correctly; see this issue.

Have you found a way to install it and run it correctly?

Kabur, Jun 18 '19 12:06
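For context, the error in the original report most likely comes from Python trying to launch an executable named `spm_train` and not finding it on `PATH`; it is not looking for a file in the current directory. A minimal sketch, assuming the script launches `spm_train` through `subprocess` (as the later comments about "the spm_train call" suggest), of checking for the tool up front so the message says what is actually missing. The helper names `require_executable` and `run_tool` are hypothetical, not taken from the repository:

```python
import shutil
import subprocess
from typing import List


def require_executable(name: str) -> str:
    """Return the full path of `name` if it is on PATH, else raise a clear error."""
    path = shutil.which(name)
    if path is None:
        # Per this thread, `pip install sentencepiece` did not provide the
        # spm_train command-line tool; building sentencepiece from source did.
        raise FileNotFoundError(
            f"{name!r} was not found on PATH. Build and install sentencepiece "
            "from source so that its command-line tools are available."
        )
    return path


def run_tool(args: List[str]) -> None:
    """Check that the executable exists, then run it; non-zero exit raises."""
    require_executable(args[0])
    subprocess.run(args, check=True)
```

Failing before the subprocess starts turns the bare `[Errno 2]` into a message that names the missing command and the likely fix.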

preprocess_libri.py now works after I installed an older commit of sentencepiece.

Kabur, Jun 19 '19 14:06

Hello, when I run the code I get the following error. I have installed the sentencepiece package (pip install sentencepiece), but it doesn't work. Does it work for you, or have you found anything else useful?

[Screenshot of the error]

Mohammadelc, Jun 20 '19 21:06

As I wrote, you need to compile and install sentencepiece from source, not with pip. See my comment above; that worked for me.

Kabur, Jun 20 '19 22:06

@Kabur Thank you, it works now.

Mohammadelc, Jun 21 '19 16:06

@Kabur I followed the method you mentioned above, but I ran into a new problem:

FileNotFoundError: [Errno 2] No such file or directory: './libri_fbank80_subword5000/bpe/bpe.vocab'

The bpe.vocab file was not generated. Do you know how to solve this problem?

qycxzhangyi, Jun 24 '19 02:06

Add --model_type=bpe to the call of spm_train.

sergeevii123, Jul 17 '19 09:07

Can you please be more explicit about where I should put the --model_type=bpe flag?

Edit: OK, I found the spm_train call. I also created a pull request to fix this: #34.

DerekChia, Jul 30 '19 09:07
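`spm_train` selects the subword algorithm with `--model_type` (`unigram` by default; `bpe`, `char`, and `word` are the alternatives) and writes `<model_prefix>.model` and `<model_prefix>.vocab`. A sketch of where the suggested flag goes, assuming the script builds an argument list for `spm_train`; the function names, the input file name, and the exact set of other flags are hypothetical, not copied from preprocess_libri.py or PR #34:

```python
import subprocess
from typing import List


def build_spm_train_cmd(text_file: str, model_prefix: str, vocab_size: int) -> List[str]:
    """Assemble an spm_train command line that trains a BPE model.

    Hypothetical helper: the real script may pass different or extra flags.
    """
    return [
        "spm_train",
        f"--input={text_file}",            # raw training text, one sentence per line
        f"--model_prefix={model_prefix}",  # outputs: <prefix>.model, <prefix>.vocab
        f"--vocab_size={vocab_size}",
        "--model_type=bpe",                # the flag suggested in this thread
    ]


def train_bpe(text_file: str, model_prefix: str, vocab_size: int) -> None:
    """Run spm_train; check=True raises CalledProcessError if it fails."""
    subprocess.run(build_spm_train_cmd(text_file, model_prefix, vocab_size), check=True)


# Hypothetical usage (input file name assumed; the output directory is
# assumed to exist already):
# train_bpe("bpe_train.txt", "./libri_fbank80_subword5000/bpe/bpe", 5000)
```

Passing `check=True` makes a failed training run stop the script immediately instead of surfacing later as a missing `bpe.vocab`.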

I added --model_type=bpe, but I still get the same error:

FileNotFoundError: [Errno 2] No such file or directory: './libri_fbank80_subword5000/bpe/bpe.vocab'

JamiePlur, Aug 02 '19 10:08

Did you take a look at my pull request? If you made the same change as in my pull request, you should not have this problem.

DerekChia, Aug 03 '19 07:08
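If `bpe.vocab` is still missing, it can help to confirm that training actually produced its outputs before anything tries to read them, so the failure is reported at the step that caused it. A minimal, hypothetical check (the helper names are illustrative, not from the repository), relying only on `spm_train` naming its outputs `<model_prefix>.model` and `<model_prefix>.vocab`:

```python
import os
from typing import Tuple


def spm_output_paths(model_prefix: str) -> Tuple[str, str]:
    """spm_train writes <prefix>.model and <prefix>.vocab."""
    return model_prefix + ".model", model_prefix + ".vocab"


def check_spm_outputs(model_prefix: str) -> None:
    """Raise right away if either expected training output is missing."""
    missing = [p for p in spm_output_paths(model_prefix) if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError(
            "spm_train did not produce: " + ", ".join(missing)
            + ". Check its console output for errors."
        )


# Hypothetical usage, right after the training step:
# check_spm_outputs("./libri_fbank80_subword5000/bpe/bpe")
```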

I hit this error too, but it works now that I have compiled sentencepiece from source. Don't install sentencepiece with pip, and this should work. I then got the following error, though:

(pytorch_p36) $~/code/End-to-end-ASR-Pytorch/data$ python3 preprocess_libri.py --data_path ~/data/speech/LibriSpeech/

Pretrain BPE for subword unit.
Data sets :
         0 : train-clean-100
         1 : train-clean-360
         2 : train-other-500
         3 : dev-clean
         4 : dev-other
         5 : test-clean
         6 : test-other
Please enter the index for training sets for BPE (seperate w/ space): 0 3 5
Traceback (most recent call last):
  File "preprocess_libri.py", line 73, in <module>
    bpe_tr = [sets[int(t)] for t in bpe_tr.split(' ')]
  File "preprocess_libri.py", line 73, in <listcomp>
    bpe_tr = [sets[int(t)] for t in bpe_tr.split(' ')]
ValueError: invalid literal for int() with base 10: ''

dendisuhubdy, Aug 10 '19 11:08

Oh yes, I forgot to mention that I also compiled it from source.

DerekChia, Aug 10 '19 15:08

OK, so the "ValueError: invalid literal for int() with base 10: ''" raised at line 73 of preprocess_libri.py (bpe_tr = [sets[int(t)] for t in bpe_tr.split(' ')]) will happen anyway if you haven't finished preprocessing, so it is not caused by a bug in the script.

dendisuhubdy, Aug 10 '19 16:08
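For reference, `int('')` raises exactly this `ValueError`, and `str.split(' ')` produces empty strings whenever the typed answer contains a trailing space or two spaces in a row (for example, `'0 3 5 '.split(' ')` gives `['0', '3', '5', '']`). A sketch of a more tolerant way to read the indices, assuming the data-set list printed by the script; the function name `parse_set_indices` is hypothetical:

```python
from typing import List

# The data sets as listed in the script's prompt above.
SETS = [
    "train-clean-100", "train-clean-360", "train-other-500",
    "dev-clean", "dev-other", "test-clean", "test-other",
]


def parse_set_indices(answer: str, sets: List[str]) -> List[str]:
    """Map space-separated indices such as '0 3 5' to data-set names.

    str.split() with no argument splits on runs of whitespace and ignores
    leading/trailing whitespace, so no empty string ever reaches int().
    """
    chosen = []
    for token in answer.split():
        idx = int(token)
        if not 0 <= idx < len(sets):
            raise ValueError(f"index {idx} is out of range 0..{len(sets) - 1}")
        chosen.append(sets[idx])
    if not chosen:
        raise ValueError("no indices were entered")
    return chosen


# Hypothetical interactive use:
# bpe_tr = parse_set_indices(input("Please enter the index ...: "), SETS)
```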