End-to-end-ASR-Pytorch
I ran into a problem when running preprocess_libri.py
FileNotFoundError: [Errno 2] No such file or directory: 'spm_train': 'spm_train'
I don't know how to deal with this. There is no such file in the current directory.
Same here.
Same here.
Did you find a solution?
@qycxzhangyi @zxybazh I believe this is because sentencepiece needs to be compiled from C++ source using bazel rather than installed as a pip package. That is unfortunate, because I wasn't able to install it correctly; see this issue.
Have you found a way to install and run correctly?
preprocess_libri.py now works after installing an older commit of sentencepiece.
Hello, when I run the code, I get the following error. I have installed the sentencepiece package (pip install sentencepiece) but it doesn't work. Does it work for you? Or have you found anything else useful?
As I wrote, you need to install and compile sentencepiece from source, not with pip. See my comment above, that worked for me.
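A quick way to confirm whether this is your situation is to check if the spm_train binary is visible on your PATH before running the script. This is only a minimal sketch using the standard library; find_spm_train is a hypothetical helper name and is not part of preprocess_libri.py.

```python
import shutil


def find_spm_train(binary_name="spm_train"):
    """Return the full path of the binary if it is on PATH, else None.

    find_spm_train is a hypothetical helper for illustration only.
    """
    return shutil.which(binary_name)


if __name__ == "__main__":
    path = find_spm_train()
    if path is None:
        # This is the situation that leads to the FileNotFoundError above.
        print("spm_train not found on PATH")
    else:
        print(f"spm_train found at {path}")
```

If it prints "spm_train not found on PATH", any script that launches spm_train as an external command will fail with the FileNotFoundError reported above.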
@Kabur Thank you, it works now.
@Kabur I followed the method you mentioned above, but I hit a new problem. FileNotFoundError: [Errno 2] No such file or directory: './libri_fbank80_subword5000/bpe/bpe.vocab' It did not generate the bpe.vocab file. Do you know how to solve this problem?
Add --model_type=bpe to the call of spm_train.
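For anyone unsure what that looks like, here is a rough sketch of an spm_train invocation with the flag included. The helper names, the input file name, the model prefix and the vocabulary size are all assumptions for illustration (the 5000 is only guessed from the directory name); the actual call in preprocess_libri.py may be built differently. spm_train writes <model_prefix>.model and <model_prefix>.vocab, so the prefix determines where bpe.vocab ends up.

```python
import subprocess


def build_spm_train_cmd(input_txt, model_prefix, vocab_size, model_type="bpe"):
    """Build the argument list for spm_train (hypothetical helper)."""
    return [
        "spm_train",
        f"--input={input_txt}",
        f"--model_prefix={model_prefix}",
        f"--vocab_size={vocab_size}",
        f"--model_type={model_type}",  # the flag discussed in this thread
    ]


def train_bpe(input_txt, model_prefix, vocab_size):
    """Run spm_train; requires the compiled binary to be on PATH."""
    subprocess.run(build_spm_train_cmd(input_txt, model_prefix, vocab_size), check=True)


# Assumed example values, not taken from the repository.
cmd = build_spm_train_cmd("bpe_text.txt", "bpe/bpe", 5000)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems, and check=True makes a failed spm_train run raise immediately instead of surfacing later as a missing bpe.vocab.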
Can you please be more explicit about where I should put the --model_type=bpe flag?
Edit: OK, I found the spm_train call. I also created a pull request to fix this: #34
I added --model_type=bpe, but I still got the same error:
FileNotFoundError: [Errno 2] No such file or directory: './libri_fbank80_subword5000/bpe/bpe.vocab'
Did you take a look at my pull request? If you made the same change as in my pull request, you should not have this problem.
I ran into this error too, but it works now that I have compiled sentencepiece from source. Don't do a pip install sentencepiece, and this should work. Otherwise you may get an error like this:
(pytorch_p36) $~/code/End-to-end-ASR-Pytorch/data$ python3 preprocess_libri.py --data_path ~/data/speech/LibriSpeech/
Pretrain BPE for subword unit.
Data sets :
0 : train-clean-100
1 : train-clean-360
2 : train-other-500
3 : dev-clean
4 : dev-other
5 : test-clean
6 : test-other
Please enter the index for training sets for BPE (seperate w/ space): 0 3 5
Traceback (most recent call last):
File "preprocess_libri.py", line 73, in <module>
bpe_tr = [sets[int(t)] for t in bpe_tr.split(' ')]
File "preprocess_libri.py", line 73, in <listcomp>
bpe_tr = [sets[int(t)] for t in bpe_tr.split(' ')]
ValueError: invalid literal for int() with base 10: ''
Oh yes, I forgot to mention that I also compiled from source.
OK, so about this:
Traceback (most recent call last):
File "preprocess_libri.py", line 73, in <module>
bpe_tr = [sets[int(t)] for t in bpe_tr.split(' ')]
File "preprocess_libri.py", line 73, in <listcomp>
bpe_tr = [sets[int(t)] for t in bpe_tr.split(' ')]
ValueError: invalid literal for int() with base 10: ''
It will happen regardless if you haven't finished preprocessing, so it's not caused by an error in the script.
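For reference, the exception itself is what Python raises for int(''): splitting on a single space with split(' ') produces an empty string whenever the input is empty or contains a trailing or doubled space. A minimal sketch of more tolerant parsing follows; parse_indices is a hypothetical helper, not something in preprocess_libri.py, and it only illustrates the behavior rather than proposing the project's fix.

```python
def parse_indices(raw, num_sets):
    """Parse space-separated indices, ignoring extra whitespace.

    parse_indices is a hypothetical helper for illustration only.
    """
    # split() with no argument drops empty strings, unlike split(' ').
    indices = [int(token) for token in raw.split()]
    for index in indices:
        if not 0 <= index < num_sets:
            raise ValueError(f"index {index} is out of range 0..{num_sets - 1}")
    return indices


# A trailing space leaves an empty token when splitting on ' '.
print("0 3 5 ".split(" "))
print(parse_indices("0 3 5 ", 7))
```

With split(' ') the trailing space yields a final empty token, and int('') on that token raises the ValueError shown in the traceback.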