deepspeech.pytorch.ko

usage

Open · Ella77 opened this issue 5 years ago · 4 comments

Thank you for the great work.

I followed the speech.ko repo for preprocessing, and the deepspeech.pytorch repo for its preprocess and preparemetafile steps. I was wondering whether this repo can be run independently, using only the Korean open-data *.zip files (the same data as speech.ko).

After running this repo on just the raw zipped datasets, I hit some $home/corpus and .txt path problems (file not found), even when only running data/nikl.py. I would like to know the directory structure this repo expects.

With the prepared datasets, where should I continue in this repo, and which steps should I skip? Also, do I have to implement Korean front-end processing (with Korean cleaners) on top of the original deepspeech.pytorch repo? Thanks!!

Ella77 · Jul 03 '19 02:07

I did not make a pull request because this was a personal project. For the directory structure, the following should help.

https://github.com/homink/deepspeech.pytorch.ko/blob/c09b17925472551518c590fd0ac954f9d706728b/data/nikl.py#L12

homink · Jul 03 '19 06:07


Does nikl_dataset refer to the raw downloaded zip files, or to the output of preprocess.py in deepvoice.pytorch?

I am having trouble figuring out what to put under the $HOME/copora/NIKL directory that these calls expect. Thanks!

    subprocess.call(["local/clean_corpus.sh","$HOME/copora/NIKL",args.target_dir])
    subprocess.call(["local/data_prep.sh","$HOME/copora/NIKL",args.target_dir])
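One possible cause, offered as a guess rather than a confirmed diagnosis: `subprocess.call()` given a list (with the default `shell=False`) passes each argument to the program verbatim, so the literal string `$HOME/copora/NIKL` reaches the shell scripts unexpanded. A shell script receiving that string as `$1` will not expand the `$HOME` inside it either, unless the script re-evaluates it (for example with `eval`), which I have not checked. A minimal sketch that expands the path in Python first; `expand_corpus_dir` and `run_prep` are hypothetical helpers, and it swaps `subprocess.call` for `subprocess.run(..., check=True)` so that a failing script raises an error.

```python
import os
import subprocess


def expand_corpus_dir(raw_path: str) -> str:
    """Expand $VARS and ~ the way an interactive shell would.

    subprocess with a list argument and shell=False does no expansion,
    so "$HOME/copora/NIKL" would otherwise be passed on literally.
    """
    return os.path.expanduser(os.path.expandvars(raw_path))


def run_prep(target_dir: str, corpus_dir: str = "$HOME/copora/NIKL") -> None:
    """Hypothetical wrapper around the two prep calls quoted above."""
    corpus = expand_corpus_dir(corpus_dir)
    # Fail early with a clear message instead of a missing-file error
    # from deep inside the shell scripts.
    if not os.path.isdir(corpus):
        raise FileNotFoundError(f"corpus directory not found: {corpus}")
    # check=True raises CalledProcessError if a script exits non-zero.
    subprocess.run(["local/clean_corpus.sh", corpus, target_dir], check=True)
    subprocess.run(["local/data_prep.sh", corpus, target_dir], check=True)
```

Passing the already-expanded absolute path keeps the scripts independent of how they are invoked; passing `shell=True` with a single command string would also expand `$HOME`, but then the arguments need shell quoting.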

Ella77 · Jul 03 '19 06:07

What is the directory path from which you are running the command?
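The question matters because the script paths in those `subprocess.call()` lines are relative: on POSIX, a relative program path such as `local/clean_corpus.sh` is resolved against the current working directory, not against the directory that holds `nikl.py`. A small sketch to check this from wherever the command is run; `missing_prep_scripts` and `PREP_SCRIPTS` are hypothetical names, and it only assumes the two script paths quoted above.

```python
import os
from typing import List, Optional

# Relative script paths as they appear in the subprocess.call() lines
# quoted above; they are looked up relative to the working directory.
PREP_SCRIPTS = ("local/clean_corpus.sh", "local/data_prep.sh")


def missing_prep_scripts(run_dir: Optional[str] = None) -> List[str]:
    """Return the prep scripts not found relative to run_dir (default: cwd)."""
    base = run_dir if run_dir is not None else os.getcwd()
    return [p for p in PREP_SCRIPTS if not os.path.isfile(os.path.join(base, p))]
```

If `missing_prep_scripts()` returns a non-empty list, the command is being run from a directory other than the one that contains `local/`.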

homink · Jul 03 '19 22:07

speech.ko
└ zip files
└ trimmed data
└ metadata.txt
└ f101
└ f102

deepvoice.pytorch
└ data
└ nikl.m
└ multimel .npy files
└ multispec .npy files

In the existing repos, I ran the steps that create the trimmed_data and the .npy files.

deepspeechtorch.ko (current repo)
└ data
└ local
└ raw zip files

I placed the originally downloaded 30대여성_.zip files (women in their 30s) here again. In local/clean_corpus.sh, I uncommented the unzip section and ran it. After unzipping, the preprocessing (inflating, mov, etc.) proceeds as in the speech.ko workflow, but then it cannot find $HOME/copora/NIKL, so it reports that it cannot find metadata.txt and the other files under it. The relevant part of the script:

    #NIKL corpus consists of several zip files.
    #You can organize folders into your corpus directory with the following commands
    unzip '.zip'
    mv -f "3-3(50female)"/ ./
    mv -f "3-3(50male)"/* ./
    rm -rf "3-3(50female)" "3-3(50male)"
    #You can delete corpus with the following command and unzip again if necessary.
    rm -rf Bad* Non* f* m* .txt .hwp script speak
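A Python sketch of the same reorganization, assuming the zips unpack into the `3-3(50female)` and `3-3(50male)` folders named in those script comments, and that their contents belong directly in the corpus directory. `organize_corpus` and `NESTED_DIRS` are hypothetical names; this is not part of the repo and has not been checked against the actual NIKL archives.

```python
import glob
import os
import shutil
import zipfile

# Nested folders named in the clean_corpus.sh comments quoted above;
# their contents are moved up into the corpus directory itself.
NESTED_DIRS = ("3-3(50female)", "3-3(50male)")


def organize_corpus(zip_dir: str, corpus_dir: str) -> None:
    """Unzip every *.zip in zip_dir into corpus_dir, then flatten NESTED_DIRS."""
    os.makedirs(corpus_dir, exist_ok=True)
    for zip_path in sorted(glob.glob(os.path.join(zip_dir, "*.zip"))):
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(corpus_dir)
    for nested in NESTED_DIRS:
        src = os.path.join(corpus_dir, nested)
        if not os.path.isdir(src):
            continue
        for name in os.listdir(src):
            dest = os.path.join(corpus_dir, name)
            # Overwrite existing entries (a simplification of `mv -f`).
            if os.path.isdir(dest) and not os.path.islink(dest):
                shutil.rmtree(dest)
            elif os.path.lexists(dest):
                os.remove(dest)
            shutil.move(os.path.join(src, name), dest)
        # Equivalent of: rm -rf "3-3(50female)" "3-3(50male)"
        shutil.rmtree(src)
```

For example, `organize_corpus("raw_zips", os.path.expanduser("~/copora/NIKL"))` would populate the corpus directory that the prep scripts are given, where `raw_zips` stands for wherever the downloaded zips live.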

Ella77 · Jul 04 '19 02:07