keyerror when preprocess data

Open liu-x-p opened this issue 4 years ago • 6 comments

I set the data directory to datasets/2019/english. When I run the script preprocess.py, it raises
KeyError: 'accessing unknown key in a struct: dataset.in_dir', and I can't work out how to solve it. Could you help me?

liu-x-p avatar Aug 10 '20 02:08 liu-x-p

Hi @liu-x-p,

Sure. If you look at the usage in the readme it says:

python preprocess.py in_dir=/path/to/dataset dataset=[2019/english or 2019/surprise]

Note: in_dir must be the path to the 2019 folder...

This is the folder that contains the wav files in its subdirectories. So, for example, if I download the ZeroSpeech 2020 dataset and store it at ~/Documents/ZeroSpeech/2020, the command should be:

python preprocess.py in_dir=~/Documents/ZeroSpeech/2020/2019 dataset=2019/english

If you're still having trouble, please post the command you used and the path to your data directory.

Hope that helps!

bshall avatar Aug 10 '20 08:08 bshall

@bshall Thank you! I followed your settings. The command is python preprocess.py in_dir=/home/omnisky/mount/holiday/ZeroSpeech-0.1/datasets/2020/2019 dataset=2019/english, and the path /home/omnisky/mount/holiday/ZeroSpeech-0.1/datasets/2020/2019 contains 'english' and 'surprise'.

liu-x-p avatar Aug 10 '20 13:08 liu-x-p

No problem @liu-x-p. If you're still having issues, I'd advise keeping the actual data in a folder separate from this repo. For example, this repo would be under holiday/ZeroSpeech and the actual wav files would be stored in holiday/RawData/2020. Then in_dir should point to .../holiday/RawData/2020/2019.
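
A quick way to sanity-check the layout before running preprocess.py is sketched below. Note that check_in_dir is a hypothetical helper and not part of this repo. It assumes only what is described in this thread: in_dir is the folder named 2019, and it contains the english and surprise subfolders with wav files somewhere beneath them.

```python
from pathlib import Path


def check_in_dir(in_dir, splits=("english", "surprise")):
    """Return a list of problems with a candidate in_dir (empty if none).

    Hypothetical helper: the expected layout (a folder named 2019 holding
    'english' and 'surprise' subfolders with wav files) is an assumption
    taken from this thread, not something read from the repo's config.
    """
    root = Path(in_dir).expanduser()
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    if root.name != "2019":
        problems.append(f"{root} should be the folder named 2019")

    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing subdirectory: {split_dir}")
        elif next(split_dir.rglob("*.wav"), None) is None:
            # rglob searches recursively, so wav files may sit at any depth.
            problems.append(f"no .wav files found under {split_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_in_dir("~/Documents/ZeroSpeech/2020/2019"):
        print(problem)
```

If this prints nothing, the folder at least looks like what preprocess.py expects for in_dir; it does not check the Hydra override syntax itself.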

bshall avatar Aug 10 '20 16:08 bshall

Following the exact same procedure, I get this error: hydra.errors.OverrideParseException: LexerNoViableAltException: Passport/VAE/ZeroSpeech/zerospeech_2020/2020/2019. Could you kindly help me out? The directory path to the wav files is Passport/VAE/ZeroSpeech/zerospeech_2020/2020/2019 and the path to the json files is Passport/VAE/ZeroSpeech/zerospeech_2020/datasets/2019/english.

dummy-arch avatar Jan 10 '21 12:01 dummy-arch

@liu-x-p Hi! I am also a Chinese student trying to run this repo, and I am running into problems similar to yours... TAT I wonder whether you have managed to run this repo successfully, and whether we could discuss it via e-mail. This is my email address: [email protected] Looking forward to your reply!

ZhengRachel avatar Mar 26 '21 02:03 ZhengRachel

@ZhengRachel I'm not sure about this, as it has been a long time. As you can see in my question and comment, I got this problem when I downloaded this work as ZeroSpeech-0.1, which I think may be an early version. I then downloaded it again from the ZeroSpeech-master branch, and it worked. I think the command I used was python preprocess.py in_dir=../datasets/2020/2019 dataset=2019/english

liu-x-p avatar Mar 29 '21 01:03 liu-x-p