GloballyNormalizedReader

"No such file" when running download.sh; "UnicodeDecodeError" when running featurize.py

Open maoredman opened this issue 7 years ago • 2 comments

Hi, when I ran the data/download.sh script, the command cat augmented_data/augmented_zips.zip.z* > augmented_train.json.zip raised an error: cat: augmented_data/augmented_zips.zip.z*: No such file or directory

I then replaced that command with one that lists every part explicitly: cat augmented_data/augmented_zips.z01 augmented_data/augmented_zips.z02 augmented_data/augmented_zips.z03 augmented_data/augmented_zips.z04 augmented_data/augmented_zips.z05 augmented_data/augmented_zips.z06 augmented_data/augmented_zips.z07 augmented_data/augmented_zips.z08 augmented_data/augmented_zips.z09 augmented_data/augmented_zips.z10 augmented_data/augmented_zips.zip > augmented_train.json.zip
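For anyone reproducing this, here is a minimal Python sketch of the same concatenation. It assumes the parts are named augmented_zips.z01 ... augmented_zips.zNN plus a final augmented_zips.zip, as in the command above. The helper names ordered_parts and concatenate are made up for illustration and are not part of the repository. It only mirrors what cat does; whether the result unzips cleanly depends on how the archive was split.

```python
import glob
import os
import shutil


def ordered_parts(directory, stem="augmented_zips"):
    """Return the split-archive parts in order: .z01, .z02, ..., then .zip last."""
    # Two-digit suffixes sort correctly as plain strings (z01 < z02 < ... < z10).
    numbered = sorted(glob.glob(os.path.join(directory, stem + ".z[0-9][0-9]")))
    last = os.path.join(directory, stem + ".zip")
    if not numbered or not os.path.exists(last):
        raise FileNotFoundError("expected %s.zNN parts and %s.zip in %s"
                                % (stem, stem, directory))
    return numbered + [last]


def concatenate(parts, out_path):
    """Byte-for-byte concatenation of the parts, equivalent to `cat parts > out_path`."""
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    # Paths taken from the command in this issue.
    concatenate(ordered_parts("augmented_data"), "augmented_train.json.zip")
```

Globbing on the .zNN suffix rather than .zip.z* sidesteps the "No such file or directory" error, because it matches the file names that are actually present.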

However, I am then unable to run featurize.py successfully. It fails with UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 15-16: invalid continuation byte at line 139 of vocab.py (gensim.models.KeyedVectors.load_word2vec_format(path, binary=True)), during the "Building word embedding matrix..." step.
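One possible cause, which is only an assumption and not confirmed anywhere in this thread, is that the file at path is not a plain word2vec binary, for example because it is still compressed or was only partially downloaded. A quick check is to look at the first bytes of the file. The sketch below uses a hypothetical helper, sniff_embedding_file, and relies only on well-known signatures: gzip files start with 1f 8b, zip files start with PK, and a word2vec file starts with an ASCII line holding two integers (vocabulary size and vector size).

```python
def sniff_embedding_file(path, probe=64):
    """Guess what kind of file `path` is from its first few bytes."""
    with open(path, "rb") as f:
        head = f.read(probe)
    if head.startswith(b"\x1f\x8b"):
        return "gzip"  # still gzip-compressed
    if head.startswith(b"PK"):
        return "zip"  # a zip archive, not the vectors themselves
    if b"\n" in head:
        # A word2vec file begins with "<vocab_size> <vector_size>\n" in ASCII.
        fields = head.split(b"\n", 1)[0].split()
        if len(fields) == 2 and all(x.isdigit() for x in fields):
            return "word2vec-header"
    return "unknown"


if __name__ == "__main__":
    import sys
    print(sniff_embedding_file(sys.argv[1]))
```

If this prints anything other than word2vec-header, the loader is likely being handed the wrong kind of file, and re-downloading or decompressing it first would be the thing to try.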

Is there any advice on what modifications I can make? Thanks!

maoredman • Feb 11 '18 08:02

I have the same problem. Is nobody available to fix this?

Kotorinyanya • Apr 03 '18 08:04

@maoredman I get the exact same error when running featurize.py. Did you fix it?

lan2720 • Apr 11 '18 05:04