GloballyNormalizedReader
"No such file" when running download.sh; "UnicodeDecodeError" when running featurize.py
Hi, when I ran the data/download.sh script, the command
cat augmented_data/augmented_zips.zip.z* > augmented_train.json.zip
raised an error:
cat: augmented_data/augmented_zips.zip.z*: No such file or directory
I then replaced the augmented_zips.zip.z* glob with an explicit list of the parts, so the command became:
cat augmented_data/augmented_zips.z01 augmented_data/augmented_zips.z02 augmented_data/augmented_zips.z03 augmented_data/augmented_zips.z04 augmented_data/augmented_zips.z05 augmented_data/augmented_zips.z06 augmented_data/augmented_zips.z07 augmented_data/augmented_zips.z08 augmented_data/augmented_zips.z09 augmented_data/augmented_zips.z10 augmented_data/augmented_zips.zip > augmented_train.json.zip
However, I am still unable to run featurize.py successfully. It fails during "Building word embedding matrix..." at line 139 of vocab.py (gensim.models.KeyedVectors.load_word2vec_format(path, binary=True)) with:
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 15-16: invalid continuation byte
Is there any advice on what modifications I can make? Thanks!
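For the first error, the glob in download.sh (augmented_zips.zip.z*) does not match the names quoted above (augmented_zips.z01 ... augmented_zips.z10, augmented_zips.zip). Rather than listing every part by hand, the concatenation could be done by a small script that picks up whatever parts are present. This is only a sketch: the helper names part_order and join_split_archive are made up for illustration, and it assumes, as the cat command does, that the numbered .zNN parts go first in numeric order and the .zip segment goes last.

```python
import shutil
from pathlib import Path


def part_order(path):
    """Sort key: numbered .zNN parts first (by number), the .zip segment last."""
    suffix = path.suffix.lower()
    if suffix == ".zip":
        return (1, 0)
    return (0, int(suffix[2:]))


def join_split_archive(parts_dir, stem, output):
    """Concatenate <stem>.z01, <stem>.z02, ..., <stem>.zip into `output`.

    Hypothetical helper: it mirrors the plain `cat` approach and does not
    validate the resulting archive.
    """
    candidates = Path(parts_dir).glob(stem + ".z*")
    parts = [
        p for p in candidates
        if p.suffix.lower() == ".zip" or p.suffix[2:].isdigit()
    ]
    if not parts:
        raise FileNotFoundError(f"no parts named {stem}.z* in {parts_dir}")
    parts.sort(key=part_order)
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # append this part's bytes
    return parts
```

With the paths from this issue, the call would be join_split_archive("augmented_data", "augmented_zips", "augmented_train.json.zip"). The returned list shows which parts were used and in what order, which makes a missing part easy to spot.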
I have the same problem. Is anybody able to fix this?
@maoredman I get the exact same error when running featurize.py. Did you fix it?
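Not a confirmed fix, but one thing that may help narrow down the UnicodeDecodeError: check whether the embeddings file passed to load_word2vec_format really looks like a word2vec file. Both the text and binary word2vec formats begin with an ASCII header line of the form "<vocab_size> <vector_size>", so a file that is still compressed, only partially downloaded, or in another format (GloVe text files, for example, have no such header) can be spotted from its first bytes. The sketch below uses only the standard library; the function name inspect_embedding_header and its return labels are invented for this example.

```python
def inspect_embedding_header(path, probe_bytes=256):
    """Guess what kind of file `path` is from its first bytes.

    Hypothetical diagnostic helper, not part of gensim or of this repo.
    """
    with open(path, "rb") as f:
        head = f.read(probe_bytes)

    if head.startswith(b"\x1f\x8b"):
        return "gzip"            # gzip magic number: still compressed
    if head.startswith(b"PK\x03\x04"):
        return "zip"             # zip local-file header: still archived

    first_line = head.split(b"\n", 1)[0]
    try:
        fields = first_line.decode("utf-8").split()
    except UnicodeDecodeError:
        return "binary-or-unknown"

    # word2vec files start with "<vocab_size> <vector_size>"
    if len(fields) == 2 and all(field.isdigit() for field in fields):
        return "word2vec-header"
    return "text-without-header"
```

If this returns "word2vec-header", the file at least starts the way gensim expects, and the problem is more likely in the entries themselves. In that case it may be worth trying the unicode_errors argument of load_word2vec_format (for example unicode_errors='ignore'). Any other result suggests the file needs to be decompressed, re-downloaded, or converted first.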