sense2vec

Error running 03_glove_build_counts.py

Open · myeghaneh opened this issue 4 years ago · 3 comments

I followed your steps to train my own sense2vec (S2V) vectors on my corpus using my customized NER model. Up to step 2 everything worked fine.

Both corpusMODELV05.spacy and corpusMODELV05-1.s2v were created.

But in step 3 I got this error:

ℹ Using 1 input files
✔ Created output directory data/S2VVocabMODELV05
ℹ Creating vocabulary counts
cat data\S2vcorpusMODELV05\corpusMODELV05-1.s2v | data/glove.6B.200d.txt/vocab_count -min-count 5 -verbose 2 > data\S2VVocabMODELV05\vocab.txt

✘ Failed creating vocab counts
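Judging only from the logged command, the script appears to place the GloVe directory argument directly in front of `/vocab_count`. The sketch below rebuilds that command string under this assumption. `build_vocab_count_cmd` is a hypothetical helper, not the script's actual code, and forward slashes are used instead of the Windows backslashes in the log. It shows how passing the pretrained vectors file as the GloVe directory yields the path `data/glove.6B.200d.txt/vocab_count`, which cannot exist because `glove.6B.200d.txt` is a text file, not a folder.

```python
from typing import List


def build_vocab_count_cmd(glove_dir: str, input_files: List[str], output_dir: str,
                          min_count: int = 5, verbose: int = 2) -> str:
    """Hypothetical reconstruction of the vocab_count step seen in the log.

    Assumption: the GloVe directory argument is used verbatim as the folder
    that holds the compiled `vocab_count` binary.
    """
    inputs = " ".join(input_files)
    return (
        f"cat {inputs} | {glove_dir}/vocab_count "
        f"-min-count {min_count} -verbose {verbose} > {output_dir}/vocab.txt"
    )


# Passing the pretrained vectors file instead of the GloVe build folder:
print(build_vocab_count_cmd(
    "data/glove.6B.200d.txt",
    ["data/S2vcorpusMODELV05/corpusMODELV05-1.s2v"],
    "data/S2VVocabMODELV05",
))
```

The printed command matches the logged one up to path separators, which suggests the failure comes from the directory argument rather than from the corpus.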

I am working on a Windows 10 machine and used this version of GloVe:

Wikipedia 2014 + Gigaword 5 (6B tokens, 400K vocab, uncased, 50d, 100d, 200d, & 300d vectors, 822 MB download): glove.6B.zip

https://nlp.stanford.edu/projects/glove/

It seems the vocabulary count in glove.6B.200d.txt/vocab_count does not match something.
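`vocab_count` is not part of the pretrained vectors download; it is one of the executables produced by compiling the GloVe source code. A minimal pre-flight check, assuming the script expects `<glove_dir>/vocab_count` to be an executable file, could look like this. `find_vocab_count` is a hypothetical helper, and the two paths in the usage loop are only examples taken from this thread.

```python
import os
from pathlib import Path
from typing import Optional


def find_vocab_count(glove_dir: str) -> Optional[Path]:
    """Return the path to an executable `vocab_count` inside glove_dir, or None.

    Assumption: glove_dir should be the compiled GloVe folder (e.g. GloVe/build),
    not a pretrained vectors file such as glove.6B.200d.txt.
    """
    base = Path(glove_dir)
    if not base.is_dir():
        # A vectors file (or a missing path) cannot contain the binary.
        return None
    candidate = base / "vocab_count"
    # The binary must exist and be runnable by the current user.
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return candidate
    return None


# Example usage: compare the path from the failing run with a build folder.
for path in ("data/glove.6B.200d.txt", "GloVe/build"):
    found = find_vocab_count(path)
    print(path, "->", found if found else "vocab_count not found")
```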

Can someone help me?

Many thanks in advance!

myeghaneh · Jul 01 '21 15:07

Any ideas? :)

myeghaneh · Jul 02 '21 09:07

I am facing the same issue. Let me know if you've been able to solve it.

saimmehmood · Nov 23 '21 07:11

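To run scripts/03_glove_build_counts.py successfully, pass it the GloVe build folder (the folder containing the compiled vocab_count binary), not a pretrained vectors file such as glove.6B.200d.txt. Make sure you do the following: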

  1. Make sure you have GloVe as a git submodule (git submodule add https://github.com/stanfordnlp/GloVe.git).
  2. Build it by running cd GloVe && make, which creates a GloVe/build directory. Then go back to the parent directory (cd ..).
  3. When running 03_glove_build_counts.py, pass the build folder GloVe/build as the GloVe directory argument: python scripts/03_glove_build_counts.py GloVe/build source_folder output_folder
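The three steps above can also be scripted. The following is a hedged Python sketch using `subprocess`; `glove_steps` and `run` are hypothetical helpers, and the corpus and output folders are simply the ones from the original question. It only prints the commands unless `dry_run=False`, and it assumes `git`, `make` and a C compiler are available on PATH. Step 2 uses `make -C GloVe`, which runs make inside the GloVe folder without changing the current directory, so the separate `cd ..` is not needed.

```python
import subprocess
import sys
from typing import List


def glove_steps(in_dir: str, out_dir: str, glove_repo: str = "GloVe") -> List[List[str]]:
    """Return the commands for steps 1-3 as argument lists."""
    return [
        # 1. Add GloVe as a git submodule (skip this if it is already present).
        ["git", "submodule", "add", "https://github.com/stanfordnlp/GloVe.git"],
        # 2. Compile GloVe; this creates GloVe/build containing vocab_count.
        ["make", "-C", glove_repo],
        # 3. Point the sense2vec script at the build folder, not a vectors file.
        [sys.executable, "scripts/03_glove_build_counts.py",
         f"{glove_repo}/build", in_dir, out_dir],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(glove_steps("data/S2vcorpusMODELV05", "data/S2VVocabMODELV05"))
```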

This is described in the comments on lines 20-28 of 03_glove_build_counts.py.

agonzalezreyes · Oct 20 '22 20:10