GloVe
vocab_count counts empty strings as tokens
This particular line may insert empty strings into the hash table:
while (fscanf(fid, format, str) != EOF) { // Insert all tokens into hashtable
As a result, I found the following in my vocab.txt:
Arabic_Word 903
singleSpace 903
Arabic_Word 902
This empty string was the root cause of many further problems with cooccur and GloVe. None of the vectors below the empty entry started with their word from vocab.txt; each started with the number of occurrences instead. This ruins vectors.txt.
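For anyone who wants to check whether their own vocab.txt is affected, a rough sketch of a per-line check follows. It assumes every valid line holds exactly two whitespace-separated fields, a word and a count. The helper name `is_valid_vocab_line` and the 1000-byte word limit are hypothetical, and this is only a diagnostic, not part of GloVe and not the fix itself.

```c
#include <stdio.h>

/* Hypothetical helper, not part of GloVe: returns 1 when `line` looks like
   "<word> <count>" and 0 otherwise. */
int is_valid_vocab_line(const char *line) {
    char word[1001];   /* assumed maximum word length of 1000 bytes */
    long long count;
    char extra;
    /* %1000s skips leading whitespace, so on a line such as " 903" the
       number lands in `word` and nothing is left for %lld; sscanf then
       returns 1 instead of 2. The trailing " %c" only matches when an
       unexpected third field is present, which makes sscanf return 3. */
    int fields = sscanf(line, "%1000s %lld %c", word, &count, &extra);
    return fields == 2;
}
```

Reading vocab.txt line by line (for example with `fgets`) and printing every line for which this returns 0 should point at the entry whose word is empty.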
@tarekeldeeb Thank you! I copied that commit change into the original GloVe on my machine and it solved this issue for me :)