
Vocab_count accounts for empty strings

Open • tarekeldeeb opened this issue 6 years ago • 1 comment

This particular line may hash empty strings:

    while (fscanf(fid, format, str) != EOF) { // Insert all tokens into hashtable

I found the following as a result in my vocab.txt:

    Arabic_Word 903
    singleSpace 903
    Arabic_Word 902

This empty string was the root cause of many further problems with cooccur and GloVe. All corresponding vectors beneath the empty line did not start with the string from vocab.txt but rather with the number of occurrences! This simply ruins vectors.txt.
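To make the failure mode concrete, here is a minimal sketch, not the actual GloVe code. It assumes vocab.txt lines are written as `<word> <count>` and that downstream tools split them on whitespace. The helper names `is_empty_token`, `format_vocab_line`, and `first_field` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical guard: returns 1 if the token has no characters, else 0. */
int is_empty_token(const char *token) {
    assert(token != NULL);
    return token[0] == '\0';
}

/* Formats one vocab line as "<word> <count>\n" into out.
   The line layout is an assumption based on the vocab.txt excerpt above. */
int format_vocab_line(char *out, size_t size, const char *word, long long count) {
    return snprintf(out, size, "%s %lld\n", word, count);
}

/* Reads the first whitespace-delimited field of a vocab line into word,
   which must hold at least 64 bytes. Returns 1 on success, 0 otherwise. */
int first_field(const char *line, char *word) {
    return sscanf(line, "%63s", word) == 1;
}
```

With an empty word, `format_vocab_line` produces a line that starts with a space, and `first_field` then picks up the count as if it were the word, which matches the symptom described above. A guard such as `is_empty_token(str)` placed right after the `fscanf` call, followed by `continue`, would presumably keep such tokens out of the hashtable.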

tarekeldeeb, Jun 20 '18 17:06

@tarekeldeeb Thank you! I copied that commit change into the original GloVe on my machine and it solved this issue for me :)

devincornell, Sep 19 '19 20:09