Radim Řehůřek
> The top downloads, like `glove-wiki-gigaword-100.gz` (209k downloads) or `word2vec-google-news-300.gz` (220k), are also referenced in Gensim's code, like doc-comment examples & `auto_examples` notebooks – that, IIUC, is auto-run regularly by...
> > I hope not! That's a >1 GB download.
>
> Isn't doc-comment example code like this...
>
> https://github.com/RaRe-Technologies/gensim/blob/b287fd841c31d0dfa899d784da0bd5b3669e104d/gensim/models/word2vec.py#L163
>
> ..auto-run in testing? That makes me think...
And what determines whether something gets run or not?
OK, thanks. That's what I thought too – we check docs for syntax, at most. We don't actually run the code there. (invalidating the theory of "bogus download counts driven...
> other projects' automated fetches may be the overwhelming bulk of other downloads.

That is of course possible. We have no control / reliable stats over who or why downloads...
Thanks @osanseviero ! I'm +1 on supporting external data storages in general, including from Hugging Face Hub. Seeing as there hasn't been much activity on our `gensim-data` storage (none of...
Changing how the `gensim-data` models and corpora are uploaded / maintained is definitely possible – pending a strong contributor to actually implement it. If I understand correctly, the options now...
The entire function looks strange. Its documentation doesn't match the code comments: `Return a scipy.sparse vector/matrix consisting of 'topn' elements of the greatest magnitude (absolute value).` versus `# Sort and...
If so, it needs a clear description + motivation. I still find the docstrings confusing (and contradictory to code), and adding extra parameters won't help.
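For reference, here is a minimal sketch of what the quoted docstring describes: keeping only the `topn` stored entries of greatest absolute value in a single-row `scipy.sparse` vector. This is not Gensim's implementation; the name `top_n_by_magnitude` is hypothetical, and restricting it to one-row inputs is an assumption made to keep the example short.

```python
import numpy as np
from scipy import sparse


def top_n_by_magnitude(vec, topn):
    """Keep the `topn` stored entries with the largest absolute value.

    Hypothetical helper for illustration only; it handles a single-row
    sparse vector, not a general matrix.
    """
    vec = sparse.csr_matrix(vec)
    if vec.shape[0] != 1:
        raise ValueError("this sketch only handles single-row vectors")
    if topn <= 0:
        # Nothing to keep: return an empty vector of the same shape.
        return sparse.csr_matrix(vec.shape, dtype=vec.dtype)
    if vec.nnz <= topn:
        return vec.copy()
    # Positions (within the stored data) of the topn largest magnitudes.
    keep = np.argpartition(np.abs(vec.data), -topn)[-topn:]
    keep.sort()  # preserve the original column order
    return sparse.csr_matrix(
        (vec.data[keep], vec.indices[keep], [0, len(keep)]),
        shape=vec.shape,
    )


example = sparse.csr_matrix(np.array([[0.5, -3.0, 0.0, 2.0, -0.1]]))
clipped = top_n_by_magnitude(example, 2)
```

Note that selection is by magnitude, so the negative entry `-3.0` is kept ahead of the smaller positive entries.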
IIRC the issue was with segfaults: there's a contract in Gensim that sparse vectors (both BoW lists and scipy.sparse) don't contain explicit zeros. Breaking the contract made scipy.sparse confused, and...
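To illustrate the "no explicit zeros" contract, here is a small sketch using plain SciPy and Python. The variable names are made up for the example, and it does not reproduce the segfault mentioned above; it only shows how an explicitly stored zero differs from an absent entry, and how to strip it.

```python
import numpy as np
from scipy import sparse

# Build a 1x3 CSR vector that explicitly stores a zero at column 1.
data = np.array([1.0, 0.0, 2.0])
indices = np.array([0, 1, 2])
indptr = np.array([0, 3])
vec = sparse.csr_matrix((data, indices, indptr), shape=(1, 3))

stored_before = vec.nnz  # counts stored entries, including the explicit zero
vec.eliminate_zeros()    # in-place removal of explicitly stored zeros
stored_after = vec.nnz

# The same contract for a BoW-style list of (id, weight) pairs.
bow = [(0, 1.0), (1, 0.0), (2, 2.0)]
clean_bow = [(term_id, weight) for term_id, weight in bow if weight != 0]
```

`eliminate_zeros()` modifies the matrix in place, so copy first if the original is still needed.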