Radim Řehůřek

314 comments by Radim Řehůřek

> The top downloads, like `glove-wiki-gigaword-100.gz` (209k downloads) or `word2vec-google-news-300.gz` (220k), are also referenced in Gensim's code, like doc-comment examples & `auto_examples` notebooks – which, IIUC, are auto-run regularly by...

> > I hope not! That's a >1 GB download.
>
> Isn't doc-comment example code like this...
>
> https://github.com/RaRe-Technologies/gensim/blob/b287fd841c31d0dfa899d784da0bd5b3669e104d/gensim/models/word2vec.py#L163
>
> ..auto-run in testing? That makes me think...

And what determines whether something gets run or not?

OK, thanks. That's what I thought too – we check docs for syntax, at most. We don't actually run the code there. (invalidating the theory of "bogus download counts driven...
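To make the distinction between "checked for syntax" and "actually run" concrete, here is a minimal sketch using only the standard library. It is not Gensim's CI setup; the helper name `syntax_check_docstring` and the sample docstring are hypothetical. The examples are extracted and compiled, but never executed, so nothing would be downloaded.

```python
import doctest


def syntax_check_docstring(docstring):
    """Hypothetical helper: compile each `>>>` example without running it.

    Returns a list of (source, error) pairs for examples that fail to compile.
    """
    problems = []
    for example in doctest.DocTestParser().get_examples(docstring):
        try:
            # compile() only parses the code; it does not execute it.
            compile(example.source, "<doc example>", "exec")
        except SyntaxError as err:
            problems.append((example.source, err))
    return problems


# Illustrative docstring text; the load call is never executed here.
SAMPLE_DOC = '''
>>> import gensim.downloader as api
>>> vectors = api.load("glove-wiki-gigaword-100")
'''

BROKEN_DOC = '''
>>> x = (
'''

print(len(syntax_check_docstring(SAMPLE_DOC)))
print(len(syntax_check_docstring(BROKEN_DOC)))
```

A check like this catches typos in examples but, by design, generates no network traffic.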

> other projects' automated fetches may be the overwhelming bulk of other downloads.

That is of course possible. We have no control / reliable stats over who or why downloads...

Thanks @osanseviero! I'm +1 on supporting external data storage in general, including the Hugging Face Hub. Seeing as there hasn't been much activity on our `gensim-data` storage (none of...

Changing how the `gensim-data` models and corpora are uploaded / maintained is definitely possible – pending a strong contributor to actually implement it. If I understand correctly, the options now...

The entire function looks strange. Its documentation doesn't match the code comments: `Return a scipy.sparse vector/matrix consisting of 'topn' elements of the greatest magnitude (absolute value).` versus `# Sort and...
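For reference, the behavior the quoted docstring describes (keep the `topn` elements of greatest absolute value) could look roughly like the sketch below. This is not the function under discussion; `keep_topn_by_magnitude` is a hypothetical name, and the sketch assumes a single-row `scipy.sparse` vector.

```python
import numpy as np
from scipy import sparse


def keep_topn_by_magnitude(vec, topn):
    """Hypothetical helper: keep the `topn` largest-magnitude entries of a 1 x N sparse row."""
    vec = sparse.csr_matrix(vec)
    if vec.shape[0] != 1:
        raise ValueError("this sketch only handles a single-row vector")
    if topn <= 0:
        # Nothing requested: return an empty vector of the same shape.
        return sparse.csr_matrix(vec.shape, dtype=vec.dtype)
    if vec.nnz <= topn:
        return vec.copy()
    # Positions (within vec.data) of the topn largest absolute values.
    keep = np.argpartition(np.abs(vec.data), -topn)[-topn:]
    keep.sort()  # preserve the original storage order of the kept entries
    data = vec.data[keep]
    cols = vec.indices[keep]
    return sparse.csr_matrix((data, cols, [0, len(keep)]), shape=vec.shape)


example = sparse.csr_matrix([[0.5, -3.0, 0.0, 2.0, -0.1]])
clipped = keep_topn_by_magnitude(example, topn=2)
print(clipped.toarray())
```

Note that selection is by magnitude, so the negative entry `-3.0` is kept ahead of `0.5`.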

If so, it needs a clear description + motivation. I still find the docstrings confusing (and contradictory to code), and adding extra parameters won't help.

IIRC the issue was with segfaults: there's a contract in Gensim that sparse vectors (both BoW lists and scipy.sparse) don't contain explicit zeros. Breaking the contract made scipy.sparse confused, and...
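The "no explicit zeros" contract can be illustrated with a short sketch. The helper names `has_explicit_zeros` and `drop_zeros_bow` are hypothetical, not Gensim API; `eliminate_zeros()` is the standard `scipy.sparse` method for removing stored zeros in place.

```python
import numpy as np
from scipy import sparse


def has_explicit_zeros(mat):
    """Hypothetical check: True if the sparse matrix stores any zero values."""
    return bool((mat.data == 0).any())


def drop_zeros_bow(bow):
    """Hypothetical cleanup for a BoW list of (id, weight) pairs."""
    return [(idx, weight) for idx, weight in bow if weight != 0]


# Build a 1 x 5 CSR vector that stores an explicit zero at column 2.
data = np.array([1.5, 0.0, 2.0])
cols = np.array([0, 2, 4])
indptr = np.array([0, 3])
vec = sparse.csr_matrix((data, cols, indptr), shape=(1, 5))

stored_before = vec.nnz          # counts stored entries, including the zero
violates_contract = has_explicit_zeros(vec)

vec.eliminate_zeros()            # in-place removal of stored zeros
stored_after = vec.nnz

clean_bow = drop_zeros_bow([(0, 1.5), (2, 0.0), (4, 2.0)])
print(stored_before, stored_after, clean_bow)
```

The dense contents of the vector are unchanged by `eliminate_zeros()`; only the stored representation shrinks.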