Radim Řehůřek comments

Results: 314 comments by Radim Řehůřek

Deprecation warnings: `scipy.sparse.sparsetools` and `np.float`

Thanks for trying the beta and reporting! I don't think we can do much about sparsetools (scipy provides no alternative AFAIK), but we can definitely fix the `float`.

Deprecation warnings: `scipy.sparse.sparsetools` and `np.float`

@raffaem can you run whatever steps you used before, but on the current `develop` branch of Gensim? We removed and fixed a bunch of code, so maybe this is not...

Deprecation warnings: `scipy.sparse.sparsetools` and `np.float`

Removing from 4.0.0. Will revisit when @raffaem follows up.

Deprecation warnings: `scipy.sparse.sparsetools` and `np.float`

Getting rid of (not showing) the sparsetools warning makes sense. But I don't think try-except will help here: it's not an exception. As far as I'm aware scipy doesn't...
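
To illustrate the point about warnings versus exceptions, here is a minimal sketch using only Python's standard `warnings` module. The `noisy_import` function is a hypothetical stand-in for whatever code emits the deprecation warning; it is not Gensim or scipy code, and this is not necessarily how Gensim handles it.

```python
import warnings


def noisy_import():
    """Hypothetical stand-in for code that emits a DeprecationWarning."""
    warnings.warn("sparsetools is deprecated", DeprecationWarning)
    return "ok"


def call_quietly(func):
    """Call func with DeprecationWarnings hidden; filters are restored on exit."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=DeprecationWarning)
        return func()


# A warning is not raised as an exception under the default filters,
# so a try/except around the call has nothing to catch.
try:
    value = noisy_import()
    was_raised = False
except DeprecationWarning:
    was_raised = True

quiet_value = call_quietly(noisy_import)
```

The filter is scoped to the `with` block, so warnings from unrelated code stay visible.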

Deprecation warnings: `scipy.sparse.sparsetools` and `np.float`

And in https://github.com/scipy/scipy/issues/5348 by the scipy team. Scipy (and the whole pydata ecosystem) is much easier to deploy and manage than it was 7 years ago. Plus gensim now compiles...

Model.Phrases - Specify what is considered a MWE component/word

> In this way, `Phrases` will treat `European Commission` the same way it will treat `Commission There`.

No, you pass in sentences (lists of tokens) to Phrases, not strings...

Model.Phrases - Specify what is considered a MWE component/word

I don't know about non-word tokens. But definitely on full stops, to avoid that example of `Commission There` cross-sentence overlap.
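
As a rough sketch of why splitting on full stops matters: if each sentence is its own list of tokens, adjacent-pair counting never sees a cross-sentence pair such as `Commission There`. The splitter and counter below are hypothetical helpers written for illustration, not Gensim code; the resulting `sentences` are in the list-of-tokens shape that `Phrases` expects as input.

```python
import re
from collections import Counter


def split_into_sentences(text):
    """Naive splitter: break on '.', '!' or '?', then tokenize on whitespace."""
    sentences = []
    for chunk in re.split(r"[.!?]+", text):
        tokens = chunk.split()
        if tokens:
            sentences.append(tokens)
    return sentences


def count_bigrams(sentences):
    """Count adjacent token pairs within each sentence only."""
    counts = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return counts


text = "He works at the European Commission. There are many offices."
sentences = split_into_sentences(text)
bigrams = count_bigrams(sentences)
```

A real pipeline would likely use a proper sentence tokenizer; the regular expression here is only meant to show the effect of the sentence boundary.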

KeyedVector most_similar() use too much CPU

Does running your processes with `OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1` fix the problem?
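
If setting the variables on the command line is inconvenient, one option is to set them from Python before any numerical library is imported. This is a sketch under the assumption that the BLAS and OpenMP runtimes read these variables when they are first loaded, so the assignment has to come first.

```python
import os

# Assumption: these are read when the BLAS/OpenMP runtime is first loaded,
# so set them before importing numpy, scipy or gensim.
THREAD_VARS = ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS")

for var in THREAD_VARS:
    os.environ[var] = "1"

# Import numpy / gensim only after this point.
```

Child processes started afterwards inherit the same environment, which matters when several workers each call `most_similar()`.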

Freezing Trigram Phrase models yields inconsistent results

Thanks for reporting. Are you interested in figuring out the cause? All code lives in the [phrases](https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/phrases.py) module, and is fairly straightforward.

Freezing Trigram Phrase models yields inconsistent results

Thanks for looking into this. IIRC we went for strings to save on RAM; tuples introduce a lot of memory overhead. These "phrases" models are memory-hungry, by the nature of what...
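
To get a rough feel for the overhead, the sketch below compares the size of a tuple key (container plus its token strings) with a single delimiter-joined string, using `sys.getsizeof`. The helper names and the `"_"` delimiter are assumptions for illustration. The tuple figure is an upper bound, since token strings may be shared with other keys, and exact numbers vary across CPython versions.

```python
import sys


def tuple_footprint(tokens):
    """Size of a tuple key: the tuple object plus each token string it holds."""
    key = tuple(tokens)
    return sys.getsizeof(key) + sum(sys.getsizeof(tok) for tok in key)


def joined_footprint(tokens, delimiter="_"):
    """Size of a single string key built by joining the tokens."""
    return sys.getsizeof(delimiter.join(tokens))


tokens = ["new", "york"]
tuple_bytes = tuple_footprint(tokens)
joined_bytes = joined_footprint(tokens)
```

Multiplied over millions of candidate phrases, the per-key difference adds up.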