Radim Řehůřek
Pity this didn't get in for Gensim 4.0.0. Now we're stuck supporting both, even if we deprecate `sentences`. My preference would be nr. 1 `corpus` (simple, standard), nr. 2 `corpus_iterable`...
Hmm, how did we all miss this? We had several review cycles, test releases, how come no one spotted this :( In Word2Vec and FastText no less, Gensim's most exposed...
Thanks. @gojomo can you have a look?
Interesting idea. The costly part of `most_similar` is the dot product of two matrices: `(query_vecs, dim)` x `(dim, index_vecs)`. Having a vector of magnitudes for `index_vecs` stored separately (to rescale...
I don't see any slowdown – it's the exact same `dot` operation. The extra rescaling step `cossim(q, raw_index) = dot(q, raw_index) / norm(raw_index)` is cheap in comparison, when `q`...
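A minimal NumPy sketch of the rescaling idea, for illustration only. It is not Gensim's implementation: the names `index_vecs`, `magnitudes`, `cossim_all` and `most_similar` here are hypothetical, and the query is assumed to be a single vector that gets unit-normalized first. The index keeps its raw vectors, and a separate array of norms turns the dot products into cosine similarities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Raw (unnormalized) index vectors: shape (index_vecs, dim). Random data, for illustration.
index_vecs = rng.normal(size=(1000, 50))

# One norm per index vector, computed once and stored next to the raw vectors.
magnitudes = np.linalg.norm(index_vecs, axis=1)


def cossim_all(query):
    """Cosine similarity of `query` against every raw index vector."""
    q = query / np.linalg.norm(query)  # unit-length query
    # Same dot product as with a pre-normalized index, plus one cheap division.
    return index_vecs @ q / magnitudes


def most_similar(query, topn=5):
    """Return (indices, similarities) of the `topn` closest index vectors."""
    sims = cossim_all(query)
    best = np.argsort(-sims)[:topn]
    return best, sims[best]
```

The division by `magnitudes` is a single elementwise pass over the result vector, so its cost is small next to the matrix product, and no second, normalized copy of the index needs to be kept in memory.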
Naming of variables TBD (in #2698 in general, when it's ready for review), but this `magnitudes` array is such a neat and simple idea I cannot believe we didn't think...
CC @ERijck are you able to continue and finish this up? All the points above, plus all the `FIXME` notes I left in the code, must be resolved if we...
Finishing up 1, 3 and 4 will be a great start. I can then assist with 2 (input streaming), to bring flsamodel in line with the rest of Gensim.
Yeah `git` can be frustrating when you're starting out. Probably best to discard any existing mess in your local fork and start fresh: ```bash git checkout develop && git fetch...
Gensim itself has a strong copyleft license too – LGPL. I'm afraid freeloading corporate concerns are not our primary motivator when choosing dependencies. We offer a commercial (paid) dual...