Chris de Vries
TF-IDF. BM25: probably quite useful with reflective random indexing because it preserves the inner product space where BM25 works well. Log likelihood from the TopSig paper.
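As a rough illustration of the BM25 option, here is a minimal sketch of a per-term weight that could be applied to term frequencies before indexing. The names `Bm25Params`, `bm25Idf` and `bm25Weight` are hypothetical and not taken from the code base, the defaults k1 = 1.2 and b = 0.75 are just commonly used values, and the IDF shown is one common non-negative variant rather than the only definition.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical parameter bundle; k1 and b are the usual BM25 free parameters.
struct Bm25Params {
    double k1 = 1.2;   // term frequency saturation
    double b = 0.75;   // strength of document length normalisation
};

// One common IDF variant; the 1 + ... inside the log keeps it non-negative.
inline double bm25Idf(std::size_t numDocs, std::size_t docFreq) {
    const double n = static_cast<double>(numDocs);
    const double df = static_cast<double>(docFreq);
    return std::log(1.0 + (n - df + 0.5) / (df + 0.5));
}

// BM25 weight of one term in one document. Assumes avgDocLen > 0.
inline double bm25Weight(double termFreq, double docLen, double avgDocLen,
                         std::size_t numDocs, std::size_t docFreq,
                         const Bm25Params& p = Bm25Params()) {
    if (termFreq <= 0.0) {
        return 0.0;  // absent terms contribute nothing
    }
    const double norm = p.k1 * (1.0 - p.b + p.b * docLen / avgDocLen);
    return bm25Idf(numDocs, docFreq) * termFreq * (p.k1 + 1.0) / (termFreq + norm);
}
```

For example, `bm25Weight(2, 100, 100, 1000, 10)` gives the weight of a term occurring twice in an average-length document when 10 of 1000 documents contain it. The weight grows with term frequency but saturates, and shrinks for longer documents and more common terms.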
Some lower level unit tests for vector types and other concepts would also be useful.
Switch to folly and jemalloc to replace std::string and std::vector; this might be faster. Introduce string and vector types in the lmwtree namespace. Use likely and unlikely where useful.
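One possible shape for the namespace-level types and the branch hints is sketched below. It assumes the aliases start out pointing at the standard library so that a replacement (for instance folly's fbstring and fbvector) could be swapped in later at a single place. The macro names `LMW_LIKELY` and `LMW_UNLIKELY` and the helper `totalLength` are hypothetical, and the hints rely on the GCC/Clang `__builtin_expect` builtin with a no-op fallback elsewhere.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Branch prediction hints; hypothetical macro names for illustration.
#if defined(__GNUC__) || defined(__clang__)
#define LMW_LIKELY(x) (__builtin_expect(!!(x), 1))
#define LMW_UNLIKELY(x) (__builtin_expect(!!(x), 0))
#else
#define LMW_LIKELY(x) (x)
#define LMW_UNLIKELY(x) (x)
#endif

namespace lmwtree {

// Single point of change: these aliases could later name other
// implementations without touching the rest of the code.
using string = std::string;

template <typename T>
using vector = std::vector<T>;

// Example use of the aliases and a hint: sums the lengths of all strings,
// treating empty strings as the rare case.
inline std::size_t totalLength(const vector<string>& items) {
    std::size_t total = 0;
    for (const string& s : items) {
        if (LMW_UNLIKELY(s.empty())) {
            continue;
        }
        total += s.size();
    }
    return total;
}

}  // namespace lmwtree
```

Code written against `lmwtree::string` and `lmwtree::vector<T>` would then not need to change if the underlying types are replaced, provided the replacements keep a compatible interface. The hints only affect code layout and should be checked with profiling before being applied widely.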
https://code.google.com/p/semanticvectors/wiki/ReflectiveRandomIndexing However, no binary vector version exists yet.
http://arma.sourceforge.net/
Use smart pointers where the pointer overhead is not critical, such as tree nodes. Switch to move semantics for vectors.
There is a TODO in the code for this.
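The smart pointer and move semantics notes could look roughly like the sketch below. `Node`, `centroid`, `addChild` and `countNodes` are hypothetical names chosen for illustration and do not reflect the existing tree classes. Children are owned through `std::unique_ptr`, and vectors are taken by value and moved into place, so passing a temporary or an `std::move`d vector avoids a copy.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical tree node: owns its children via unique_ptr, so destroying
// the root releases the whole subtree without manual delete calls.
struct Node {
    std::vector<float> centroid;
    std::vector<std::unique_ptr<Node>> children;

    // Take the vector by value and move it in: callers passing an rvalue
    // hand over the buffer instead of copying it.
    explicit Node(std::vector<float> c) : centroid(std::move(c)) {}

    // Creates a child that this node owns; the returned pointer is non-owning.
    Node* addChild(std::vector<float> c) {
        children.push_back(std::unique_ptr<Node>(new Node(std::move(c))));
        return children.back().get();
    }
};

// Counts the nodes in the subtree rooted at n, including n itself.
inline std::size_t countNodes(const Node& n) {
    std::size_t total = 1;
    for (const std::unique_ptr<Node>& child : n.children) {
        total += countNodes(*child);
    }
    return total;
}
```

A caller would write something like `Node root(std::move(v));` to transfer an existing vector into the tree. The per-node cost of `unique_ptr` is a single pointer, which is why it suits tree nodes better than tight inner loops over vector data.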