Andrew DalPino
> What if I want to train an existing model and then just query against it in runtime? I suppose BallTree cannot be persisted with existing persisters and I'd need...
Just a heads up, I've added a BM25 transformer to the [Extras](https://github.com/RubixML/Extras) package that you can try out as well. This *should* be an improvement over TF-IDF for document retrieval....
So @kroky with the new [BM25 Transformer](https://github.com/RubixML/Extras/blob/master/docs/transformers/bm25-transformer.md) we can replicate 2 of Lucene's search strategies. The first is their BM25 method which we replicate by using the new BM25 Transformer...
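For readers following along, here is a rough sketch of what BM25 weighting does to term counts. It is written in Python for brevity and uses the textbook Okapi BM25 formula. The function name `bm25_weights` and the defaults `k1=1.2` and `b=0.75` are assumptions for illustration, and the BM25 Transformer in Extras may differ in its details.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: weight} dict per tokenized document.

    Textbook Okapi BM25; k1 and b are assumed defaults, not values
    taken from the Extras package.
    """
    if not docs:
        return []

    n = len(docs)
    # Average document length; fall back to 1.0 if every document is empty.
    avg_len = (sum(len(d) for d in docs) / n) or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))

    weighted = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are damped.
        norm = k1 * (1.0 - b + b * len(d) / avg_len)
        weighted.append({
            term: math.log(1.0 + (n - df[term] + 0.5) / (df[term] + 0.5))
            * freq * (k1 + 1.0) / (freq + norm)
            for term, freq in tf.items()
        })
    return weighted


corpus = [["red", "fox"], ["red", "dog", "dog"]]
for weights in bm25_weights(corpus):
    print(weights)
```

Unlike plain TF-IDF, the term frequency saturates (controlled by `k1`) and is normalized by document length (controlled by `b`), so repeating a term many times gives diminishing returns.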
Also @kroky just a heads up so you don't spend hours scratching your head like I did ... things get a little weird with Cosine distance and zero vectors (norm...
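To illustrate why zero vectors are awkward here: cosine similarity divides by the product of the two norms, so a zero vector leads to a division by zero. Below is a minimal Python sketch with an explicit guard. The convention it picks (two zero vectors are at distance 0, a zero vector versus a non-zero vector is at distance 1) is an assumption made for this example and is not necessarily what the library's Cosine kernel does.

```python
import math


def cosine_distance(a, b):
    """Cosine distance with an explicit guard for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # Assumed convention: two zero vectors are treated as identical.
    if norm_a == 0.0 and norm_b == 0.0:
        return 0.0
    # Assumed convention: a zero vector is treated as orthogonal
    # to any non-zero vector.
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0

    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
print(cosine_distance([0.0, 0.0], [1.0, 2.0]))
```

Without the two guards, the last line of the function would raise `ZeroDivisionError` for any document whose feature vector is all zeros, for example a query made up entirely of out-of-vocabulary words.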
> I also found it is faster than TFIDF, not sure why - considerably faster for small corpus size and slightly faster for bigger ones. It could be that the...
Hey @kroky there was an issue with the benchmark but it's fixed now. I guess it wasn't calling the setUp() method to instantiate the kernel. Green is your original optimization,...
Also, you may find this useful. I experimented with adding dimensionality reduction to the features. Went from 10,000 to 500 features with hardly any loss in "relevancy". It does take...
Ok @kroky, the fix will be out in 0.1.5 then (I went with the blue one) ... I really liked your implementation though (clever and elegant), I'm bummed it didn't...
A couple more things @kroky - just added to the [Extras](https://github.com/RubixML/Extras) repo is the new [Token Hashing Vectorizer](https://github.com/RubixML/Extras/blob/master/docs/transformers/token-hashing-vectorizer.md) that works well for low memory footprint applications. It doesn't build a...
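For anyone unfamiliar with the idea, here is a small Python sketch of the general hashing trick behind vectorizers of this kind. It is not the Extras implementation: the function name `hash_vectorize`, the choice of CRC32 as the hash, and the default of 16 dimensions are all assumptions made for illustration.

```python
import zlib


def hash_vectorize(tokens, dimensions=16):
    """Map tokens to a fixed-length count vector via hashing.

    Each token's column is derived from its hash, so memory use depends
    only on `dimensions`, not on how many distinct tokens are seen.
    CRC32 is an assumed hash choice for this sketch.
    """
    vector = [0] * dimensions
    for token in tokens:
        index = zlib.crc32(token.encode("utf-8")) % dimensions
        vector[index] += 1
    return vector


print(hash_vectorize(["the", "quick", "brown", "the"]))
```

The trade-off is that distinct tokens can collide in the same column, so a larger `dimensions` value lowers the collision rate at the cost of wider feature vectors.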
Just saw this now @kroky, thanks for the PR. We also found similar bugs in a few other trees, thanks to your excellent debugging skills.