Andrew DalPino

Results: 132 comments by Andrew DalPino
> What if I want to train an existing model and then just query against it in runtime? I suppose BallTree cannot be persisted with existing persisters and I'd need...

Just a heads up: I've added a BM25 transformer to the [Extras](https://github.com/RubixML/Extras) package that you can try out as well. This *should* be an improvement over TF-IDF for document retrieval....
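
For readers unfamiliar with BM25, here is a minimal Python sketch of the weighting, not the Extras package's actual code. The function name `bm25_weights`, the defaults `k1=1.2` and `b=0.75`, and the Lucene-style IDF term are assumptions made for illustration. It shows the two ways BM25 differs from plain TF-IDF: term frequency saturates instead of growing linearly, and longer documents are penalized relative to the average length.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper: `docs` is a non-empty list of non-empty token lists.
    """
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))

    weights = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents get a larger
        # denominator, which lowers their term weights.
        norm = k1 * (1.0 - b + b * len(d) / avg_len)
        weights.append({
            term: math.log(1.0 + (n - df[term] + 0.5) / (df[term] + 0.5))
            * freq * (k1 + 1.0) / (freq + norm)
            for term, freq in tf.items()
        })
    return weights


docs = [["a", "b"], ["a", "c", "c"]]
for row in bm25_weights(docs):
    print(row)
```

In this toy corpus the term `b` appears in only one document, so it receives a higher weight than `a`, which appears in both.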

So @kroky, with the new [BM25 Transformer](https://github.com/RubixML/Extras/blob/master/docs/transformers/bm25-transformer.md) we can replicate two of Lucene's search strategies. The first is their BM25 method, which we replicate by using the new BM25 Transformer...

Also @kroky just a heads up so you don't spend hours scratching your head like I did ... things get a little weird with Cosine distance and zero vectors (norm...
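
To illustrate why zero vectors are awkward here: cosine distance divides by the product of the two vector norms, so a vector with norm 0 leaves the result undefined. The Python sketch below is one possible way to guard against that. The convention it picks (distance 0 between two zero vectors, distance 1 between a zero and a non-zero vector) is an assumption for illustration and not necessarily what the library does.

```python
import math


def cosine_distance(a, b):
    """Cosine distance with an explicit (assumed) convention for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    if norm_a == 0.0 and norm_b == 0.0:
        # Assumed convention: two zero vectors are treated as identical.
        return 0.0
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumed convention: a zero vector is maximally dissimilar to any
        # non-zero vector, which avoids the division by zero below.
        return 1.0

    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([0, 0], [1, 2]))
print(cosine_distance([1, 0], [0, 1]))
```

Without the two guard clauses, the final line would raise `ZeroDivisionError` for any all-zero input, which can happen when a document shares no tokens with the vocabulary.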

> I also found it is faster than TFIDF, not sure why - considerably faster for small corpus size and slightly faster for bigger ones. It could be that the...

Hey @kroky, there was an issue with the benchmark but it's fixed now. I guess it wasn't calling the `setUp()` method to instantiate the kernel. Green is your original optimization,...

Also, you may find this useful. I experimented with adding dimensionality reduction to the features. Went from 10,000 to 500 features with hardly any loss in "relevancy". It does take...

Ok @kroky, the fix will be out in 0.1.5 then (I went with the blue one) ... I really liked your implementation though (clever and elegant), I'm bummed it didn't...

A couple more things @kroky: just added to the [Extras](https://github.com/RubixML/Extras) repo is the new [Token Hashing Vectorizer](https://github.com/RubixML/Extras/blob/master/docs/transformers/token-hashing-vectorizer.md), which works well for applications that need a low memory footprint. It doesn't build a...
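
As a rough sketch of the general idea behind token hashing (often called the hashing trick), the Python below maps each token straight to a column index with a hash function, so memory use is fixed by the chosen number of dimensions. The function name `hash_vectorize`, the use of CRC32, and the default of 16 dimensions are assumptions for illustration and may differ from the linked transformer.

```python
import zlib


def hash_vectorize(tokens, dimensions=16):
    """Count tokens into a fixed-length vector using a hash of each token."""
    vector = [0] * dimensions
    for token in tokens:
        # CRC32 is deterministic across runs, unlike Python's built-in hash()
        # for strings, so the same token always lands in the same column.
        index = zlib.crc32(token.encode("utf-8")) % dimensions
        vector[index] += 1
    return vector


print(hash_vectorize(["the", "quick", "brown", "fox", "the"]))
```

The trade-off is that distinct tokens can collide in the same column, and collisions become more frequent as the number of dimensions shrinks.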

Just saw this now @kroky, thanks for the PR. We also found similar bugs in a few other trees thanks to your excellent debugging skills.