Andrew DalPino comments

Results: 132 comments

How should we do something like More Like This (MLT) in Rubix ML?

> What if I want to train an existing model and then just query against it in runtime? I suppose BallTree cannot be persisted with existing persisters and I'd need...

Just a heads up I've added a BM25 transformer to the [Extras](https://github.com/RubixML/Extras) package that you can try out as well. This *should* be an improvement over TF-IDF for document retrieval....
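For anyone comparing the two weightings, here is a minimal Python sketch of classic Okapi BM25 term weighting. It is not the Extras transformer's code. The function name `bm25_weights` and the defaults `k1 = 1.2`, `b = 0.75` (common textbook values) are assumptions. The IDF uses the smoothed `log(1 + (N - df + 0.5) / (df + 0.5))` form, which stays non-negative.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: weight} dict per tokenized document (Okapi BM25 weighting).

    Hypothetical helper; k1 and b defaults are assumed textbook values.
    """
    n = len(docs)
    lengths = [len(d) for d in docs]
    avg_len = sum(lengths) / n

    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for d in docs:
        df.update(set(d))

    # Smoothed IDF that never goes negative, even for very common terms.
    idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    weights = []
    for d, length in zip(docs, lengths):
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized via b.
        norm = k1 * (1 - b + b * length / avg_len)
        # Term frequency saturates: the weight approaches idf * (k1 + 1) as tf grows.
        weights.append({t: idf[t] * c * (k1 + 1) / (c + norm) for t, c in tf.items()})
    return weights
```

Unlike plain TF-IDF, where the weight grows linearly with term count, `k1` caps how much a repeated term can contribute and `b` controls how strongly long documents are normalized.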

So @kroky with the new [BM25 Transformer](https://github.com/RubixML/Extras/blob/master/docs/transformers/bm25-transformer.md) we can replicate 2 of Lucene's search strategies. The first is their BM25 method which we replicate by using the new BM25 Transformer...
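Roughly speaking, BM25 scoring of a plain term query comes down to summing, for each document, the BM25 weights of the query terms it contains. Here is a minimal sketch of that ranking step over precomputed per-document weights (one `{term: weight}` dict per document). `bm25_rank` is a hypothetical helper, and each distinct query term is assumed to count once. The second Lucene strategy is not covered here, because the comment is cut off before naming it.

```python
def bm25_rank(doc_weights, query_terms, k=3):
    """Rank documents by the summed BM25 weights of the query's terms.

    doc_weights: list of {term: weight} dicts, one per document (assumed precomputed).
    Returns up to k (doc_index, score) pairs, best first; non-matching docs are dropped.
    """
    terms = set(query_terms)  # assumption: duplicate query terms are not boosted
    scores = [(sum(w.get(t, 0.0) for t in terms), i) for i, w in enumerate(doc_weights)]
    # Highest score first; ties broken by document index so results are deterministic.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [(i, score) for score, i in scores[:k] if score > 0]
```

For More Like This, the "query" can simply be the top-weighted terms of the source document.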

Also @kroky just a heads up so you don't spend hours scratching your head like I did ... things get a little weird with Cosine distance and zero vectors (norm...
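A zero vector shows up whenever every token in a document gets filtered out, for example by stop-word removal or a fixed vocabulary. Its norm is then zero, so cosine similarity becomes 0/0. Here is a minimal sketch of cosine distance with an explicit convention for that case. The specific return values (0 when both vectors are zero, 1 when only one is) are assumptions, not necessarily what Rubix ML's Cosine kernel does.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, with an explicit (assumed) convention for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 and norm_b == 0.0:
        return 0.0  # assumption: two empty documents count as identical
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: an empty document is treated as orthogonal to everything
    return 1.0 - dot / (norm_a * norm_b)
```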

> I also found it is faster than TFIDF, not sure why - considerably faster for small corpus size and slightly faster for bigger ones. It could be that the...

Hey @kroky there was an issue with the benchmark but it's fixed now. I guess it wasn't calling the setUp() method to instantiate the kernel. Green is your original optimization,...

Also, you may find this useful. I experimented with adding dimensionality reduction to the features. Went from 10,000 to 500 features with hardly any loss in "relevancy". It does take...
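The comment does not say which reduction technique was used. As one illustration, here is a minimal sketch of Gaussian random projection, which approximately preserves distances (Johnson-Lindenstrauss) when mapping, say, 10,000 features down to 500. The technique choice and the helper names `gaussian_projection_matrix` / `project` are assumptions.

```python
import math
import random


def gaussian_projection_matrix(n_in, n_out, seed=0):
    """n_in x n_out matrix with N(0, 1/n_out) entries (Gaussian random projection)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_out)] for _ in range(n_in)]


def project(vector, matrix):
    """Map a length-n_in vector to length n_out; zero features are skipped (sparse-friendly)."""
    n_out = len(matrix[0])
    out = [0.0] * n_out
    for i, x in enumerate(vector):
        if x != 0.0:
            row = matrix[i]
            for j in range(n_out):
                out[j] += x * row[j]
    return out
```

The matrix is generated once from a fixed seed. Queries must go through the same matrix as the corpus, otherwise their reduced vectors are not comparable.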

Ok @kroky, the fix will be out in 0.1.5 then (I went with the blue one) ... I really liked your implementation though (clever and elegant); I'm bummed it didn't...

A couple more things @kroky - just added to the [Extras](https://github.com/RubixML/Extras) repo is the new [Token Hashing Vectorizer](https://github.com/RubixML/Extras/blob/master/docs/transformers/token-hashing-vectorizer.md) that works well for low memory footprint applications. It doesn't build a...
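The general idea behind a hashing vectorizer is the "hashing trick." Each token is hashed straight to a column index in a fixed-length vector, so memory stays constant no matter how large the corpus vocabulary grows, at the cost of occasional collisions. Here is a minimal sketch. The hash function (CRC32), the `dimensions` default and the name `hash_tokens` are assumptions and may differ from the Extras Token Hashing Vectorizer.

```python
import zlib


def hash_tokens(tokens, dimensions=2**10):
    """Count tokens into a fixed-length vector via the hashing trick (hypothetical helper)."""
    vector = [0] * dimensions
    for token in tokens:
        # crc32 is deterministic across runs, unlike Python's salted built-in hash().
        index = zlib.crc32(token.encode("utf-8")) % dimensions
        vector[index] += 1
    return vector
```

Larger `dimensions` values mean fewer collisions but bigger vectors. A deterministic hash matters because the same token must land in the same column at fit time and at query time.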

Just saw this now @kroky, thanks for the PR. We also found similar bugs in a few other trees thanks to your excellent debugging skills.