Andrew DalPino comments

Results: 132 comments by Andrew DalPino

Implement Eigenvalues and EigenVectors

Hi @markrogoyski @Beakerboy Are there plans for implementing QR still? It would be SUPER helpful to me and I'm sure others in the case where I need eigenvalues and eigenvectors...

Simple spam filter (Naive Bayes)

Hey @GeorgeGardiner great question ... summarizing our convo from the Telegram Channel (https://t.me/RubixML) ... You can use the transformer pipeline from the [Sentiment example](https://github.com/RubixML/Sentiment) with [Gaussian Naive Bayes](https://docs.rubixml.com/en/latest/classifiers/gaussian-naive-bayes.html) under the...

What method is used to compute gini split index on the Classification and regression tree?

Hi @nizariyah, thanks for the question. [Classification Tree](https://docs.rubixml.com/en/latest/classifiers/classification-tree.html) minimizes Gini impurity and [Extra Tree Classifier](https://docs.rubixml.com/en/latest/classifiers/extra-tree-classifier.html) minimizes entropy at the leaf nodes of the tree. For regression, both [Regression Tree](https://docs.rubixml.com/en/latest/regressors/regression-tree.html) and...
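For readers unfamiliar with the two impurity measures named above, here is a minimal sketch of how they are usually defined. It is plain Python written for illustration only; the function names are hypothetical and this is not the Rubix ML implementation, whose exact split-search details are not shown in this comment.

```python
import math
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def split_impurity(left, right, measure=gini_impurity):
    """Impurity of a candidate split: child impurities weighted by child size."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    return (len(left) / n) * measure(left) + (len(right) / n) * measure(right)
```

A tree learner would typically evaluate `split_impurity` for each candidate split and keep the one with the lowest value; a split that separates the classes perfectly scores 0.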

What method is used to compute gini split index on the Classification and regression tree?

Hi @nizariyah The CART implementation has been tuned and optimized in the latest commit https://github.com/RubixML/RubixML/commit/89f6991794c9ee5e7a2f358c1cc52167450684a8 Instead of using a 0.25 ratio of quantiles to samples at the split node, we...

What method is used to compute gini split index on the Classification and regression tree?

Performance comparison of PHP library Random Forest implementations on the Dota 2 dataset: 10 trees, 0.5 subsample ratio, categorical features. ![random-forest-speed](https://user-images.githubusercontent.com/18690561/77395392-77631100-6d6f-11ea-8de3-fb105f602bac.png) ![random-forest-accuracy](https://user-images.githubusercontent.com/18690561/77396573-02450b00-6d72-11ea-81b6-8c7d3b81ac42.png)

How should we do something like More Like This (MLT) in Rubix ML?

Great question @marclaporte! I am not an expert, but I read the document you linked and it looks like they're performing some type of full-text search using TF-IDF (Term Frequency Inverse...
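As a rough illustration of the TF-IDF weighting mentioned above, the sketch below computes per-document token weights in plain Python. The whitespace tokenizer, the unsmoothed `log(N / df)` formula, and the cosine-similarity ranking step are all assumptions made for the example; the comment does not say which variant the linked document or Rubix ML uses.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {token: weight} dict per document (assumed variant: tf * log(N / df))."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each token appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            token: (count / total) * math.log(n_docs / doc_freq[token])
            for token, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {token: weight} vectors."""
    dot = sum(weight * b.get(token, 0.0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Tokens that occur in every document get a weight of zero, so only the distinguishing terms contribute when documents are compared, which is the property a "more like this" query relies on.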

How should we do something like More Like This (MLT) in Rubix ML?

Hey @alaindesilets. Yes, you'd increase *k* with the number of similar documents you wanted retrieved, and instead of the `predict()` API you'd use `proba()` to return an array of probabilities....

How should we do something like More Like This (MLT) in Rubix ML?

We added maximum document frequency to [Word Count Vectorizer](https://docs.rubixml.com/en/latest/transformers/word-count-vectorizer.html) in the last commit (https://github.com/RubixML/RubixML/commit/52033972b1193671b6e4c1a4d005918c5a825e27) so now you can narrow the vocabulary by removing frequent tokens similarly to Elasticsearch. **Example** ```php...

How should we do something like More Like This (MLT) in Rubix ML?

Great, @marclaporte. I already began doing some experiments on my end. Victor and I will take it from there when ready.

How should we do something like More Like This (MLT) in Rubix ML?

Here is what I came up with - it uses the positive reviews from the [Sentiment](https://github.com/RubixML/Sentiment) dataset. You should be able to drop this code right into the project (ex...