benchm-ml

A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algor...

Results 12 benchm-ml issues

It is a little confusing that the evaluation on classification tasks uses the predicted probabilities directly when calculating the AUC. For example, in [6-xgboost.R#L39](https://github.com/szilard/benchm-ml/blob/4131bc45eed147c69e3ece16aacefaf5bd157af3/3-boosting/6-xgboost.R#L39), will it be...
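As background for the question above (the snippet is truncated, so this does not attempt to reproduce the issue's full argument): AUC is a ranking metric, so it is usually computed on continuous scores such as predicted probabilities rather than on thresholded class labels. The following is a minimal Python sketch under that assumption. The `auc` helper and the toy `labels` and `probs` values are hypothetical and are not taken from the benchmark code.

```python
import math


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This is the pairwise (Mann-Whitney) formulation; it is O(n_pos * n_neg)
    and meant for illustration, not for large data sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and predicted probabilities.
labels = [0, 0, 1, 1, 0, 1]
probs = [0.10, 0.40, 0.35, 0.80, 0.55, 0.90]

# AUC computed directly on the probabilities.
auc_from_probs = auc(labels, probs)

# A strictly increasing transform (here the logit) keeps the ranking,
# so the AUC is unchanged.
auc_from_logits = auc(labels, [math.log(p / (1 - p)) for p in probs])

# Thresholding at 0.5 collapses the scores to two values and discards
# most of the ranking information, which changes the AUC.
auc_from_hard = auc(labels, [1 if p >= 0.5 else 0 for p in probs])

print(auc_from_probs, auc_from_logits, auc_from_hard)
```

Because only the ordering of the scores matters, any strictly increasing rescaling of the probabilities gives the same AUC, while hard class labels generally give a different one.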

http://www.libfm.org/ Factorization machines (FM) are a generic approach that can mimic most factorization models through feature engineering. http://www.csie.ntu.edu.tw/~cjlin/libffm/ LIBFFM is an open source tool for field-aware factorization machines (FFM)....

This is to collaborate on some issues with Spark RF also addressed by @jkbradley in comments to this post http://datascience.la/benchmarking-random-forest-implementations/ (see comments by Joseph Bradley). cc: @mengxr Please see “Absolute...

Did you consider using more datasets? And how about regression problems? There is for example this benchmarking suite, accessible via the OpenML packages: [https://arxiv.org/abs/1708.03731](https://arxiv.org/abs/1708.03731)

As we've discussed in Slack, H2O has recently released some very interesting AutoML functionality. In this case, the leader is the StackedEnsemble generated from a GBM grid, a DL grid,...

Great initiative, thanks for making this public! You might be interested in extending your benchmark to auto-sklearn. https://github.com/automl/auto-sklearn I have created a script that can take in a sparse...

I know from @glouppe that "RFs in sklearn now support sparse matrices too" https://twitter.com/glouppe/status/660012865554903040 It would be interesting to see the results with sparse matrices for RF and for logistic regression...

Hey Szilard, I'd like to replicate your code from beginning to end, perhaps on Google Compute Engine (GCE), mainly to test out GCE with Vagrant. Do you have a...

Thanks for the great work! We have an open source machine learning library called SMILE (https://github.com/haifengl/smile). We have incorporated your benchmark (https://github.com/haifengl/smile/blob/master/benchmark/src/main/scala/smile/benchmark/Airline.scala). We found that our system is much faster for...

Motivation: I can't run mxnet on the 10M records airline set https://github.com/szilard/benchm-ml/issues/29 because `model.matrix` crashes out of RAM (on g2.8xlarge with 60GB of RAM - largest available for GPU instances)....