Randy Olson comments

Results: 92 comments of Randy Olson

Large scale benchmark

[This repo](https://github.com/rhiever/sklearn-benchmarks) might be a useful resource to pull code from. We've been running sklearn benchmarks over there and published the results on sklearn classifiers in [this paper](https://arxiv.org/abs/1708.05070). You can...

Large scale benchmark

None of the datasets in PMLB have had a one-hot encoding applied. All datasets have had a LabelEncoder applied to columns with non-numeric values.
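As a rough illustration of what that preprocessing looks like, here is a minimal sketch that applies scikit-learn's `LabelEncoder` to every non-numeric column of a DataFrame and leaves numeric columns alone. The helper name `label_encode_non_numeric` and the toy data are made up for this example; this is not necessarily how the PMLB datasets were actually prepared.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def label_encode_non_numeric(df):
    """Return a copy of df with each non-numeric column replaced by integer codes.

    Hypothetical helper for illustration only, not PMLB's own code.
    """
    encoded = df.copy()
    for column in encoded.columns:
        if not pd.api.types.is_numeric_dtype(encoded[column]):
            # LabelEncoder assigns codes in sorted order of the distinct values.
            encoded[column] = LabelEncoder().fit_transform(encoded[column].astype(str))
    return encoded


# Toy data (assumed, not taken from PMLB).
toy = pd.DataFrame({
    "color": ["red", "green", "red", "blue"],
    "size": [1.0, 2.5, 3.0, 0.5],
})
encoded_toy = label_encode_non_numeric(toy)
print(encoded_toy)
```

Note that the integer codes impose an arbitrary ordering on the categories, which is why a benchmark may still want to compare other encoders on top of this.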

Large scale benchmark

One concern with the benchmark is that no parameter tuning is performed. One finding from our [recent sklearn benchmarking paper](https://arxiv.org/abs/1708.05070) is that the sklearn defaults are almost always bad, and...

Large scale benchmark

The parameters recommended in Table 4 are a fine starting point, but as we suggest in the paper, algorithm parameter tuning (even a small grid search) should always be performed...
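For reference, a small grid search of the kind mentioned above could look roughly like the sketch below, using scikit-learn's `GridSearchCV`. The dataset, the classifier, and the grid values are placeholders picked for this example; they are not the parameters from Table 4 of the paper.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder dataset; a benchmark would loop over its own datasets instead.
X, y = load_iris(return_X_y=True)

# Deliberately tiny grid (illustrative values, not recommendations).
param_grid = {
    "n_estimators": [10, 50],
    "max_features": ["sqrt", None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,                # 3-fold stratified cross-validation
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, best_score)
```

Even a 2x2 grid like this means four times as many fits per dataset/encoder combination, so the grid size is a trade-off against total benchmark runtime.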

Large scale benchmark

Here are box plots of the results grouped just by encoder. Across the board, BinaryEncoder & OneHotEncoder seem to be the top-performing encoders, although there may not be statistically significant...

Large scale benchmark

Here are the results grouped by encoder + classifier.

```python
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sb
import pandas as pd

results_df = pd.read_csv('https://raw.githubusercontent.com/janmotl/categorical-encoding/binary/examples/benchmarking_large/output/result_2018-08-09.csv')

plt.figure(figsize=(12, 12))
for index,...
```

Large scale benchmark

And here's grouping the other way around.

```python
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sb
import pandas as pd

results_df = pd.read_csv('https://raw.githubusercontent.com/janmotl/categorical-encoding/binary/examples/benchmarking_large/output/result_2018-08-09.csv')

plt.figure(figsize=(12, 12))
for index, clf...
```

ValueError instead of TypeError in Python 2.7
