Randy Olson

92 comments by Randy Olson

[This repo](https://github.com/rhiever/sklearn-benchmarks) might be a useful resource to pull code from. We've been running sklearn benchmarks over there and published the results on sklearn classifiers in [this paper](https://arxiv.org/abs/1708.05070). You can...

None of the datasets in PMLB have had a one-hot encoding applied. All datasets have had a LabelEncoder applied to columns with non-numeric values.
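
To make the distinction concrete, here is a minimal sketch of label encoding versus one-hot encoding. The toy DataFrame and its column names are made up for illustration, and this is not PMLB's actual preprocessing code; it only shows the kind of transformation described above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; not a PMLB dataset.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": [1.0, 2.5, 3.0, 2.5],
})

# Label encoding: each non-numeric column becomes a single integer column.
label_encoded = df.copy()
for col in label_encoded.select_dtypes(exclude="number").columns:
    label_encoded[col] = LabelEncoder().fit_transform(label_encoded[col])

# One-hot encoding (NOT applied in PMLB): one indicator column per category.
one_hot = pd.get_dummies(df, columns=["color"])

print(label_encoded)
print(one_hot)
```

Label encoding keeps the column count unchanged but imposes an arbitrary integer order on the categories, so whether to one-hot encode afterwards is left to whoever uses the data.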

One concern with the benchmark is that no parameter tuning is performed. One finding from our [recent sklearn benchmarking paper](https://arxiv.org/abs/1708.05070) is that the sklearn defaults are almost always bad, and...

The parameters recommended in Table 4 are a fine starting point, but as we suggest in the paper, algorithm parameter tuning (even a small grid search) should always be performed...
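
As a rough sketch of what "even a small grid search" could look like with scikit-learn's `GridSearchCV`: the dataset, the classifier, and the grid values below are assumptions chosen for illustration. They are not the parameters from Table 4.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Hypothetical grid for illustration only; not the values recommended in the paper.
param_grid = {
    "n_estimators": [10, 50],
    "max_features": ["sqrt", None],
}

# 5-fold cross-validated search over all four parameter combinations.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Even a grid this small compares four configurations instead of relying on a single default, which is the point of the recommendation.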

Here are box plots of the results grouped just by encoder. Across the board, BinaryEncoder & OneHotEncoder seem to be the top-performing encoders, although there may not be statistically significant...

Here are the results grouped by encoder + classifier.

```python
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sb
import pandas as pd

# Load the benchmark results from the categorical-encoding repository
results_df = pd.read_csv('https://raw.githubusercontent.com/janmotl/categorical-encoding/binary/examples/benchmarking_large/output/result_2018-08-09.csv')

plt.figure(figsize=(12, 12))
for index,...
```

And here is the grouping the other way around.

```python
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sb
import pandas as pd

# Load the benchmark results from the categorical-encoding repository
results_df = pd.read_csv('https://raw.githubusercontent.com/janmotl/categorical-encoding/binary/examples/benchmarking_large/output/result_2018-08-09.csv')

plt.figure(figsize=(12, 12))
for index, clf...
```

Sounds like this is a bug. Would you be willing to write a patch for it?

Please do! Probably the best starting point is to write a minimal example that reproduces the error; that example will then stand as our first unit test for this patch.