Unexpected result from flaml.default.LGBMClassifier on iris
I'm trying to benchmark the zero-shot flaml.default.LGBMClassifier and I've seen some unexpected results. I'm working with FLAML 2.1.1.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from flaml.default import LGBMClassifier

iris = load_iris()
# 50/50 split of the 150 iris samples
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, train_size=0.5)
lgbm = LGBMClassifier().fit(X_train, y_train)
print(lgbm.score(X_test, y_test))
This produces a test accuracy of about 0.3, which is chance level for the three balanced classes. With the standard 75/25 split I get an accuracy of about 0.92, which is in the expected range. A random forest with scikit-learn defaults reaches about 0.92 for both the 50/50 split above and the 75/25 split. My guess is that a hyperparameter configuration is chosen that doesn't allow growing a tree at all.
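For completeness, here is the whole comparison as one script, extended to print the hyperparameters the zero-shot wrapper actually picked. Treat it as a sketch: it assumes the flamlized LGBMClassifier keeps the sklearn estimator API of the lightgbm class it wraps (in particular get_params()), and random_state is only set so the runs are repeatable.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from flaml.default import LGBMClassifier

iris = load_iris()
for train_size in (0.5, 0.75):
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, train_size=train_size, random_state=0
    )
    lgbm = LGBMClassifier().fit(X_train, y_train)
    rf = RandomForestClassifier().fit(X_train, y_train)
    print(train_size, "lgbm:", lgbm.score(X_test, y_test),
          "rf:", rf.score(X_test, y_test))
    # Inspect the configuration chosen by the zero-shot matching; a
    # degenerate one (e.g. too few estimators or leaves to grow a tree)
    # would explain the chance-level score on the 50/50 split.
    print(lgbm.get_params())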
From what I observed, this small dataset isn't close to any of the existing datapoints, so the nearest-neighbor matching fails to apply a good hyperparameter combination to LGBM. I'll add this dataset as a datapoint to the lgbm default configs, so that the KNN matching can map it to a good hyperparameter combination. I'll raise a PR soon.
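If you want to check which configuration gets matched (and to verify the fix later), the docs suggest flaml.default.suggest_hyperparams exposes it directly. Take the sketch below with a grain of salt: the exact signature and the dataframe input are my reading of the zero-shot docs, not something I've re-verified against 2.1.1.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from flaml.default import suggest_hyperparams

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, _, y_train, _ = train_test_split(X, y, train_size=0.5, random_state=0)
# Returns the hyperparameter dict matched to this dataset's meta-features
# and the underlying estimator class (lightgbm's LGBMClassifier here).
hyperparams, estimator_class = suggest_hyperparams(
    "classification", X_train, y_train, "lgbm"
)
print(estimator_class)
print(hyperparams)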
Thank you! I have a benchmark containing a lot of small datasets, and I'd love to run it again once you've added the datapoint.