Unexpected result from flaml.default.LGBMClassifier on iris
I'm trying to benchmark the zero-shot flaml.default.LGBMClassifier and I've seen some unexpected results. I'm working with FLAML 2.1.1.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from flaml.default import LGBMClassifier

iris = load_iris()
# 50/50 split of the 150 iris samples
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, train_size=0.5)
lgbm = LGBMClassifier().fit(X_train, y_train)
print(lgbm.score(X_test, y_test))
This produces a test accuracy of about 0.3, which is chance level for the three balanced classes. With the standard 75/25 split I get an accuracy of about 0.92, which is in the expected range. A random forest with scikit-learn defaults reaches about 0.92 for both the 50/50 split above and the 75/25 split. My guess is that a hyperparameter configuration is chosen that doesn't allow growing a tree at all.
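For completeness, here is the whole comparison as one script, extended to print the hyperparameters the zero-shot wrapper actually picked. Treat it as a sketch: it assumes the flamlized LGBMClassifier keeps the sklearn estimator API of the lightgbm class it wraps (in particular get_params()), and random_state is only set so the runs are repeatable.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from flaml.default import LGBMClassifier

iris = load_iris()
for train_size in (0.5, 0.75):
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, train_size=train_size, random_state=0
    )
    lgbm = LGBMClassifier().fit(X_train, y_train)
    rf = RandomForestClassifier().fit(X_train, y_train)
    print(train_size, "lgbm:", lgbm.score(X_test, y_test),
          "rf:", rf.score(X_test, y_test))
    # Inspect the configuration chosen by the zero-shot matching; a
    # degenerate one (e.g. too few estimators or leaves to grow a tree)
    # would explain the chance-level score on the 50/50 split.
    print(lgbm.get_params())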
From what I observed, this small dataset isn't close to any of the existing datapoints, so the nearest-neighbor matching fails to apply a good hyperparameter combination to LGBM. I'll add this dataset as a datapoint to the lgbm default configs, so that the KNN matching can map it to a good hyperparameter combination. I'll raise a PR soon.
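If you want to check which configuration gets matched (and to verify the fix later), the docs suggest flaml.default.suggest_hyperparams exposes it directly. Take the sketch below with a grain of salt: the exact signature and the dataframe input are my reading of the zero-shot docs, not something I've re-verified against 2.1.1.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from flaml.default import suggest_hyperparams

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, _, y_train, _ = train_test_split(X, y, train_size=0.5, random_state=0)
# Returns the hyperparameter dict matched to this dataset's meta-features
# and the underlying estimator class (lightgbm's LGBMClassifier here).
hyperparams, estimator_class = suggest_hyperparams(
    "classification", X_train, y_train, "lgbm"
)
print(estimator_class)
print(hyperparams)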
Thank you! I have a benchmark containing a lot of small datasets, and I'd love to run it again once you've added the datapoint.