TPOT Config, Terminals are required to have a unique name.
Hi,
I'm trying to use TPOT, specifying the classifiers and pre-processing techniques to evaluate in the config dictionary.
However, when I run fit(), I get the following error:
AssertionError: Terminals are required to have a unique name. Consider using the argument 'name' to rename your second PCA__svd_solver=l terminal.
This is my first time with TPOT... where am I doing something wrong?
Context of the issue
This is the code I'm using:
import itertools

import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold
from tpot import TPOTClassifier

MANUAL_features_indices = ";".join([str(i) for i in range(477)])
VGGISH_features_indices = ";".join([str(i) for i in range(477, 477 + 256)])
L3_features_indices = ";".join([str(i) for i in range(477 + 256, 477 + 256 + 1024)])
FEATURES_LIST = [MANUAL_features_indices, VGGISH_features_indices, L3_features_indices]

FEATURES_SUBSETS = []
for L in range(0, len(FEATURES_LIST) + 1):
    for subset in itertools.combinations(FEATURES_LIST, L):
        if len(subset) != 0:
            FEATURES_SUBSETS.append(list(subset))
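For reference, the subset-building loop above is the standard "powerset minus the empty set" idiom with itertools.combinations. A minimal sketch with toy stand-in names (not the real index strings):

```python
import itertools

# Toy stand-ins for the three feature-index strings
features = ["manual", "vggish", "l3"]

# Build every non-empty combination of the feature sets
subsets = []
for L in range(0, len(features) + 1):
    for subset in itertools.combinations(features, L):
        if len(subset) != 0:
            subsets.append(list(subset))

# 2**3 - 1 = 7 non-empty subsets
print(len(subsets))
```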
tpot_config = {
    "tpot.builtins.FeatureSetSelector": {
        "subset_list": FEATURES_LIST,
        "sel_subset": FEATURES_SUBSETS
    },
    "sklearn.preprocessing.StandardScaler": {},
    "sklearn.preprocessing.Normalizer": {
        "norm": ["l1", "l2", "max"]
    },
    "sklearn.decomposition.PCA": {
        "n_components": [.7, .8, .9, .95, .99],
        "svd_solver": "full"
    },
    "sklearn.svm.SVC": {
        "C": [.01, .1, 1, 10, 100],
        "gamma": [100, 10, 1, .1, .01, .001, "scale", "auto"],
        "kernel": ["rbf", "poly", "sigmoid"],
        "degree": [2, 3, 4, 5, 6],
        "probability": [True],
        "class_weight": ["balanced", None],
    },
    "sklearn.ensemble.RandomForestClassifier": {
        "n_estimators": [10, 20, 50, 100, 200, 500, 1000],
        "min_samples_split": [2, 6, 8, 10, 12, 20],
        "max_depth": [10, 20, 30, 50, 100, 150, 200, None],
        "criterion": ["entropy", "gini"],
        "max_features": ["auto", "sqrt", "log2"],
        "class_weight": ["balanced", None]
    },
    "sklearn.linear_model.LogisticRegression": {
        "penalty": ["none", "l2"],
        "solver": ["newton-cg", "sag", "saga", "lbfgs"],
        "C": np.logspace(-3, 3, 100),
        "max_iter": [300000]
    },
    "sklearn.ensemble.AdaBoostClassifier": {
        "n_estimators": [10, 20, 50, 100, 200, 500, 1000],
        "base_estimator": [SVC(probability=True), LogisticRegression(), None],
        "learning_rate": [10, 5, 1, .5, .1, .05, .01, .001],
        "algorithm": ["SAMME", "SAMME.R"]
    },
    "sklearn.neural_network.MLPClassifier": {
        "activation": ["relu", "tanh", "logistic", "identity"],
        "solver": ["lbfgs", "sgd", "adam"],
        "alpha": [1e-6, 1e-5, 1e-4, 1e-3],
        "batch_size": [16, 32, 64],
        "shuffle": [True],
        "learning_rate": ["constant", "invscaling", "adaptive"],
        "max_iter": [10000],
        "early_stopping": [True],
        "random_state": [42],
        "validation_fraction": [.1, .2]
    }
}
pipeline_optimizer = TPOTClassifier(
    random_state=23,
    generations=5,
    population_size=100,
    scoring="roc_auc",
    cv=RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1),
    subsample=0.1,
    n_jobs=-1,
    verbosity=3,
    periodic_checkpoint_folder="tpot_kcl.txt",
    config_dict=tpot_config
)
pipeline_optimizer.fit(X_train, y_train)
As you can see, I have 3 main sets of features (manual, VGGISH, and L3), and I would like to test different combinations of them. Then I would like to apply PCA with different numbers of components, and finally test five classifiers: SVM, Random Forest, Logistic Regression, AdaBoost, and MLP.
Just ran into this myself. The problem is that one of your config values is not wrapped in a list. TPOT iterates over each config value to create a terminal per option, so the string "full" in PCA__svd_solver is iterated character by character, and the letter "l" is added twice, producing the duplicate-terminal error. Putting "full" in a list should fix the problem:
  "sklearn.decomposition.PCA": {
      "n_components": [.7, .8, .9, .95, .99],
-     "svd_solver": "full"
+     "svd_solver": ["full"]
  },
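A quick way to see why the bare string breaks things: iterating a Python string yields its individual characters (so "full" contributes the letter "l" twice), while iterating a one-element list yields the string itself as a single value. A minimal illustration:

```python
# Iterating a bare string yields characters -> "l" appears twice,
# which is what triggers the duplicate-terminal assertion in TPOT
print(list("full"))    # ['f', 'u', 'l', 'l']

# Wrapping the value in a list yields one value -> one unique terminal
print(list(["full"]))  # ['full']
```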