auto-sklearn Failing configurations

Open eddiebergman opened this issue 3 years ago • 0 comments

EDIT: After some investigation, this appears to have less to do with the configurations and more to do with imputation. After trying to recreate the failures, it seems that X arrays that reach a threshold of NaNs cause the configurations to fail. These NaNs are added randomly, which explains why the failures are infrequent.
Edit2: "fast_ica" with "fun":"exp" fails if there is a majority of NaNs in the data.
Edit3: "fast_ica" with "fast_ica:whiten" : "False" fails with NaNs in the input.
Edit4: "fast_ica" with "whiten" : "False" fails even with no NaN values present.
Edit5: "fast_ica" with the "iris" dataset works, even with a high occurrence of NaNs. It seems that the failure depends more on the frequency of 0s in the dataset than on NaNs.
Edit6: Forcing a particular "feature_preprocessor:__choice__" is currently not possible. Manually editing the Config is not straightforward and is best attempted once ConfigSpace.Configuration is updated to allow easier modification of a Config. See ConfigSpace issue #205 for why it is not straightforward to delete a key and add a new one.
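
As a rough illustration of what Edit6 is after, the swap can be sketched on a plain dictionary of hyperparameter values rather than on a Config object. The helper name `swap_feature_preprocessor` is hypothetical, nothing here is validated against a configuration space, and turning the resulting dict back into a Config is deliberately left out because that is the part ConfigSpace does not make easy yet.

```python
from typing import Any, Dict

PREFIX = "feature_preprocessor:"
CHOICE_KEY = PREFIX + "__choice__"


def swap_feature_preprocessor(
    values: Dict[str, Any], new_choice: str, new_params: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of `values` with the feature preprocessor replaced.

    Hypothetical helper: it only manipulates a plain dict and does not
    check the result against any configuration space.
    """
    # Keep every key that does not belong to the feature preprocessor.
    updated = {k: v for k, v in values.items() if not k.startswith(PREFIX)}
    # Set the new choice, then its hyperparameters under the matching prefix.
    updated[CHOICE_KEY] = new_choice
    for name, value in new_params.items():
        updated[f"{PREFIX}{new_choice}:{name}"] = value
    return updated


original = {
    "classifier:__choice__": "lda",
    "feature_preprocessor:__choice__": "kernel_pca",
    "feature_preprocessor:kernel_pca:kernel": "rbf",
    "feature_preprocessor:kernel_pca:n_components": 10,
}
forced = swap_feature_preprocessor(
    original,
    "fast_ica",
    {"algorithm": "deflation", "fun": "exp", "whiten": "False"},
)
```

Building a new dict instead of deleting keys in place sidesteps the delete-then-add problem on the dict side, but the conditional hyperparameters of the new choice still have to be supplied by hand.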

We leave some randomness in the configurations that get tested when testing different classifier and regressor components. The configurations that failed are collected here:

Python version 3.8
Test
test/test_pipeline/test_classification.py::SimpleClassificationPipelineTest::test_configurations_sparse

Configuration:
  balancing:strategy, Value: 'weighting'
  classifier:__choice__, Value: 'sgd'
  classifier:sgd:alpha, Value: 7.27693595714389e-05
  classifier:sgd:average, Value: 'False'
  classifier:sgd:eta0, Value: 0.013654826040547558
  classifier:sgd:fit_intercept, Constant: 'True'
  classifier:sgd:learning_rate, Value: 'invscaling'
  classifier:sgd:loss, Value: 'log'
  classifier:sgd:penalty, Value: 'l1'
  classifier:sgd:power_t, Value: 0.5468767593727824
  classifier:sgd:tol, Value: 8.162675288740052e-05
  data_preprocessor:__choice__, Value: 'feature_type'
  data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__, Value: 'encoding'
  data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__, Value: 'no_coalescense'
  data_preprocessor:feature_type:numerical_transformer:imputation:strategy, Value: 'mean'
  data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__, Value: 'quantile_transformer'
  data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:n_quantiles, Value: 467
  data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:output_distribution, Value: 'normal'
  feature_preprocessor:__choice__, Value: 'kernel_pca'
  feature_preprocessor:kernel_pca:gamma, Value: 6.985386846337043
  feature_preprocessor:kernel_pca:kernel, Value: 'rbf'
  feature_preprocessor:kernel_pca:n_components, Value: 10

Python version 3.8
Test
test/test_pipeline/test_classification.py::SimpleClassificationPipelineTest::test_configurations_signed_data

Configuration:
  balancing:strategy, Value: 'weighting'
  classifier:__choice__, Value: 'lda'
  classifier:lda:shrinkage, Value: 'auto'
  classifier:lda:tol, Value: 0.038890093430048595
  data_preprocessor:__choice__, Value: 'feature_type'
  data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__, Value: 'encoding'
  data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__, Value: 'minority_coalescer'
  data_preprocessor:feature_type:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction, Value: 0.001521146558163954
  data_preprocessor:feature_type:numerical_transformer:imputation:strategy, Value: 'mean'
  data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__, Value: 'none'
  feature_preprocessor:__choice__, Value: 'fast_ica'
  feature_preprocessor:fast_ica:algorithm, Value: 'deflation'
  feature_preprocessor:fast_ica:fun, Value: 'exp'
  feature_preprocessor:fast_ica:whiten, Value: 'False'

Python version 3.10
Test
SimpleClassificationPipelineTest.test_configurations_sparse

Configuration:
  balancing:strategy, Value: 'none'
  classifier:__choice__, Value: 'qda'
  classifier:qda:reg_param, Value: 0.7722372097734942
  data_preprocessor:__choice__, Value: 'feature_type'
  data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__, Value: 'encoding'
  data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__, Value: 'no_coalescense'
  data_preprocessor:feature_type:numerical_transformer:imputation:strategy, Value: 'median'
  data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__, Value: 'quantile_transformer'
  data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:n_quantiles, Value: 1761
  data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:output_distribution, Value: 'normal'
  feature_preprocessor:__choice__, Value: 'kernel_pca'
  feature_preprocessor:kernel_pca:gamma, Value: 2.351280410584469
  feature_preprocessor:kernel_pca:kernel, Value: 'rbf'
  feature_preprocessor:kernel_pca:n_components, Value: 10

Python version 3.8
Test
SimpleClassificationPipelineTest.test_configurations_signed_data

Configuration:
  balancing:strategy, Value: 'none'
  classifier:__choice__, Value: 'gaussian_nb'
  data_preprocessor:__choice__, Value: 'feature_type'
  data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__, Value: 'encoding'
  data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__, Value: 'no_coalescense'
  data_preprocessor:feature_type:numerical_transformer:imputation:strategy, Value: 'most_frequent'
  data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__, Value: 'quantile_transformer'
  data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:n_quantiles, Value: 1004
  data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:output_distribution, Value: 'normal'
  feature_preprocessor:__choice__, Value: 'fast_ica'
  feature_preprocessor:fast_ica:algorithm, Value: 'deflation'
  feature_preprocessor:fast_ica:fun, Value: 'exp'
  feature_preprocessor:fast_ica:whiten, Value: 'False'
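
The dumps above all use the same `key, Value: ...` / `key, Constant: ...` line format, so they can be loaded into dictionaries for side-by-side comparison (for example, to confirm that both `test_configurations_signed_data` failures share the same "fast_ica" settings). The following is a minimal sketch that assumes exactly that printed format; `parse_configuration` is a hypothetical helper, not part of auto-sklearn or ConfigSpace.

```python
import ast
import re
from typing import Any, Dict

# Matches lines such as "  classifier:sgd:alpha, Value: 7.27e-05"
# or "  classifier:sgd:fit_intercept, Constant: 'True'".
LINE_RE = re.compile(r"^\s*(?P<key>\S+), (?:Value|Constant): (?P<value>.+?)\s*$")


def parse_configuration(dump: str) -> Dict[str, Any]:
    """Parse a printed configuration into a {hyperparameter: value} dict."""
    parsed: Dict[str, Any] = {}
    for line in dump.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip the "Configuration:" header and blank lines
        # literal_eval turns "'exp'" into the string 'exp' and "10" into the int 10.
        parsed[match.group("key")] = ast.literal_eval(match.group("value"))
    return parsed


dump = """Configuration:
  balancing:strategy, Value: 'none'
  classifier:__choice__, Value: 'gaussian_nb'
  feature_preprocessor:__choice__, Value: 'fast_ica'
  feature_preprocessor:fast_ica:fun, Value: 'exp'
  feature_preprocessor:fast_ica:whiten, Value: 'False'
"""
config = parse_configuration(dump)
```

Note that quoted values such as 'False' stay strings after parsing, matching how they appear in the dumps.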

Dec 08 '21 07:12 eddiebergman
