scikit-learn ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev ⚠️

CI is still failing on Linux_Nightly.pylatest_pip_scipy_dev (Jun 26, 2023)

test_parallel_train
test_dict_learning_lassocd_readonly_data
test_iforest_parallel_regression[61]
test_ridge_regression[long-61-True-sparse_cg]
test_ridge_regression[long-61-False-sparse_cg]
test_ridge_regression[wide-61-True-sparse_cg]
test_ridge_regression[wide-61-False-sparse_cg]
test_ridge_regression_hstacked_X[long-61-True-sparse_cg]
test_ridge_regression_hstacked_X[long-61-False-sparse_cg]
test_ridge_regression_hstacked_X[wide-61-True-sparse_cg]
test_model_pipeline_same_dense_and_sparse[Ridge-params6]
test_ridge_regression_hstacked_X[wide-61-False-sparse_cg]
test_ridge_regression_vstacked_X[long-61-True-sparse_cg]
test_ridge_regression_vstacked_X[long-61-False-sparse_cg]
test_ridge_regression_vstacked_X[wide-61-True-sparse_cg]
test_ridge_regression_vstacked_X[wide-61-False-sparse_cg]
test_ridge_regression_unpenalized[long-61-True-sparse_cg]
test_ridge_regression_unpenalized[long-61-False-sparse_cg]
test_ridge_regression_unpenalized[wide-61-True-sparse_cg]
test_ridge_regression_unpenalized[wide-61-False-sparse_cg]
test_ridge_regression_unpenalized_hstacked_X[long-61-True-sparse_cg]
test_ridge_regression_unpenalized_hstacked_X[long-61-False-sparse_cg]
test_ridge_regression_unpenalized_hstacked_X[wide-61-True-sparse_cg]
test_ridge_regression_unpenalized_hstacked_X[wide-61-False-sparse_cg]
test_ridge_regression_unpenalized_vstacked_X[long-61-True-sparse_cg]
test_ridge_regression_unpenalized_vstacked_X[long-61-False-sparse_cg]
test_ridge_regression_unpenalized_vstacked_X[wide-61-True-sparse_cg]
test_ridge_regression_unpenalized_vstacked_X[wide-61-False-sparse_cg]
test_ridge_regression_sample_weights[long-61-1.0-True-True-sparse_cg]
test_ridge_regression_sample_weights[long-61-1.0-True-True-sag]
test_ridge_regression_sample_weights[long-61-1.0-True-False-sparse_cg]
test_ridge_regression_sample_weights[long-61-1.0-False-True-sparse_cg]
test_ridge_regression_sample_weights[long-61-1.0-False-False-sparse_cg]
test_ridge_regression_sample_weights[long-61-0.01-True-True-sparse_cg]
test_ridge_regression_sample_weights[long-61-0.01-True-True-sag]
test_ridge_regression_sample_weights[long-61-0.01-True-False-sparse_cg]
test_ridge_regression_sample_weights[long-61-0.01-False-True-sparse_cg]
test_ridge_regression_sample_weights[long-61-0.01-False-False-sparse_cg]
test_ridge_regression_sample_weights[wide-61-1.0-True-True-sparse_cg]
test_ridge_regression_sample_weights[wide-61-1.0-True-True-sag]
test_ridge_regression_sample_weights[wide-61-1.0-True-False-sparse_cg]
test_ridge_regression_sample_weights[wide-61-1.0-False-True-sparse_cg]
test_ridge_regression_sample_weights[wide-61-1.0-False-False-sparse_cg]
test_ridge_regression_sample_weights[wide-61-0.01-True-True-sparse_cg]
test_ridge_regression_sample_weights[wide-61-0.01-True-True-sag]
test_ridge_regression_sample_weights[wide-61-0.01-True-False-sparse_cg]
test_ridge_regression_sample_weights[wide-61-0.01-False-True-sparse_cg]
test_ridge_regression_sample_weights[wide-61-0.01-False-False-sparse_cg]
test_ridge_individual_penalties
test_solver_consistency[seed0-20-float32-0.1-sparse_cg-False]
test_solver_consistency[seed0-20-float32-0.1-sparse_cg-True]
test_solver_consistency[seed0-40-float32-1.0-sparse_cg-False]
test_solver_consistency[seed0-40-float32-1.0-sparse_cg-True]
test_solver_consistency[seed0-20-float64-0.2-sparse_cg-False]
test_solver_consistency[seed0-20-float64-0.2-sparse_cg-True]
test_solver_consistency[seed1-20-float32-0.1-sparse_cg-False]
test_solver_consistency[seed1-20-float32-0.1-sparse_cg-True]
test_solver_consistency[seed1-40-float32-1.0-sparse_cg-False]
test_solver_consistency[seed1-40-float32-1.0-sparse_cg-True]
test_solver_consistency[seed1-20-float64-0.2-sparse_cg-False]
test_solver_consistency[seed1-20-float64-0.2-sparse_cg-True]
test_solver_consistency[seed2-20-float32-0.1-sparse_cg-False]
test_solver_consistency[seed2-20-float32-0.1-sparse_cg-True]
test_solver_consistency[seed2-40-float32-1.0-sparse_cg-False]
test_solver_consistency[seed2-40-float32-1.0-sparse_cg-True]
test_solver_consistency[seed2-20-float64-0.2-sparse_cg-False]
test_solver_consistency[seed2-20-float64-0.2-sparse_cg-True]
test_cross_validate[True]
test_cross_val_predict
test_cross_val_predict_input_types
test_ridge_classifier_with_scoring[DENSE_FILTER-cv1-None]
test_ridge_classifier_with_scoring[DENSE_FILTER-cv1-accuracy]
test_ridge_classifier_with_scoring[DENSE_FILTER-cv1-_accuracy_callable]
test_ridge_classifier_with_scoring[SPARSE_FILTER-cv1-None]
test_ridge_classifier_with_scoring[SPARSE_FILTER-cv1-accuracy]
test_ridge_classifier_with_scoring[SPARSE_FILTER-cv1-_accuracy_callable]
test_ridge_regression_custom_scoring[DENSE_FILTER-cv1]
test_ridge_regression_custom_scoring[SPARSE_FILTER-cv1]
test_dense_sparse[_test_ridge_cv]
test_dense_sparse[_test_ridge_diabetes]
test_dense_sparse[_test_multi_ridge_diabetes]
test_dense_sparse[_test_ridge_classifiers]
test_dense_sparse[_test_tolerance]
test_sparse_design_with_sample_weights
test_sparse_cg_max_iter
test_ridge_fit_intercept_sparse[61-True-sparse_cg]
test_ridge_fit_intercept_sparse[61-True-auto]
test_ridge_fit_intercept_sparse[61-False-sparse_cg]
test_ridge_fit_intercept_sparse[61-False-auto]
test_ridge_fit_intercept_sparse_sag[61-True]
test_ridge_fit_intercept_sparse_sag[61-False]
test_ridge_regression_check_arguments_validity[auto-csr_matrix-None-False]
test_ridge_regression_check_arguments_validity[auto-csr_matrix-sample_weight1-False]
test_ridge_regression_check_arguments_validity[sparse_cg-array-None-False]
test_ridge_regression_check_arguments_validity[sparse_cg-array-sample_weight1-False]
test_ridge_regression_check_arguments_validity[sparse_cg-csr_matrix-None-False]
test_ridge_regression_check_arguments_validity[sparse_cg-csr_matrix-sample_weight1-False]
test_dtype_match[sparse_cg]
test_ridge_regression_dtype_stability[0-sparse_cg]
test_ridge_sample_weight_consistency[61-sparse_cg-tall-False-False]
test_ridge_sample_weight_consistency[61-sparse_cg-tall-False-True]
test_ridge_sample_weight_consistency[61-sparse_cg-tall-True-False]
test_ridge_sample_weight_consistency[61-sparse_cg-tall-True-True]
test_ridge_sample_weight_consistency[61-sparse_cg-wide-False-False]
test_ridge_sample_weight_consistency[61-sparse_cg-wide-False-True]
test_ridge_sample_weight_consistency[61-sparse_cg-wide-True-False]
test_ridge_sample_weight_consistency[61-sparse_cg-wide-True-True]
test_ridge_sample_weight_consistency[61-sag-tall-True-True]
test_ridge_sample_weight_consistency[61-sag-wide-True-True]
test_sag_regressor_computed_correctly
test_sag_regressor[0]
test_sag_regressor[1]
test_sag_regressor[2]
test_estimators[RegressorChain(base_estimator=Ridge())-check_estimator_sparse_data]
test_estimators[Ridge()-check_estimator_sparse_data]
test_estimators[RidgeClassifier()-check_estimator_sparse_data]
test_estimators[MultiOutputRegressor(estimator=Ridge())-check_estimator_sparse_data]
test_estimators[StackingRegressor(estimators=[('est1',Ridge(alpha=0.1)),('est2',Ridge(alpha=1))])-check_estimator_sparse_data]
test_estimators[VotingRegressor(estimators=[('est1',Ridge(alpha=0.1)),('est2',Ridge(alpha=1))])-check_estimator_sparse_data]
test_search_cv[HalvingGridSearchCV(cv=2,estimator=Ridge(),min_resources='smallest',param_grid={'alpha':[0.1,1.0]},random_state=0)-check_estimator_sparse_data0]
test_search_cv[HalvingGridSearchCV(cv=2,estimator=Ridge(),min_resources='smallest',param_grid={'alpha':[0.1,1.0]},random_state=0)-check_estimator_sparse_data1]
test_meta_estimators_delegate_data_validation[MultiOutputRegressor]
test_meta_estimators_delegate_data_validation[StackingRegressor]
test_meta_estimators_delegate_data_validation[TransformedTargetRegressor]
test_meta_estimators_delegate_data_validation[VotingRegressor]
test_base_chain_fit_and_predict_with_sparse_data_and_cv

Apr 12 '23 02:04 scikit-learn-bot

/take

Apr 12 '23 10:04 glemaitre

The culprit is pandas dev. I will bisect to know which commit changed the behaviour.

Apr 12 '23 10:04 glemaitre

So it comes from this commit: https://github.com/pandas-dev/pandas/pull/52542

It comes from calling pd.concat(..., ignore_index=True) with a first dataset containing None (thus an object dtype) with a second dataset containing np.nan and float (thus a float64 dtype).

The previous behaviour cast the column as object dtype while the new behaviour is casting into float64.

I am trying to craft a minimal reproducer.
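For readers following along, here is a rough sketch of the situation described above. It is not the actual reproducer: the column name `a` and the values are made up for illustration, and the first frame is assumed to hold only None. The resulting dtype depends on the installed pandas version, so the snippet just prints it.

```python
import numpy as np
import pandas as pd

# First dataset: only None values, so the column has object dtype
# (assumption: the real data may have looked different).
first = pd.DataFrame({"a": [None, None]}, dtype=object)

# Second dataset: np.nan and a float, so the column has float64 dtype.
second = pd.DataFrame({"a": [np.nan, 1.0]})

result = pd.concat([first, second], ignore_index=True)

# Depending on the pandas version this is object or float64.
print(pd.__version__, first["a"].dtype, second["a"].dtype, result["a"].dtype)
```

Running this against the pandas builds before and after the commit linked above should show whether the result dtype changed.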

Apr 12 '23 12:04 glemaitre

There are plenty of recent errors (~280), probably due to a pandas change (and maybe a NumPy one too?). The symptoms look like this:

FutureWarning: is_sparse is deprecated and will be removed in a future version. Check isinstance(dtype, pd.SparseDtype) instead.
FutureWarning: The behavior of DataFrame concatenation with all-NA entries is deprecated. In a future version, this will no longer exclude all-NA columns when determining the result dtypes. To retain the old behavior, cast the all-NA columns to the desired dtype before the concat operation.
ValueError: setting an array element with a sequence.
DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)

The number of failures per message looks like this (a quick-and-dirty analysis that may miss a few kinds of errors):

    245 FutureWarning: is_sparse is deprecated and will be removed in a future version. Check `isinstance(dtype, pd.SparseDtype)` instead.
     19 DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
     14 ValueError: setting an array element with a sequence.
      5 FutureWarning: The behavior of DataFrame concatenation with all-NA entries is deprecated. In a future version, this will no longer exclude all-NA columns when determining the result dtypes. To retain the old behavior, cast the all-NA columns to the desired dtype before the concat operation.
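As an illustration of the NumPy 1.25 DeprecationWarning in the list above, here is a minimal sketch of the pattern that triggers it and of the fix the message suggests. The array `arr` is a made-up example, not code taken from scikit-learn.

```python
import numpy as np

arr = np.array([3.5])  # 1-d array with a single element (ndim == 1)

# float(arr) is the pattern that NumPy 1.25 deprecates for ndim > 0.
# Extract the single element explicitly instead:
value_by_index = float(arr[0])
value_by_item = arr.item()

print(value_by_index, value_by_item)
```

Both forms return a plain Python float and work the same way on older NumPy versions.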

Apr 26 '23 14:04 lesteve

Opened #26287 about pandas is_sparse
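For context, the replacement suggested by the FutureWarning message can be sketched as follows. The series below are hypothetical examples; this is not the change made in scikit-learn.

```python
import pandas as pd

sparse_series = pd.Series(pd.arrays.SparseArray([0, 0, 1]))
dense_series = pd.Series([0, 0, 1])

# Instead of the deprecated is_sparse helper, check the dtype directly,
# as the warning message recommends.
sparse_check = isinstance(sparse_series.dtype, pd.SparseDtype)
dense_check = isinstance(dense_series.dtype, pd.SparseDtype)

print(sparse_check, dense_check)
```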

Apr 27 '23 04:04 lesteve

It seems that some np.find_common_type DeprecationWarnings are coming from pandas (https://github.com/pandas-dev/pandas/issues/53236) and should hopefully be fixed soon.
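As background, NumPy 1.25 deprecated np.find_common_type in favour of np.result_type and np.promote_types. The sketch below only shows those replacements on two example dtypes; it says nothing about how pandas fixed the warning on its side.

```python
import numpy as np

# Both calls compute the common dtype of float32 and int64.
common_from_result_type = np.result_type(np.float32, np.int64)
common_from_promote_types = np.promote_types(np.float32, np.int64)

print(common_from_result_type, common_from_promote_types)
```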

May 23 '23 16:05 lesteve

CI is no longer failing! ✅

Successful run on Jun 19, 2023

Jun 16 '23 03:06 scikit-learn-bot

For your information, SciPy is currently transitioning from sparse matrix semantics to sparse array semantics (see https://github.com/scikit-learn/scikit-learn/issues/26418 for a discussion of what this means for scikit-learn).

If tests using sparse data fail on pylatest_pip_scipy_dev, feel free to ping me.
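One visible difference between the two semantics is the `*` operator. Here is a small sketch, assuming a SciPy version that provides `csr_array` (1.8 or later); the 2x2 input is an arbitrary example.

```python
import numpy as np
import scipy.sparse as sp

dense = np.array([[1, 2], [3, 4]])

as_matrix = sp.csr_matrix(dense)  # sparse matrix semantics
as_array = sp.csr_array(dense)    # sparse array semantics

# With sparse matrices, * is matrix multiplication.
matrix_star = (as_matrix * as_matrix).toarray()

# With sparse arrays, * is element-wise, as for NumPy arrays.
array_star = (as_array * as_array).toarray()

# @ is matrix multiplication in both cases.
array_matmul = (as_array @ as_array).toarray()

print(matrix_star)
print(array_star)
```

Code that relies on `*` meaning matrix multiplication is the kind of thing that can behave differently once inputs become sparse arrays; using `@` avoids the ambiguity.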

Jun 23 '23 15:06 jjerphan

These issues have all been fixed. Let's close

Jul 06 '23 11:07 jeremiedbb
