
Sporadic failure of check_decision_proba_consistency test on windows

Open GillesVandewiele opened this issue 6 years ago • 2 comments

It seems that the decision_function and predict_proba outputs of TimeSeriesSVC do not always have perfect rank correlation (see log below).
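The mechanism behind such a mismatch can be illustrated without tslearn at all. The sklearn check rounds both outputs to 10 decimals and compares their ranks; if rounding (or the imprecision of the expit link) collapses two nearby probabilities into a tie while the corresponding decision values stay distinct, rankdata assigns averaged ranks on one side only, exactly like the 3.5 vs 3/4 pair in the log. A minimal sketch with hypothetical values:

```python
import numpy as np
from scipy.stats import rankdata

# Three decision-function scores, all distinct (no ties)
d = np.array([0.10, 0.30, 0.1000000001])

# Hypothetical probabilities after the expit link: the first and third
# collide once rounded to 10 decimals, creating a tie
p = np.array([0.52, 0.57, 0.52])

print(rankdata(d))  # distinct values -> integer ranks [1. 3. 2.]
print(rankdata(p))  # tied values -> averaged ranks [1.5 3. 1.5]

# The two rank arrays differ, which is precisely what
# check_decision_proba_consistency asserts against
```

This also suggests why the failure is sporadic: it only triggers when two test points happen to land close enough for the rounding to tie them on one side.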

One possible fix would be to patch make_blobs (and any other data-generating function of sklearn) with a make_timeseries_blobs from tslearn. This would probably also allow removing some of the patches in sklearn_patches, as these often do nothing more than replace the data-generating part with custom code.
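A make_timeseries_blobs does not exist yet, so the following is only a sketch of what the adapter could look like: wrap the tabular output of a blobs-style generator into tslearn's 3D time-series layout (n_samples, n_timestamps, d), which is what tslearn estimators expect as input. The function name and shape convention are assumptions, not existing tslearn API:

```python
import numpy as np

def blobs_to_time_series(X):
    """Hypothetical adapter: reshape tabular blobs data of shape
    (n_samples, n_features) into a univariate time-series dataset of
    shape (n_samples, n_timestamps, 1), treating each feature as a
    timestamp."""
    X = np.asarray(X)
    return X.reshape(X.shape[0], X.shape[1], 1)

# Stand-in for what make_blobs would return: 100 samples, 4 features
X_tab = np.random.randn(100, 4)
X_ts = blobs_to_time_series(X_tab)
print(X_ts.shape)  # (100, 4, 1)
```

A make_timeseries_blobs built this way could then be monkeypatched over sklearn's make_blobs inside the estimator checks, so the checks generate data that is valid for tslearn estimators without per-check custom code.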

LOG:

name = 'TimeSeriesSVC', Estimator = <class 'tslearn.svm.TimeSeriesSVC'>

    @pytest.mark.parametrize('name, Estimator', get_estimators('all'))
    def test_all_estimators(name, Estimator):
        """Test all the estimators in tslearn."""
        allow_nan = (hasattr(checks, 'ALLOW_NAN') and
                     _safe_tags(Estimator(), "allow_nan"))
        if allow_nan:
            checks.ALLOW_NAN.append(name)
>       check_estimator(Estimator)

tslearn\tests\test_estimators.py:191: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tslearn\tests\test_estimators.py:177: in check_estimator
    check(name, estimator)
C:\hostedtoolcache\windows\Python\3.7.6\x64\lib\site-packages\sklearn\utils\_testing.py:327: in wrapper
    return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

name = 'TimeSeriesSVC'
estimator_orig = TimeSeriesSVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
              decision_function_shape='ovr', degree...ak', max_iter=10, n_jobs=None, probability=True,
              random_state=None, shrinking=True, tol=0.001, verbose=0)

    @ignore_warnings(category=FutureWarning)
    def check_decision_proba_consistency(name, estimator_orig):
        # Check whether an estimator having both decision_function and
        # predict_proba methods has outputs with perfect rank correlation.
    
        centers = [(2, 2), (4, 4)]
        X, y = make_blobs(n_samples=100, random_state=0, n_features=4,
                          centers=centers, cluster_std=1.0, shuffle=True)
        X_test = np.random.randn(20, 2) + 4
        estimator = clone(estimator_orig)
    
        if (hasattr(estimator, "decision_function") and
                hasattr(estimator, "predict_proba")):
    
            estimator.fit(X, y)
            # Since the link function from decision_function() to predict_proba()
            # is sometimes not precise enough (typically expit), we round to the
            # 10th decimal to avoid numerical issues.
            a = estimator.predict_proba(X_test)[:, 1].round(decimals=10)
            b = estimator.decision_function(X_test).round(decimals=10)
>           assert_array_equal(rankdata(a), rankdata(b))
E           AssertionError: 
E           Arrays are not equal
E           
E           Mismatched elements: 2 / 20 (10%)
E           Max absolute difference: 0.5
E           Max relative difference: 0.16666667
E            x: array([ 3.5, 17. ,  1. , 10. ,  3.5, 14. ,  2. ,  6. , 20. , 19. ,  9. ,
E                  13. , 11. ,  5. ,  8. , 16. ,  7. , 15. , 12. , 18. ])
E            y: array([ 3., 17.,  1., 10.,  4., 14.,  2.,  6., 20., 19.,  9., 13., 11.,
E                   5.,  8., 16.,  7., 15., 12., 18.])

C:\hostedtoolcache\windows\Python\3.7.6\x64\lib\site-packages\sklearn\utils\estimator_checks.py:2732: AssertionError

GillesVandewiele avatar Mar 26 '20 17:03 GillesVandewiele

This behaviour was discovered during PR #201

rtavenar avatar Mar 26 '20 17:03 rtavenar

Something I don't get about this bug: the default value for probability is False, so tests for this class should raise an error when accessing predict_proba.

Or maybe there is a trick in the sklearn tests that sets this parameter to True at some point, I don't know.

rtavenar avatar Jun 20 '20 08:06 rtavenar