Sporadic failure of check_decision_proba_consistency test on Windows
It seems that the decision_function and predict_proba outputs of TimeSeriesSVC do not always have perfect rank correlation (see log below).
One possible fix would be to patch make_blobs (and any other sklearn data-generating function) with make_timeseries_blobs from tslearn. This would probably also allow removing some of the patches in sklearn_patches, since these often do nothing more than replace the data-generating part with custom code.
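I could not find make_timeseries_blobs in the tslearn API I have at hand, so the name and signature below are hypothetical; this is just a rough sketch of what such a generator could look like, using plain numpy and the (n_samples, sz, d) layout that tslearn estimators expect:

```python
import numpy as np

def make_timeseries_blobs(n_samples=100, sz=16, d=1, centers=(2.0, 4.0),
                          cluster_std=1.0, random_state=0):
    """Hypothetical helper: blob-like time series, one blob per center.

    Each sample is a noisy series of shape (sz, d) hovering around its
    cluster center, mimicking sklearn's make_blobs but in the
    (n_samples, sz, d) layout tslearn estimators expect.
    """
    rng = np.random.RandomState(random_state)
    per_cluster = n_samples // len(centers)
    X_parts, y_parts = [], []
    for label, center in enumerate(centers):
        # Gaussian noise around the cluster center, one blob per center
        X_parts.append(center + rng.randn(per_cluster, sz, d) * cluster_std)
        y_parts.extend([label] * per_cluster)
    X = np.concatenate(X_parts)
    y = np.asarray(y_parts)
    # Shuffle samples, like make_blobs(shuffle=True)
    order = rng.permutation(len(y))
    return X[order], y[order]

X, y = make_timeseries_blobs()
print(X.shape, y.shape)  # (100, 16, 1) (100,)
```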
LOG:
name = 'TimeSeriesSVC', Estimator = <class 'tslearn.svm.TimeSeriesSVC'>

    @pytest.mark.parametrize('name, Estimator', get_estimators('all'))
    def test_all_estimators(name, Estimator):
        """Test all the estimators in tslearn."""
        allow_nan = (hasattr(checks, 'ALLOW_NAN') and
                     _safe_tags(Estimator(), "allow_nan"))
        if allow_nan:
            checks.ALLOW_NAN.append(name)
>       check_estimator(Estimator)

tslearn\tests\test_estimators.py:191:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tslearn\tests\test_estimators.py:177: in check_estimator
    check(name, estimator)
C:\hostedtoolcache\windows\Python\3.7.6\x64\lib\site-packages\sklearn\utils\_testing.py:327: in wrapper
    return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

name = 'TimeSeriesSVC'
estimator_orig = TimeSeriesSVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
              decision_function_shape='ovr', degree...ak', max_iter=10, n_jobs=None, probability=True,
              random_state=None, shrinking=True, tol=0.001, verbose=0)

    @ignore_warnings(category=FutureWarning)
    def check_decision_proba_consistency(name, estimator_orig):
        # Check whether an estimator having both decision_function and
        # predict_proba methods has outputs with perfect rank correlation.
        centers = [(2, 2), (4, 4)]
        X, y = make_blobs(n_samples=100, random_state=0, n_features=4,
                          centers=centers, cluster_std=1.0, shuffle=True)
        X_test = np.random.randn(20, 2) + 4
        estimator = clone(estimator_orig)
        if (hasattr(estimator, "decision_function") and
                hasattr(estimator, "predict_proba")):
            estimator.fit(X, y)
            # Since the link function from decision_function() to predict_proba()
            # is sometimes not precise enough (typically expit), we round to the
            # 10th decimal to avoid numerical issues.
            a = estimator.predict_proba(X_test)[:, 1].round(decimals=10)
            b = estimator.decision_function(X_test).round(decimals=10)
>           assert_array_equal(rankdata(a), rankdata(b))
E           AssertionError:
E           Arrays are not equal
E
E           Mismatched elements: 2 / 20 (10%)
E           Max absolute difference: 0.5
E           Max relative difference: 0.16666667
E           x: array([ 3.5, 17. ,  1. , 10. ,  3.5, 14. ,  2. ,  6. , 20. , 19. ,  9. ,
E                  13. , 11. ,  5. ,  8. , 16. ,  7. , 15. , 12. , 18. ])
E           y: array([ 3., 17.,  1., 10.,  4., 14.,  2.,  6., 20., 19.,  9., 13., 11.,
E                   5.,  8., 16.,  7., 15., 12., 18.])

C:\hostedtoolcache\windows\Python\3.7.6\x64\lib\site-packages\sklearn\utils\estimator_checks.py:2732: AssertionError
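For what it's worth, the mismatch pattern in the log (x has ranks 3.5 and 3.5 where y has 3. and 4.) is the signature of a tie: after rounding to 10 decimals, two predict_proba values became exactly equal while the corresponding decision_function values stayed distinct, and scipy's rankdata averages the ranks of tied entries. A minimal illustration (the values below are made up):

```python
import numpy as np
from scipy.stats import rankdata

# a: contains a tie (e.g. two probabilities equal after rounding)
# b: same ordering, but all values distinct (e.g. decision values)
a = np.array([0.30, 0.90, 0.10, 0.30])
b = np.array([0.28, 0.91, 0.12, 0.31])

print(rankdata(a))  # [2.5 4.  1.  2.5] -- tied entries get averaged ranks
print(rankdata(b))  # [2. 4. 1. 3.]
```

So the assertion can fail even when the two functions agree on the ordering everywhere, as soon as predict_proba maps two distinct decision values to the same (rounded) probability.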
This behaviour was discovered during PR #201.
Something I don't get about this bug: the default value of probability is False, so tests for this class should raise an error when accessing predict_proba. Or maybe there is a trick in the sklearn tests that sets this parameter to True at some point (the estimator_orig in the log above does show probability=True); I don't know.