[BUG] Test failure in CI workflow with ProximityForest due to Assertion Error and incompatible sktime format

Open julian-fong opened this issue 8 months ago • 0 comments

Opening a separate issue, as requested by @fkiraly, to track the ongoing array mismatch issue with ProximityForest and the incompatible sktime format error.

To Reproduce

To reproduce the assertion error, run test_fit_idempotent on ProximityTree.

To reproduce the incompatible sktime format error, run test_multioutput on ProximityForest

Python versions: 3.10, 3.11 or 3.12
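The two reproduction steps could be scripted roughly as follows. This is a minimal sketch, not the CI invocation: it assumes that sktime's `check_estimator` utility (with its `tests_to_run` and `raise_exceptions` arguments) picks up the same tests as the CI run, and the helper name `reproduce` is hypothetical.

```python
def reproduce():
    """Run only the two failing tests and return their outcomes.

    Hypothetical helper. Imports are local so that defining the
    function does not require sktime to be installed.
    """
    from sktime.classification.distance_based import (
        ProximityForest,
        ProximityTree,
    )
    from sktime.utils.estimator_checks import check_estimator

    results = {}
    # Assertion error: refitting is expected to give identical predictions.
    results.update(
        check_estimator(
            ProximityTree,
            tests_to_run="test_fit_idempotent",
            raise_exceptions=False,
        )
    )
    # Incompatible sktime format error.
    results.update(
        check_estimator(
            ProximityForest,
            tests_to_run="test_multioutput",
            raise_exceptions=False,
        )
    )
    return results
```

Calling `reproduce()` in an environment with sktime installed should return a dictionary keyed by test name. Since the assertion failure is a mismatch in a single element, it may not appear on every run.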

Expected behavior

The AssertionError below should appear:

Arrays are not almost equal to 6 decimals

Mismatched elements: 1 / 5 (20%)
Max absolute difference: 2
Max relative difference: inf
 x: array([0, 0, 0, 0, 2])
 y: array([0, 0, 0, 0, 0])
= 1 failed, 7427 passed, 14508 skipped, 4 xfailed, 3 xpassed, 6188 warnings in 1609.82s (0:26:49) =
make: *** [Makefile:45: test_without_datasets] Error 1
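The comparison in the log can be replayed in isolation. The sketch below assumes the test helper delegates to NumPy's `assert_array_almost_equal`, which the wording of the message suggests. The arrays are copied from the log; the reading of `x` and `y` as outputs from two successive fits is an assumption.

```python
import numpy as np
from numpy.testing import assert_array_almost_equal

# Values copied from the CI log above.
x = np.array([0, 0, 0, 0, 2])
y = np.array([0, 0, 0, 0, 0])

try:
    assert_array_almost_equal(x, y, decimal=6)
    raised = False
except AssertionError as err:
    raised = True
    message = str(err)  # NumPy's mismatch report

# One of the five entries differs, matching "1 / 5 (20%)" in the log.
mismatched = int(np.sum(x != y))
max_abs_diff = int(np.max(np.abs(x - y)))
```

The infinite relative difference in the log follows from dividing the absolute difference by the corresponding entry of `y`, which is zero.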

For the incompatible sktime format error, the following error should appear:

FAILED sktime/classification/tests/test_all_classifiers.py::TestAllClassifiers::test_multioutput[ProximityForest-1] - TypeError: X must be in an sktime compatible format. Allowed scitypes for classifiers are Panel mtypes, for instance a pandas.DataFrame with MultiIndex and last(-1) level an sktime compatible time index. Allowed compatible mtype format specifications are: ['nested_univ', 'numpy3D', 'numpyflat', 'pd-multiindex', 'pd-wide', 'pd-long', 'df-list', 'dask_panel'] . See the data format tutorial examples/AA_datatypes_and_datasets.ipynb. If you think the data is already in an sktime supported input format, run sktime.datatypes.check_raise(data, mtype) to diagnose the error, where mtype is the string of the type specification you want. Error message for checked mtypes, in format [mtype: message], as follows: [df-list: obj must be list of pd.DataFrame, found <class 'pandas.core.frame.DataFrame'>]  [numpy3D: obj must be a numpy.ndarray, found <class 'pandas.core.frame.DataFrame'>]  [pd-multiindex: obj must have a MultiIndex, found <class 'pandas.core.indexes.base.Index'>]  [nested_univ: The instance index of obj must be unique, but found duplicates. Use obj.duplicated() to find the duplicates.]
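Two of the per-mtype messages above point at the index of the DataFrame: it is not a MultiIndex, and its instance index contains duplicates. A minimal pandas-only sketch of how to check both conditions follows. The frame `X` is hypothetical illustration data, not the data produced by the test.

```python
import pandas as pd

# Hypothetical frame with a flat, non-unique index (illustration only).
X = pd.DataFrame({"dim_0": [1.0, 2.0, 3.0]}, index=[0, 0, 1])

# pd-multiindex requires a MultiIndex; a flat Index fails that check.
has_multiindex = isinstance(X.index, pd.MultiIndex)

# nested_univ requires a unique instance index.
index_is_unique = X.index.is_unique

# List every index label that occurs more than once.
duplicated_labels = X.index[X.index.duplicated(keep=False)].tolist()

print(has_multiindex, index_is_unique, duplicated_labels)
```

On the actual test data, `sktime.datatypes.check_raise(data, mtype)`, as recommended in the error message, would report the same findings for a chosen mtype.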

Additional context

The function _assert_array_almost_equal triggers this error when running the test test_fit_idempotent on ProximityForest or ProximityTree. Related issues can be found via PR #6263, and other errors were experienced during #6590.

julian-fong, Jun 17 '24 23:06