
[WIP] Fix numpy versions problems

Open YannCabanes opened this issue 3 years ago • 25 comments

YannCabanes avatar Jul 01 '22 22:07 YannCabanes

Hello, there seems to be a problem related to numpy versions in tslearn's main branch. For now, this branch has no difference with tslearn's main branch. I get no error message when I run the tests on my local computer using pytest.

Part of the error is related to the lines:

import numpy as np
cimport numpy as np
np.import_array()

in the file: https://github.com/tslearn-team/tslearn/blob/main/tslearn/metrics/soft_dtw_fast.pyx

Here is the error message:

__init__.pxd:942: in numpy.import_array
    ???
E   RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf

This error message appears to be caused by a mismatch between the numpy version the extension was compiled against and the numpy version that is installed: https://github.com/freqtrade/freqtrade/issues/4281

The suggested fix is to upgrade numpy: pip install numpy --upgrade
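To make the mismatch in that message concrete, here is a small sketch that extracts the two hexadecimal API versions and compares them. The helper name `parse_api_mismatch` is hypothetical (it is not part of NumPy or tslearn), and the regular expression only assumes the message format quoted above.

```python
import re

# Message text copied from the traceback quoted above.
MESSAGE = ("module compiled against API version 0x10 "
           "but this version of numpy is 0xf")


def parse_api_mismatch(message):
    """Return (compiled_api, runtime_api) as integers, or None if no match.

    Hypothetical helper for illustration only.
    """
    match = re.search(
        r"API version (0x[0-9a-fA-F]+) but this version of numpy is "
        r"(0x[0-9a-fA-F]+)",
        message,
    )
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


compiled_api, runtime_api = parse_api_mismatch(MESSAGE)
if compiled_api > runtime_api:
    # The extension was built against a newer NumPy C API than the NumPy
    # imported at run time, which is consistent with the advice to upgrade.
    print(f"built for C API {compiled_api}, runtime provides {runtime_api}")
```

Read this way, the extension expects a newer C API than the installed numpy offers, so either upgrading numpy or rebuilding the extension against the installed numpy should bring the two back in line.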

YannCabanes avatar Jul 04 '22 13:07 YannCabanes

At the beginning I was surprised by the import lines:

import numpy as np
cimport numpy as np

but it seems to be correct:
https://stackoverflow.com/questions/20268228/cython-cimport-and-import-numpy-as-both-np
http://docs.cython.org/en/latest/src/tutorial/numpy.html#adding-types

YannCabanes avatar Jul 04 '22 13:07 YannCabanes

We should use numpy version <= 1.22 as I have the following error message: E ImportError: Numba needs NumPy 1.22 or less

YannCabanes avatar Jul 04 '22 18:07 YannCabanes

Now we have the following error message when running the tests on Linux with Python 3.7:

  • python -m pip install numpy==1.22
    ERROR: Ignored the following versions that require a different python version: 1.22.0 Requires-Python >=3.8

Python 3.7 requires NumPy version <= 1.21.6

YannCabanes avatar Jul 04 '22 18:07 YannCabanes

Now we have the following error message:

tslearn/metrics/cysax.pyx:1: in init tslearn.metrics.cysax
    STUFF_cysax = "cysax"
E   ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

which also seems to be related to numpy versions.
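Since the failure shows up in one environment and not another, comparing installed package versions is a natural first step. The sketch below is one way to do that with the standard library only; the package list and the helper name `installed_versions` are assumptions made for illustration.

```python
from importlib import metadata

# Packages that seem relevant to this thread (an assumed, non-exhaustive list).
PACKAGES = ("numpy", "numba", "Cython", "scipy", "scikit-learn", "tslearn")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name:>12}: {version or 'not installed'}")
```

Running the same snippet locally and in the CI job would show whether the two environments resolve numpy to different releases.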

YannCabanes avatar Jul 04 '22 20:07 YannCabanes

Now there is only one test being run: docs/readthedocs.org:tslearn (successful) The other tests have not been performed.

YannCabanes avatar Jul 04 '22 20:07 YannCabanes

Hello @rtavenar and @GillesVandewiele, The tests are not failing on my local computer, so I am trying to find the problem (probably related to numpy versions) directly through tslearn's Continuous Integration. I have tried several things, without much success so far. Do you have any ideas about how to solve this? Any suggestion is welcome.

YannCabanes avatar Jul 06 '22 16:07 YannCabanes

Hi @YannCabanes

I have been doing a bit of unsuccessful digging into these issues myself. My 2 cents, which I can already give straight away (I will look into this in more depth later): according to StackOverflow, these E ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject errors are often related to pycocotools. But then again, I would not know immediately which of our dependencies actually uses pycocotools...

GillesVandewiele avatar Jul 06 '22 16:07 GillesVandewiele

Now, I have the following error messages:

E   ImportError: Numba needs NumPy 1.22 or less
numba 0.55.2 requires numpy<1.23,>=1.18, but you have numpy 1.23.0 which is incompatible.

I previously tried:

python -m pip install numpy==1.22

but this fails on Python 3.7, because NumPy 1.22 needs Python >= 3.8.
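The constraints quoted so far can be combined into a single version-dependent pin. The following is only a sketch of that reasoning: the function name `numpy_requirement` is hypothetical, and the bounds are copied from the messages in this thread (numba 0.55.2, NumPy 1.21.6 as the upper bound for Python 3.7), so they would not hold for other releases.

```python
import sys


def numpy_requirement(python_version=None):
    """Return a pip requirement string for NumPy (illustrative only).

    Bounds come from the error messages quoted in this thread:
    numba 0.55.2 requires numpy<1.23,>=1.18, and NumPy 1.22.0
    requires Python >= 3.8.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    if tuple(python_version) < (3, 8):
        # Python 3.7 cannot install NumPy 1.22, so cap at 1.21.6.
        return "numpy>=1.18,<=1.21.6"
    return "numpy>=1.18,<1.23"


print(numpy_requirement())
```

A CI script could pass the returned string to `python -m pip install`, instead of hard-coding `numpy==1.22` for every Python version.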

YannCabanes avatar Jul 06 '22 19:07 YannCabanes

I am not sure...this is based on what I read in a scipy PR: https://github.com/scipy/scipy/pull/14813

tslearn/metrics/cysax.pyx:1: in init tslearn.metrics.cysax

So, would you mind trying this?

# in tslearn/metrics/cysax.pyx

import numpy
cimport numpy
numpy.import_array() # PLEASE ADD THIS RIGHT AFTER `cimport numpy`

Can you also add the same thing for cycc.pyx?


btw, soft_dtw_fast.pyx is good already and has this line.


And, then see if this issue can be resolved or if we get a new error or not.

NimaSarajpoor avatar Jul 31 '22 01:07 NimaSarajpoor

Thank you @NimaSarajpoor! Yes, I will try!

YannCabanes avatar Aug 01 '22 04:08 YannCabanes

Hello @NimaSarajpoor, I have tried to add numpy.import_array() after cimport numpy as you suggested. However, it seems that we still have the same kind of error message. Any suggestion is welcome, so please tell me if there is anything else that you would like to try.

YannCabanes avatar Aug 01 '22 05:08 YannCabanes

@rtavenar suggested that I try replacing Cython with a combination of numpy and numba, so I will try this approach.

YannCabanes avatar Aug 02 '22 02:08 YannCabanes

A link to test the usefulness of the option "parallel=True" of the @njit decorator of numba: https://numba.readthedocs.io/en/stable/user/parallel.html#numba-parallel-diagnostics
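As a minimal sketch of what that page describes, the following compiles a small reduction with `parallel=True` and then asks Numba for its parallel diagnostics. The function `sum_of_squares` is a hypothetical example, not tslearn code, and the `try`/`except` fallback is only there so the snippet still runs (as plain Python) where Numba is not installed.

```python
try:
    from numba import njit, prange
except ImportError:
    # Fallback: run as plain Python when Numba is unavailable.
    prange = range

    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]):
            return args[0]
        return lambda func: func


@njit(parallel=True)
def sum_of_squares(n):
    """Sum i*i for i in [0, n); the prange loop is a parallel reduction."""
    total = 0.0
    for i in prange(n):
        total += i * i
    return total


print(sum_of_squares(1000))

# The diagnostics report exists only on a real Numba dispatcher, and only
# after the function has been compiled by a first call.
if hasattr(sum_of_squares, "parallel_diagnostics"):
    sum_of_squares.parallel_diagnostics(level=4)
```

The diagnostics output lists which loops Numba managed to parallelize or fuse, which is one way to judge whether `parallel=True` is doing anything useful for a given function.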

YannCabanes avatar Aug 05 '22 04:08 YannCabanes

@YannCabanes Do you mind if I go through the changed files and review them, and share with you my thoughts? I think it would help me improve my "review" skill in my own work. (btw, please feel free to review mine. I can learn something!)

NimaSarajpoor avatar Aug 05 '22 05:08 NimaSarajpoor

Codecov Report

Base: 94.60% // Head: 94.59% // Decreases project coverage by -0.00% :warning:

Coverage data is based on head (37e54a8) compared to base (3596109). Patch coverage: 94.93% of modified lines in pull request are covered.

:exclamation: Current head 37e54a8 differs from pull request most recent head 1975b0f. Consider uploading reports for the commit 1975b0f to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #411      +/-   ##
==========================================
- Coverage   94.60%   94.59%   -0.01%     
==========================================
  Files          59       62       +3     
  Lines        4538     4739     +201     
==========================================
+ Hits         4293     4483     +190     
- Misses        245      256      +11     
| Impacted Files | Coverage Δ |
|---|---|
| tslearn/metrics/cycc.py | 81.53% <81.53%> (ø) |
| tslearn/metrics/cysax.py | 100.00% <100.00%> (ø) |
| tslearn/metrics/soft_dtw_fast.py | 100.00% <100.00%> (ø) |
| tslearn/metrics/softdtw_variants.py | 97.84% <100.00%> (ø) |
| tslearn/clustering/kshape.py | 98.29% <0.00%> (+0.85%) :arrow_up: |


codecov-commenter avatar Aug 05 '22 05:08 codecov-commenter

Hello @NimaSarajpoor, Yes, you can review my code. Any help is welcome. After discussion with @rtavenar, we will try to replace the cython files with python files that use numba. I am now trying to understand in which contexts the different options (nopython, parallel...) of numba's @jit decorator can be used. Then I will run some tests to compare the execution times of cython and numba. Maybe you can advise me on this part too.

YannCabanes avatar Aug 05 '22 15:08 YannCabanes

> Yes, you can review my codes. Any help is welcome.

Cool

> we will try to replace the cython file by python files in which we will use numba.

Yeah...if we can get the same performance, it would be great.

> Now I am trying to understand in which context the different options (nopython, parallel...) of the @jit decorator of numba can be used. Then, I will run some tests to compare the execution time of cython and numba. Maybe that you can advise me for this part too.

I have some experience with numba but not that much. If I notice something, I will let you know for sure.

NimaSarajpoor avatar Aug 05 '22 16:08 NimaSarajpoor

Hello @NimaSarajpoor, Thank you for your review! Well, I do not have all answers to your questions for now as I only translated the cython files:

  • soft_dtw_fast.pyx
  • cysax.pyx
  • cycc.pyx

into python files, and I am not the original author of these cython files. I will try to answer as many questions as I can.

YannCabanes avatar Aug 05 '22 18:08 YannCabanes

Hello @NimaSarajpoor, I have added docstrings to the files soft_dtw_fast.py, cysax.py and cycc.py. If you have any suggestions to solve the code errors, please do not hesitate to send me your ideas.

YannCabanes avatar Aug 18 '22 15:08 YannCabanes

@YannCabanes Great effort! Sure.

NimaSarajpoor avatar Aug 19 '22 02:08 NimaSarajpoor

At this point, all the tests are passing. I will now try to add the jit decorators (currently commented out). Then I will compare the execution speed of the current numba version with that of the previous cython version for the files cycc.py, cysax.py and soft_dtw_fast.py.

YannCabanes avatar Aug 22 '22 20:08 YannCabanes

Adding the jit decorator creates errors for Python 3.7.

YannCabanes avatar Aug 22 '22 22:08 YannCabanes

During the last continuous integration run, only the Linux Python37 job failed (AssertionError: Arrays are not equal).

YannCabanes avatar Sep 05 '22 20:09 YannCabanes

Here are the execution times of the functions previously coded in Cython. The input values of the functions are simulated using numpy.random.randn. The results are presented with the following hierarchical structure:

  1. Size of the input dataset (small then large)
  2. Name of the python file
  3. Name of the function tested in the python file
  4. Modes (Python, Numba py_func, Numba and Cython)
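For readers who want to reproduce this kind of comparison, here is a rough sketch of a timing harness that averages over repetitions. It is not the script used for the numbers in this comment: `softmin3` below is an illustrative re-implementation rather than tslearn's code, `average_time` is a hypothetical helper, and the warm-up call (which keeps JIT compilation out of the measurement) is my assumption about a reasonable protocol. Without Numba installed, the fallback decorator makes both modes run the same plain Python function.

```python
import math
import timeit

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without Numba: the decorator does nothing
    # and `py_func` simply points at the undecorated function.
    def njit(func):
        func.py_func = func
        return func


@njit
def softmin3(a, b, c, gamma):
    """Soft minimum of three values (illustrative, not tslearn's code)."""
    a = -a / gamma
    b = -b / gamma
    c = -c / gamma
    max_val = max(max(a, b), c)
    total = (math.exp(a - max_val) + math.exp(b - max_val)
             + math.exp(c - max_val))
    return -gamma * (math.log(total) + max_val)


def average_time(func, args, n_repetitions=100):
    """Average wall-clock time of one call, in seconds."""
    func(*args)  # warm-up call, so JIT compilation is not timed
    total = timeit.timeit(lambda: func(*args), number=n_repetitions)
    return total / n_repetitions


args = (1.0, 2.0, 3.0, 1.0)
# `py_func` is the original Python function behind a Numba dispatcher.
for label, func in [("Numba py_func", softmin3.py_func), ("Numba", softmin3)]:
    print(f"{label:>14}: {average_time(func, args):.3e} s")
```

Whether compilation time is included or excluded can change the picture considerably for small inputs, so it is worth stating which protocol a benchmark follows.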

Small time series

The execution time is divided by the number of repetitions to obtain the average time of a single execution.

N_REPETITIONS = 100
N_TS = 15 (number of time series)
SZ = 14 (size of the time series)
D = 13 (dimension of the time series)

Functions of file cycc.py

| Function | Python | Numba py_func | Numba | Cython |
|---|---|---|---|---|
| TEST_NORMALIZED_CC | 6.966590881347656e-05 | 6.471872329711914e-05 | 6.30354881286621e-05 | 7.087945938110352e-05 |
| TEST_CDIST_NORMALIZED_CC | 0.004678127765655518 | 0.006372392177581787 | 0.02055845022201538 | 0.004590597152709961 |
| TEST_Y_SHIFTED_SBD_VEC | 0.0007832765579223633 | 0.001002175807952881 | 0.0073595881462097164 | 0.000695946216583252 |

Functions of file cysax.py

| Function | Python | Numba py_func | Numba | Cython |
|---|---|---|---|---|
| TEST_INV_TRANSFORM_PAA | 0.0001658177375793457 | 0.00012455224990844726 | 2.3276805877685546e-05 | 0.00072235107421875 |
| TEST_CYDIST_SAX | 0.0006324005126953125 | 0.0006050252914428711 | 5.793571472167969e-06 | 0.0003102374076843262 |
| TEST_INV_TRANSFORM_SAX | 0.0012830519676208497 | 0.001281435489654541 | 3.258943557739258e-05 | 0.0007562804222106934 |
| TEST_CYSLOPES | 0.03299321174621582 | 0.03369884967803955 | 0.04607600450515747 | 0.03369905710220337 |
| TEST_CYDIST_1D_SAX | 0.0009629201889038086 | 0.0009295821189880371 | 5.824565887451172e-06 | 0.0003400826454162598 |
| TEST_INV_TRANSFORM_1D_SAX | 0.010548267364501953 | 0.009688065052032471 | 1.7206668853759766e-05 | 0.004239740371704101 |

Functions of file soft_dtw_fast.py

| Function | Python | Numba py_func | Numba | Cython |
|---|---|---|---|---|
| TEST_SOFTMIN3 | 3.7169456481933593e-06 | 3.7360191345214844e-06 | 5.7220458984375e-07 | - |
| TEST_SOFT_DTW | 0.0008032011985778808 | 0.00016197919845581054 | 1.3413429260253906e-05 | 6.75201416015625e-06 |
| TEST_SOFT_DTW_GRAD | 0.000658884048461914 | 0.0006336688995361329 | 4.945039749145508e-05 | 6.663799285888672e-06 |
| TEST_JACOBIAN_PRODUCT_SQ_EUC | 0.0015450143814086915 | 0.0015096139907836913 | 6.406307220458985e-06 | 4.334449768066406e-06 |

Large time series

The execution time is divided by the number of repetitions to obtain the average time of a single execution.

N_REPETITIONS = 10
N_TS = 150
SZ = 140
D = 130

Functions of file cycc.py

| Function | Python | Numba py_func | Numba | Cython |
|---|---|---|---|---|
| TEST_NORMALIZED_CC | 0.002593326568603516 | 0.0023230791091918947 | 0.001892685890197754 | 0.0039809226989746095 |
| TEST_CDIST_NORMALIZED_CC | 14.68750901222229 | 24.232205820083617 | 7.441932797431946 | 15.007141089439392 |
| TEST_Y_SHIFTED_SBD_VEC | 0.24766547679901124 | 0.38209333419799807 | 0.11369473934173584 | 0.24488503932952882 |

Functions of file cysax.py

| Function | Python | Numba py_func | Numba | Cython |
|---|---|---|---|---|
| TEST_INV_TRANSFORM_PAA | 0.07514638900756836 | 0.07673845291137696 | 0.043023204803466795 | 0.8301484823226929 |
| TEST_CYDIST_SAX | 0.07396893501281739 | 0.07371225357055664 | 3.3354759216308595e-05 | 0.029455232620239257 |
| TEST_INV_TRANSFORM_SAX | 1.3193160057067872 | 1.3029633522033692 | 0.04689924716949463 | 0.7892702579498291 |
| TEST_CYSLOPES | 3.478916120529175 | 3.4544804096221924 | 4.405482006072998 | 3.4761438608169555 |
| TEST_CYDIST_1D_SAX | 0.09309067726135253 | 0.0933596134185791 | 4.3773651123046874e-05 | 0.03114762306213379 |
| TEST_INV_TRANSFORM_1D_SAX | 9.720394968986511 | 9.639565801620483 | 0.024070429801940917 | 4.553128099441528 |

Functions of file soft_dtw_fast.py

| Function | Python | Numba py_func | Numba | Cython |
|---|---|---|---|---|
| TEST_SOFTMIN3 | 4.100799560546875e-06 | 3.838539123535157e-06 | 6.67572021484375e-07 | - |
| TEST_SOFT_DTW | 0.07861130237579346 | 0.016379952430725098 | 0.0007523536682128906 | 0.0007349014282226563 |
| TEST_SOFT_DTW_GRAD | 0.07179102897644044 | 0.06839404106140137 | 0.0008146047592163086 | 0.000745081901550293 |
| TEST_JACOBIAN_PRODUCT_SQ_EUC | 1.5018571853637694 | 1.496809482574463 | 0.0005124330520629883 | 0.0023012399673461915 |
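To compare Cython and Numba at a glance, the ratio of the two timings can be computed for a few of the large-series results. The sketch below copies four pairs of values from the results above; the selection of functions and the helper name `speedup` are my own choices for illustration.

```python
# (Cython time, Numba time) copied from the "Large time series" results above.
LARGE_RESULTS = {
    "TEST_CDIST_NORMALIZED_CC": (15.007141089439392, 7.441932797431946),
    "TEST_CYSLOPES": (3.4761438608169555, 4.405482006072998),
    "TEST_INV_TRANSFORM_1D_SAX": (4.553128099441528, 0.024070429801940917),
    "TEST_SOFT_DTW": (0.0007349014282226563, 0.0007523536682128906),
}


def speedup(cython_time, numba_time):
    """Ratio > 1 means the Numba version was faster than the Cython one."""
    return cython_time / numba_time


for name, (cython_time, numba_time) in LARGE_RESULTS.items():
    print(f"{name:<28} {speedup(cython_time, numba_time):8.2f}x")
```

On these four entries the picture is mixed: Numba is far ahead on TEST_INV_TRANSFORM_1D_SAX, roughly twice as fast on TEST_CDIST_NORMALIZED_CC, about level on TEST_SOFT_DTW, and slower on TEST_CYSLOPES.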

YannCabanes avatar Sep 16 '22 07:09 YannCabanes

I have removed the signatures (input and output types) in jit decorators in the last commit. It does not affect the execution time and will give more flexibility to the jit decorated functions with respect to the input types. I have written the expected input and output types in the documentation of each function decorated with @jit.

Information about signatures in jit decorators can be found at: https://numba.readthedocs.io/en/stable/reference/types.html#numba-types or at: https://numba.readthedocs.io/en/stable/reference/jit-compilation.html The latter states that the jit decorator has several modes of operation:

If one or more signatures are given in signature, a specialization is compiled for each of them. Calling the decorated function will then try to choose the best matching signature, and raise a [TypeError](https://docs.python.org/3/library/exceptions.html#TypeError) if no appropriate conversion is available for the function arguments. If converting succeeds, the compiled machine code is executed with the converted arguments and the return value is converted back according to the signature.

If no signature is given, the decorated function implements lazy compilation. Each call to the decorated function will try to re-use an existing specialization if it exists (for example, a call with two integer arguments may re-use a specialization for argument types (numba.int64, numba.int64)). If no suitable specialization exists, a new specialization is compiled on-the-fly, stored for later use, and executed with the converted arguments.
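The two modes quoted above can be illustrated with a toy function. This is a sketch under assumptions: `eager_add` and `lazy_add` are hypothetical names, and the fallback decorator only exists so that the snippet still runs (as plain Python, without any compilation) when Numba is not installed.

```python
try:
    from numba import njit
except ImportError:
    # Fallback: accept both `@njit` and `@njit("signature")` and do nothing.
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]):
            return args[0]
        return lambda func: func


@njit("float64(float64, float64)")
def eager_add(x, y):
    """Compiled once, at decoration time, for float64 arguments only."""
    return x + y


@njit
def lazy_add(x, y):
    """Compiled on first call, once per distinct combination of types."""
    return x + y


# With Numba, the integer arguments are converted to float64 here.
print(eager_add(1, 2))
# With Numba, each of these calls triggers its own specialization.
print(lazy_add(1, 2))
print(lazy_add(1.0, 2.0))

# `signatures` lists the compiled specializations on a Numba dispatcher.
print(getattr(lazy_add, "signatures", "Numba not installed"))
```

Dropping the signature trades a fixed, documented set of input types for flexibility: the function accepts any types Numba can compile, at the cost of a compilation on the first call with each new combination.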

YannCabanes avatar Sep 23 '22 21:09 YannCabanes

The last continuous integration test gave the following error message:

=================================== FAILURES ===================================
___________ test_all_estimators[LearningShapelets-LearningShapelets] ___________

name = 'LearningShapelets' Estimator = <class 'tslearn.shapelets.shapelets.LearningShapelets'>

@pytest.mark.parametrize('name, Estimator', get_estimators('all'))
def test_all_estimators(name, Estimator):
    """Test all the estimators in tslearn."""
    allow_nan = (hasattr(checks, 'ALLOW_NAN') and
                 Estimator().get_tags()["allow_nan"])
    if allow_nan:
        checks.ALLOW_NAN.append(name)
    if name in ["GlobalAlignmentKernelKMeans", "ShapeletModel",
                "SerializableShapeletModel"]:
        # Deprecated models
        return
    check_estimator(Estimator)

tslearn/tests/test_estimators.py:215:


tslearn/tests/test_estimators.py:197: in check_estimator
    check(estimator)
/opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/sklearn/utils/_testing.py:311: in wrapper
    return fn(*args, **kwargs)
tslearn/tests/sklearn_patches.py:558: in check_pipeline_consistency
    assert_allclose_dense_sparse(result, result_pipe)


x = array([[3.7043095e-03], [6.7453969e-01], [6.3824987e-01], [1.2295246e-03], [2.0980835e-05]...4e-03], [8.6247969e-01], [1.4195442e-03], [5.0067902e-06], [9.4977307e-01]], dtype=float32)
y = array([[0.40121353], [0.06187719], [0.05123574], [0.21641088], [0.2602595 ], [0.076... [0.25475943], [0.12683961], [0.27159142], [0.29283226], [0.16161257]], dtype=float32)
rtol = 1e-07, atol = 1e-09, err_msg = ''

def assert_allclose_dense_sparse(x, y, rtol=1e-07, atol=1e-9, err_msg=""):
    """Assert allclose for sparse and dense data.

    Both x and y need to be either sparse or dense, they
    can't be mixed.

    Parameters
    ----------
    x : {array-like, sparse matrix}
        First array to compare.

    y : {array-like, sparse matrix}
        Second array to compare.

    rtol : float, default=1e-07
        relative tolerance; see numpy.allclose.

    atol : float, default=1e-9
        absolute tolerance; see numpy.allclose. Note that the default here is
        more tolerant than the default for numpy.testing.assert_allclose, where
        atol=0.

    err_msg : str, default=''
        Error message to raise.
    """
    if sp.sparse.issparse(x) and sp.sparse.issparse(y):
        x = x.tocsr()
        y = y.tocsr()
        x.sum_duplicates()
        y.sum_duplicates()
        assert_array_equal(x.indices, y.indices, err_msg=err_msg)
        assert_array_equal(x.indptr, y.indptr, err_msg=err_msg)
        assert_allclose(x.data, y.data, rtol=rtol, atol=atol, err_msg=err_msg)
    elif not sp.sparse.issparse(x) and not sp.sparse.issparse(y):
        # both dense
        assert_allclose(x, y, rtol=rtol, atol=atol, err_msg=err_msg)

E   AssertionError:
E   Not equal to tolerance rtol=1e-07, atol=1e-09
E
E   Mismatched elements: 30 / 30 (100%)
E   Max absolute difference: 0.7881605
E   Max relative difference: 23.541649
E    x: array([[3.704309e-03],
E          [6.745397e-01],
E          [6.382499e-01],...
E    y: array([[0.401214],
E          [0.061877],
E          [0.051236],...

/opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/sklearn/utils/_testing.py:418: AssertionError
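To see how far outside the tolerance these values are, the element-wise criterion used by numpy.testing.assert_allclose, |actual - desired| <= atol + rtol * |desired|, can be applied by hand to the first pair of values from the traceback. The helper `is_close` is a hypothetical pure-Python stand-in written for this illustration.

```python
def is_close(actual, desired, rtol=1e-07, atol=1e-09):
    """Scalar version of the closeness criterion, for illustration."""
    return abs(actual - desired) <= atol + rtol * abs(desired)


# First elements of x and y as printed in the traceback above.
x0, y0 = 3.7043095e-03, 0.40121353

print(is_close(x0, y0))
print(abs(x0 - y0), "vs allowed", 1e-09 + 1e-07 * abs(y0))
```

The gap is many orders of magnitude larger than the allowed tolerance, which suggests that the two code paths produce genuinely different results rather than differing by floating-point rounding.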

YannCabanes avatar Sep 26 '22 21:09 YannCabanes

There is still the same error message.

YannCabanes avatar Sep 28 '22 15:09 YannCabanes

The tests are failing on Linux, but they pass on MacOS and Windows. The tests also pass on my local computer, where I am using Linux and Python 3.8.

YannCabanes avatar Sep 28 '22 20:09 YannCabanes

The same tests are still failing with the signatures in the jit decorators. I will remove the signatures once again for more flexibility.

YannCabanes avatar Sep 28 '22 20:09 YannCabanes