tslearn
[WIP] Fix numpy versions problems
Hello, there seems to be a problem related to NumPy versions in tslearn's main branch. For now, this branch has no differences from tslearn's main branch. I get no error message when I run the tests on my local computer using pytest.
Part of the error is related to the lines:
import numpy as np
cimport numpy as np
np.import_array()
in the file: https://github.com/tslearn-team/tslearn/blob/main/tslearn/metrics/soft_dtw_fast.pyx
Here is the error message:
__init__.pxd:942: in numpy.import_array
    ???
E   RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
This error message looks to be related to different versions of numpy being installed: https://github.com/freqtrade/freqtrade/issues/4281
The suggested solution for this error is to upgrade NumPy: pip install numpy --upgrade
At the beginning I was surprised by the import lines:
import numpy as np
cimport numpy as np
but it seems to be correct: https://stackoverflow.com/questions/20268228/cython-cimport-and-import-numpy-as-both-np http://docs.cython.org/en/latest/src/tutorial/numpy.html#adding-types
We should use NumPy version <= 1.22, as I have the following error message:
E   ImportError: Numba needs NumPy 1.22 or less
Now we have the following error message when running the tests on Linux with Python 3.7:
- python -m pip install numpy==1.22
ERROR: Ignored the following versions that require a different python version: 1.22.0 Requires-Python >=3.8
Python 3.7 requires NumPy version <= 1.21.6
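To summarize the constraints quoted above, here is a minimal sketch (the helper name `numpy_pin` and the exact pins are my own reading of the error messages, not a tslearn or pip API):

```python
import sys

def numpy_pin(py_version=sys.version_info[:2]):
    """Pick a NumPy pin compatible with the running interpreter.

    Based on the constraints seen in this thread: NumPy 1.22 requires
    Python >= 3.8, while Python 3.7 tops out at NumPy 1.21.6.
    """
    if py_version >= (3, 8):
        return "numpy==1.22"
    return "numpy<=1.21.6"

print(numpy_pin((3, 7)))  # → numpy<=1.21.6
print(numpy_pin((3, 9)))  # → numpy==1.22
```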
Now we have the following error message:
tslearn/metrics/cysax.pyx:1: in init tslearn.metrics.cysax
    STUFF_cysax = "cysax"
E   ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
which also seems to be related to numpy versions.
Now only one check is being run: docs/readthedocs.org:tslearn (successful). The other tests have not been performed.
Hello @rtavenar and @GillesVandewiele, the tests are not failing on my local computer, so I am trying to find the problem (probably related to NumPy versions) directly in tslearn's Continuous Integration. I am trying different things, but I have not been very successful so far. Do you have any ideas about how to solve this? Any suggestion is welcome.
Hi @YannCabanes
I have been doing a bit of unsuccessful digging into these issues myself. My 2 cents, which I can give straight away (I will look further into this later), is that these E ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject errors are often related to pycocotools, according to StackOverflow. But then again, I would not immediately know which of our dependencies actually uses pycocotools...
Now, I have the following error message:
E   ImportError: Numba needs NumPy 1.22 or less
numba 0.55.2 requires numpy<1.23,>=1.18, but you have numpy 1.23.0 which is incompatible.
I have previously tried python -m pip install numpy==1.22, but then I got the following error message for Python 3.7: Numpy 1.22 needs Python >= 3.8
I am not sure...this is based on what I read in a scipy PR: https://github.com/scipy/scipy/pull/14813
tslearn/metrics/cysax.pyx:1: in init tslearn.metrics.cysax
So, would you mind trying this?
# in tslearn/metrics/cysax.pyx
import numpy
cimport numpy
numpy.import_array() # PLEASE ADD THIS RIGHT AFTER `cimport numpy`
Can you also add the same thing for cycc.pyx?
btw, soft_dtw_fast.pyx is good already and has this line.
And, then see if this issue can be resolved or if we get a new error or not.
Thank you @NimaSarajpoor! Yes, I will try!
Hello @NimaSarajpoor, I have tried to add numpy.import_array() after cimport numpy as you suggested. However, it seems that we still have the same kind of error message. Any suggestion is welcome, so please tell me if there is anything else that you would like to try.
@rtavenar suggested that I try replacing Cython with a combination of NumPy and Numba, so I will try this method.
A link for assessing the usefulness of the parallel=True option of Numba's @njit decorator: https://numba.readthedocs.io/en/stable/user/parallel.html#numba-parallel-diagnostics
@YannCabanes Do you mind if I go through the changed files and review them, and share with you my thoughts? I think it would help me improve my "review" skill in my own work. (btw, please feel free to review mine. I can learn something!)
Codecov Report
Base: 94.60% // Head: 94.59% // Decreases project coverage by -0.00% :warning:
Coverage data is based on head (37e54a8) compared to base (3596109). Patch coverage: 94.93% of modified lines in pull request are covered.
:exclamation: Current head 37e54a8 differs from pull request most recent head 1975b0f. Consider uploading reports for the commit 1975b0f to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## main #411 +/- ##
==========================================
- Coverage 94.60% 94.59% -0.01%
==========================================
Files 59 62 +3
Lines 4538 4739 +201
==========================================
+ Hits 4293 4483 +190
- Misses 245 256 +11
| Impacted Files | Coverage Δ | |
|---|---|---|
| tslearn/metrics/cycc.py | 81.53% <81.53%> (ø) | |
| tslearn/metrics/cysax.py | 100.00% <100.00%> (ø) | |
| tslearn/metrics/soft_dtw_fast.py | 100.00% <100.00%> (ø) | |
| tslearn/metrics/softdtw_variants.py | 97.84% <100.00%> (ø) | |
| tslearn/clustering/kshape.py | 98.29% <0.00%> (+0.85%) | :arrow_up: |
:umbrella: View full report at Codecov.
Hello @NimaSarajpoor, yes, you can review my code. Any help is welcome. After discussing with @rtavenar, we will try to replace the Cython files with Python files in which we will use Numba. Now I am trying to understand in which contexts the different options (nopython, parallel...) of Numba's @jit decorator can be used. Then, I will run some tests to compare the execution times of Cython and Numba. Maybe you can advise me on this part too.
> Yes, you can review my codes. Any help is welcome.

Cool

> we will try to replace the cython file by python files in which we will use numba.

Yeah... if we can get the same performance, it would be great.

> Now I am trying to understand in which context the different options (nopython, parallel...) of the @jit decorator of numba can be used. Then, I will run some tests to compare the execution time of cython and numba. Maybe that you can advise me for this part too.

I have some experience with Numba, but not that much. If I notice something, I will let you know for sure.
Hello @NimaSarajpoor, thank you for your review! Well, I do not have all the answers to your questions for now, as I only translated the Cython files:
- soft_dtw_fast.pyx
- cysax.pyx
- cycc.pyx

into Python files, and I am not the original author of these Cython files. I will try to answer as many questions as I can.
Hello @NimaSarajpoor, I have added docstrings to the files soft_dtw_fast.py, cysax.py and cycc.py. If you have any suggestions to solve the code errors, please do not hesitate to send me your ideas.
@YannCabanes Great effort! Sure.
At this point, all the tests are passing. I will now try to add the jit decorators (currently commented out). Then I will compare the execution speeds of the current Numba version and the previous Cython version for the files cycc.py, cysax.py and soft_dtw_fast.py.
Adding the jit decorator creates errors for Python 3.7.
During last continuous integration tests, only Linux Python37 failed (AssertionError: Arrays are not equal).
Here are the execution times of the functions previously coded in Cython. The input values of the functions are simulated using numpy.random.randn. The results are presented with the following hierarchical structure:
- Size of the input dataset (small, then large)
- Name of the Python file
- Name of the function tested in the Python file
- Mode (Python, Numba py_func, Numba, and Cython)
Small time series

The execution time is divided by the number of repetitions to obtain the average time of a single execution.
- N_REPETITIONS = 100
- N_TS = 15 (number of time series)
- SZ = 14 (size of the time series)
- D = 13 (dimension of the time series)
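The measurement loop can be sketched roughly as follows (the function `some_metric` is a stand-in for illustration, not tslearn's actual implementation):

```python
import timeit
import numpy as np

N_REPETITIONS = 100
N_TS, SZ, D = 15, 14, 13

# Simulated input dataset, as described above.
dataset = np.random.randn(N_TS, SZ, D)

# Hypothetical stand-in for one of the benchmarked functions.
def some_metric(data):
    return float(np.abs(data).sum())

# Total time divided by the number of repetitions gives the
# average time of a single execution.
total = timeit.timeit(lambda: some_metric(dataset), number=N_REPETITIONS)
average = total / N_REPETITIONS
print(f"average execution time: {average:.3e} s")
```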
Functions of file cycc.py
TEST_NORMALIZED_CC

| Function type | Execution time (s) |
|---|---|
| Python | 6.966590881347656e-05 |
| Numba py_func | 6.471872329711914e-05 |
| Numba | 6.30354881286621e-05 |
| Cython | 7.087945938110352e-05 |

TEST_CDIST_NORMALIZED_CC

| Function type | Execution time (s) |
|---|---|
| Python | 0.004678127765655518 |
| Numba py_func | 0.006372392177581787 |
| Numba | 0.02055845022201538 |
| Cython | 0.004590597152709961 |

TEST_Y_SHIFTED_SBD_VEC

| Function type | Execution time (s) |
|---|---|
| Python | 0.0007832765579223633 |
| Numba py_func | 0.001002175807952881 |
| Numba | 0.0073595881462097164 |
| Cython | 0.000695946216583252 |
Functions of file cysax.py
TEST_INV_TRANSFORM_PAA

| Function type | Execution time (s) |
|---|---|
| Python | 0.0001658177375793457 |
| Numba py_func | 0.00012455224990844726 |
| Numba | 2.3276805877685546e-05 |
| Cython | 0.00072235107421875 |

TEST_CYDIST_SAX

| Function type | Execution time (s) |
|---|---|
| Python | 0.0006324005126953125 |
| Numba py_func | 0.0006050252914428711 |
| Numba | 5.793571472167969e-06 |
| Cython | 0.0003102374076843262 |

TEST_INV_TRANSFORM_SAX

| Function type | Execution time (s) |
|---|---|
| Python | 0.0012830519676208497 |
| Numba py_func | 0.001281435489654541 |
| Numba | 3.258943557739258e-05 |
| Cython | 0.0007562804222106934 |

TEST_CYSLOPES

| Function type | Execution time (s) |
|---|---|
| Python | 0.03299321174621582 |
| Numba py_func | 0.03369884967803955 |
| Numba | 0.04607600450515747 |
| Cython | 0.03369905710220337 |

TEST_CYDIST_1D_SAX

| Function type | Execution time (s) |
|---|---|
| Python | 0.0009629201889038086 |
| Numba py_func | 0.0009295821189880371 |
| Numba | 5.824565887451172e-06 |
| Cython | 0.0003400826454162598 |

TEST_INV_TRANSFORM_1D_SAX

| Function type | Execution time (s) |
|---|---|
| Python | 0.010548267364501953 |
| Numba py_func | 0.009688065052032471 |
| Numba | 1.7206668853759766e-05 |
| Cython | 0.004239740371704101 |
Functions of file soft_dtw_fast.py
TEST_SOFTMIN3

| Function type | Execution time (s) |
|---|---|
| Python | 3.7169456481933593e-06 |
| Numba py_func | 3.7360191345214844e-06 |
| Numba | 5.7220458984375e-07 |

TEST_SOFT_DTW

| Function type | Execution time (s) |
|---|---|
| Python | 0.0008032011985778808 |
| Numba py_func | 0.00016197919845581054 |
| Numba | 1.3413429260253906e-05 |
| Cython | 6.75201416015625e-06 |

TEST_SOFT_DTW_GRAD

| Function type | Execution time (s) |
|---|---|
| Python | 0.000658884048461914 |
| Numba py_func | 0.0006336688995361329 |
| Numba | 4.945039749145508e-05 |
| Cython | 6.663799285888672e-06 |

TEST_JACOBIAN_PRODUCT_SQ_EUC

| Function type | Execution time (s) |
|---|---|
| Python | 0.0015450143814086915 |
| Numba py_func | 0.0015096139907836913 |
| Numba | 6.406307220458985e-06 |
| Cython | 4.334449768066406e-06 |
Large time series

The execution time is divided by the number of repetitions to obtain the average time of a single execution.
- N_REPETITIONS = 10
- N_TS = 150
- SZ = 140
- D = 130
Functions of file cycc.py
TEST_NORMALIZED_CC

| Function type | Execution time (s) |
|---|---|
| Python | 0.002593326568603516 |
| Numba py_func | 0.0023230791091918947 |
| Numba | 0.001892685890197754 |
| Cython | 0.0039809226989746095 |

TEST_CDIST_NORMALIZED_CC

| Function type | Execution time (s) |
|---|---|
| Python | 14.68750901222229 |
| Numba py_func | 24.232205820083617 |
| Numba | 7.441932797431946 |
| Cython | 15.007141089439392 |

TEST_Y_SHIFTED_SBD_VEC

| Function type | Execution time (s) |
|---|---|
| Python | 0.24766547679901124 |
| Numba py_func | 0.38209333419799807 |
| Numba | 0.11369473934173584 |
| Cython | 0.24488503932952882 |
Functions of file cysax.py
TEST_INV_TRANSFORM_PAA

| Function type | Execution time (s) |
|---|---|
| Python | 0.07514638900756836 |
| Numba py_func | 0.07673845291137696 |
| Numba | 0.043023204803466795 |
| Cython | 0.8301484823226929 |

TEST_CYDIST_SAX

| Function type | Execution time (s) |
|---|---|
| Python | 0.07396893501281739 |
| Numba py_func | 0.07371225357055664 |
| Numba | 3.3354759216308595e-05 |
| Cython | 0.029455232620239257 |

TEST_INV_TRANSFORM_SAX

| Function type | Execution time (s) |
|---|---|
| Python | 1.3193160057067872 |
| Numba py_func | 1.3029633522033692 |
| Numba | 0.04689924716949463 |
| Cython | 0.7892702579498291 |

TEST_CYSLOPES

| Function type | Execution time (s) |
|---|---|
| Python | 3.478916120529175 |
| Numba py_func | 3.4544804096221924 |
| Numba | 4.405482006072998 |
| Cython | 3.4761438608169555 |

TEST_CYDIST_1D_SAX

| Function type | Execution time (s) |
|---|---|
| Python | 0.09309067726135253 |
| Numba py_func | 0.0933596134185791 |
| Numba | 4.3773651123046874e-05 |
| Cython | 0.03114762306213379 |

TEST_INV_TRANSFORM_1D_SAX

| Function type | Execution time (s) |
|---|---|
| Python | 9.720394968986511 |
| Numba py_func | 9.639565801620483 |
| Numba | 0.024070429801940917 |
| Cython | 4.553128099441528 |
Functions of file soft_dtw_fast.py
TEST_SOFTMIN3

| Function type | Execution time (s) |
|---|---|
| Python | 4.100799560546875e-06 |
| Numba py_func | 3.838539123535157e-06 |
| Numba | 6.67572021484375e-07 |

TEST_SOFT_DTW

| Function type | Execution time (s) |
|---|---|
| Python | 0.07861130237579346 |
| Numba py_func | 0.016379952430725098 |
| Numba | 0.0007523536682128906 |
| Cython | 0.0007349014282226563 |

TEST_SOFT_DTW_GRAD

| Function type | Execution time (s) |
|---|---|
| Python | 0.07179102897644044 |
| Numba py_func | 0.06839404106140137 |
| Numba | 0.0008146047592163086 |
| Cython | 0.000745081901550293 |

TEST_JACOBIAN_PRODUCT_SQ_EUC

| Function type | Execution time (s) |
|---|---|
| Python | 1.5018571853637694 |
| Numba py_func | 1.496809482574463 |
| Numba | 0.0005124330520629883 |
| Cython | 0.0023012399673461915 |
I have removed the signatures (input and output types) from the jit decorators in the last commit. This does not affect the execution time, and it gives the jit-decorated functions more flexibility with respect to input types. I have written the expected input and output types in the documentation of each function decorated with @jit.
Information about signatures in jit decorators can be found at https://numba.readthedocs.io/en/stable/reference/types.html#numba-types and https://numba.readthedocs.io/en/stable/reference/jit-compilation.html. The latter says that the jit decorator has several modes of operation:
If one or more signatures are given in signature, a specialization is compiled for each of them. Calling the decorated function will then try to choose the best matching signature, and raise a [TypeError](https://docs.python.org/3/library/exceptions.html#TypeError) if no appropriate conversion is available for the function arguments. If converting succeeds, the compiled machine code is executed with the converted arguments and the return value is converted back according to the signature.
If no signature is given, the decorated function implements lazy compilation. Each call to the decorated function will try to re-use an existing specialization if it exists (for example, a call with two integer arguments may re-use a specialization for argument types (numba.int64, numba.int64)). If no suitable specialization exists, a new specialization is compiled on-the-fly, stored for later use, and executed with the converted arguments.
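The lazy-compilation behavior described above can be mimicked in plain Python. This is only an analogy: a cache keyed by argument types, where Numba would store compiled machine code instead of the original function:

```python
# Plain-Python analogy of numba's lazy compilation: one "specialization"
# is kept per tuple of argument types.
def lazy_specialize(func):
    specializations = {}

    def wrapper(*args):
        key = tuple(type(a) for a in args)
        if key not in specializations:
            # numba would compile machine code for these types here;
            # this sketch just stores the plain function.
            specializations[key] = func
        return specializations[key](*args)

    wrapper.specializations = specializations
    return wrapper

@lazy_specialize
def add(a, b):
    return a + b

add(1, 2)      # creates the (int, int) specialization
add(1.0, 2.0)  # re-used on later float calls, compiled once here
print(len(add.specializations))  # → 2
```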
The last continuous integration test gave the following error message:
=================================== FAILURES ===================================
___________ test_all_estimators[LearningShapelets-LearningShapelets] ___________
name = 'LearningShapelets' Estimator = <class 'tslearn.shapelets.shapelets.LearningShapelets'>
@pytest.mark.parametrize('name, Estimator', get_estimators('all'))
def test_all_estimators(name, Estimator):
"""Test all the estimators in tslearn."""
allow_nan = (hasattr(checks, 'ALLOW_NAN') and
Estimator().get_tags()["allow_nan"])
if allow_nan:
checks.ALLOW_NAN.append(name)
if name in ["GlobalAlignmentKernelKMeans", "ShapeletModel",
"SerializableShapeletModel"]:
# Deprecated models
return
check_estimator(Estimator)
tslearn/tests/test_estimators.py:215:
tslearn/tests/test_estimators.py:197: in check_estimator
    check(estimator)
/opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/sklearn/utils/_testing.py:311: in wrapper
    return fn(*args, **kwargs)
tslearn/tests/sklearn_patches.py:558: in check_pipeline_consistency
    assert_allclose_dense_sparse(result, result_pipe)
x = array([[3.7043095e-03], [6.7453969e-01], [6.3824987e-01], [1.2295246e-03], [2.0980835e-05]...4e-03], [8.6247969e-01], [1.4195442e-03], [5.0067902e-06], [9.4977307e-01]], dtype=float32)
y = array([[0.40121353], [0.06187719], [0.05123574], [0.21641088], [0.2602595 ], [0.076... [0.25475943], [0.12683961], [0.27159142], [0.29283226], [0.16161257]], dtype=float32)
rtol = 1e-07, atol = 1e-09, err_msg = ''
def assert_allclose_dense_sparse(x, y, rtol=1e-07, atol=1e-9, err_msg=""):
"""Assert allclose for sparse and dense data.
Both x and y need to be either sparse or dense, they
can't be mixed.
Parameters
----------
x : {array-like, sparse matrix}
First array to compare.
y : {array-like, sparse matrix}
Second array to compare.
rtol : float, default=1e-07
relative tolerance; see numpy.allclose.
atol : float, default=1e-9
absolute tolerance; see numpy.allclose. Note that the default here is
more tolerant than the default for numpy.testing.assert_allclose, where
atol=0.
err_msg : str, default=''
Error message to raise.
"""
if sp.sparse.issparse(x) and sp.sparse.issparse(y):
x = x.tocsr()
y = y.tocsr()
x.sum_duplicates()
y.sum_duplicates()
assert_array_equal(x.indices, y.indices, err_msg=err_msg)
assert_array_equal(x.indptr, y.indptr, err_msg=err_msg)
assert_allclose(x.data, y.data, rtol=rtol, atol=atol, err_msg=err_msg)
elif not sp.sparse.issparse(x) and not sp.sparse.issparse(y):
# both dense
assert_allclose(x, y, rtol=rtol, atol=atol, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=1e-09
E
E Mismatched elements: 30 / 30 (100%)
E Max absolute difference: 0.7881605
E Max relative difference: 23.541649
E x: array([[3.704309e-03],
E [6.745397e-01],
E [6.382499e-01],...
E y: array([[0.401214],
E [0.061877],
E [0.051236],...
/opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/sklearn/utils/_testing.py:418: AssertionError
There is still the same error message.
The tests are failing on Linux; they pass on macOS and Windows. The tests pass on my local computer, where I am using Linux and Python 3.8.
The same tests are still failing with the signatures in the jit decorators. I will remove the signatures once again for more flexibility.