tslearn
tslearn copied to clipboard
Fix deprecation warning for np array
/databricks/python/lib/python3.8/site-packages/tslearn/barycenters/softdtw.py:103: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray X_ = numpy.array([to_time_series(d, remove_nans=True) for d in X_])
Because this method often uses numpy arrays with different shapes, I think the approach is justified here.
Codecov Report
Merging #360 (500a2fb) into main (42a56cc) will increase coverage by
0.02%. The diff coverage is100.00%.
@@ Coverage Diff @@
## main #360 +/- ##
==========================================
+ Coverage 94.71% 94.74% +0.02%
==========================================
Files 59 59
Lines 4525 4525
==========================================
+ Hits 4286 4287 +1
+ Misses 239 238 -1
| Impacted Files | Coverage Δ | |
|---|---|---|
| tslearn/barycenters/softdtw.py | 100.00% <100.00%> (ø) |
|
| tslearn/clustering/kshape.py | 98.29% <0.00%> (+0.85%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 42a56cc...500a2fb. Read the comment docs.
Hi @brk21
Thanks for pointing out this deprecation warning! It seems to me that it is not even necessary to cast the list into a numpy array, or is it?
Couldn't we do something like:
X_ = [to_time_series(Xi, remove_nans=True) for Xi in X_]
?
Hi @rtavenar,
I think you are right, but I also think we will generate the same warning in the to_time_series function here:
https://github.com/tslearn-team/tslearn/blob/main/tslearn/utils/utils.py#L146
So you may end up adding the object=true parameter there to avoid the deprecation.
Please let me know if I can support in any way.
Ross
I would like to avoid the object=True since our algorithms in tslearn are supposed to operate on datasets represented as 3d arrays, which is no longer the case when arrays of objects are used.
Regarding the to_time_series function, it is definitely unexpected behaviour to provide a series that does not have the same number of features for all timestamps, so maybe we should throw a ValueError in this case, what do you think?
@rtavenar I would be strongly opposed to that because one of the main values of softdtw is that it can operate on time series of different lengths! This ValueError would effectively eliminate our ability to make use of softdtw from tslearn.
If you wanted to port to_time_series into a separate version for time series of varying lengths and just use object=true in the contexts or algorithms that are designed to operate on time series of different lengths, then that might be a good middle ground that preserves integrity for the to_time_series function but makes allowance for the algorithms that do operate on time series of different lengths.
@rtavenar I would be strongly opposed to that because one of the main values of
softdtwis that it can operate on time series of different lengths! ThisValueErrorwould effectively eliminate our ability to make use ofsoftdtwfromtslearn.
I meant raising a ValueError for the case of the to_time_series function only, since this function only deals with one time series at a time, not a dataset of time series (for which to_time_series_dataset exists). Your workaround or the one I suggested earlier are fine for softDTW.
Phew @rtavenar Apologies for the misunderstanding.
Yes, I would be fine with that.