Usage of to_hdf5 gives error

Open tdg2088 opened this issue 4 years ago • 4 comments

Describe the bug

When I use the code below, I get the error "TypeError: must be str, not int" at hdftools.py line 88, in _dicts_to_group:

h5file[path + key] = item

To Reproduce

# Imports were not part of the original report; these are the modules the snippet presumably relies on.
import numpy as np
import tensorflow as tf
from tslearn.shapelets import LearningShapelets, grabocka_params_to_shapelet_size_dict
from tslearn.utils import to_time_series_dataset

# result_aray and total_y are the reporter's own data (not shown).
x_dataset = to_time_series_dataset(result_aray)
y_dataset = np.array(total_y)
n_ts, ts_sz = x_dataset.shape[:2]
n_classes = len(set(y_dataset))
shapelet_sizes = grabocka_params_to_shapelet_size_dict(n_ts=n_ts, ts_sz=ts_sz, n_classes=n_classes, l=0.1, r=1)
shp_clf = LearningShapelets(n_shapelets_per_size=shapelet_sizes, optimizer=tf.optimizers.Adam(.01), batch_size=16, weight_regularizer=.01, max_iter=200, random_state=42, verbose=0)
shp_clf.fit(x_dataset, y_dataset)
shp_clf.to_hdf5("timeservice.hdf5")

Expected behavior

Does anyone have an idea about this problem?

Environment (please complete the following information):

  • OS: Windows 10
  • tslearn version: 0.5.0.5

Additional context

When I try to run the example plot_shapelets.py and then call the to_hdf5 method, I get the same error.

tdg2088, Feb 08 '21 16:02

I can also reproduce a bug by appending shp_clf.to_hdf5("/some/path/test_to.hdf5") to tslearn/docs/examples/classification/plot_shapelets.py. However, I get this exception on Python 3.8.5:

Traceback (most recent call last):
  File "[...] test.py", line 85, in <module>
    shp_clf.to_hdf5("test_to.hdf5")
  File "[...] lib/python3.8/site-packages/tslearn/bases/bases.py", line 183, in to_hdf5
    hdftools.save_dict(d, path, 'data')
  File "[...] lib/python3.8/site-packages/tslearn/hdftools/hdftools.py", line 45, in save_dict
    _dicts_to_group(h5file, "{}/".format(group), d,
  File "[...] lib/python3.8/site-packages/tslearn/hdftools/hdftools.py", line 91, in _dicts_to_group
    _dicts_to_group(
  File "[...] lib/python3.8/site-packages/tslearn/hdftools/hdftools.py", line 91, in _dicts_to_group
    _dicts_to_group(
  File "[...] lib/python3.8/site-packages/tslearn/hdftools/hdftools.py", line 88, in _dicts_to_group
    h5file[path + key] = item
TypeError: can only concatenate str (not "int") to str
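
The failing line in the traceback builds a path by concatenating a string with a dictionary key, so any non-string key would trigger this error. Below is a minimal sketch of that failure in plain Python, without tslearn or h5py. The function `walk` and the sample dictionaries are hypothetical stand-ins, not the library's code. The shorter message in the original report ("must be str, not int") looks like the wording Python versions before 3.7 use for the same failure, though that is an assumption about the reporter's setup.

```python
# Hypothetical stand-in for a recursive writer that builds paths by
# string concatenation; this is NOT tslearn's implementation.
def walk(path, d, out):
    for key, item in d.items():
        if isinstance(item, dict):
            walk(path + key + "/", item, out)
        else:
            out[path + key] = item  # raises TypeError if key is not a str
    return out

# String keys work as expected.
ok = walk("data/", {"params": {"max_iter": 200}}, {})

# An integer key (hypothetical example data) fails at the concatenation.
try:
    walk("data/", {"model_params": {3: [0.1, 0.2]}}, {})
    error = None
except TypeError as exc:
    error = exc

print(ok, "|", type(error).__name__)
```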

felixdivo, Feb 27 '21 14:02

Apparently, tslearn.hdftools has problems with non-string keys, as well as with lists and enums as values. The particular error above can be fixed by converting non-string keys to strings, as below (and probably with similar heuristics for the other data types). But the real issue is how to convert everything back in hdftools.load_dict.

Overall, it feels like unnecessarily re-implementing the pickle module (except for the cross-platform-ness of HDF5). However, using pickle in this instance does not work either, since shp_clf._to_dict(output='hdf5') contains some lambdas. It would probably be reasonably fast with the pickle protocol version 5.
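
The pickle limitation can be seen in isolation with the standard library alone. This is a minimal sketch: both dictionaries are hypothetical stand-ins for what shp_clf._to_dict(output='hdf5') might contain, not its real contents.

```python
import pickle

# Hypothetical model state: plain values pickle fine, a lambda does not.
plain_state = {"n_shapelets": {3: 4}, "weights": [0.1, 0.2]}
state_with_lambda = {"loss": lambda y_true, y_pred: y_true - y_pred}

# Protocol 5 needs Python 3.8+; fall back to the highest available one.
protocol = min(5, pickle.HIGHEST_PROTOCOL)

# Integer keys survive a pickle round trip unchanged.
restored = pickle.loads(pickle.dumps(plain_state, protocol=protocol))

try:
    pickle.dumps(state_with_lambda, protocol=protocol)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    # The exact exception type depends on the Python version and on
    # where the lambda was defined.
    lambda_pickled = False

print(restored == plain_state, lambda_pickled)
```

Unlike the string-path approach, pickle keeps non-string keys intact here. Only the lambda blocks it.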

# Appended to tslearn/docs/examples/classification/plot_shapelets.py,
# where shp_clf is the fitted model from that example.

from tslearn import hdftools

path = "test_to.hdf5"
d = shp_clf._to_dict(output='hdf5')

def convert_non_string_keys_recursively(d):
    """Return a copy of d in which every dict key, at any depth, is a str."""
    if isinstance(d, dict):
        return {str(k): convert_non_string_keys_recursively(v) for k, v in d.items()}
    return d

# Workaround: save the converted dict directly instead of calling shp_clf.to_hdf5(path).
hdftools.save_dict(convert_non_string_keys_recursively(d), path, 'data')
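
The workaround above gets the file written, but the key types are lost on the way, which is why loading is the harder half. Here is a small sketch of that loss, with a plain dictionary standing in for the model state. The sample data is hypothetical and no HDF5 file is involved.

```python
def convert_non_string_keys_recursively(d):
    if isinstance(d, dict):
        return {str(k): convert_non_string_keys_recursively(v) for k, v in d.items()}
    return d

# Hypothetical state with integer keys.
original = {"shapelet_sizes": {3: 4, 6: 4}}
converted = convert_non_string_keys_recursively(original)

# After conversion the integer keys are gone; only their string forms remain.
lookup_by_int = 3 in converted["shapelet_sizes"]
lookup_by_str = "3" in converted["shapelet_sizes"]
key_types = sorted({type(k).__name__ for k in converted["shapelet_sizes"]})

# Keys that differ only in type collapse into one entry.
collided = convert_non_string_keys_recursively({1: "a", "1": "b"})

print(lookup_by_int, lookup_by_str, key_types, len(collided))
```

Nothing in the converted dictionary records that "3" used to be an integer, so a loader cannot restore it without extra information.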

felixdivo, Feb 27 '21 15:02

Hi @felixdivo

Shouldn't this be implemented in hdftools directly? Could you try to do it?

Could we have an ugly fix where we attach the type at the beginning or end of the string, like this: "list__[1,2,3]"?
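
One way to read the type-prefix idea, as a minimal sketch: the separator, the set of supported types, and the function names encode and decode are all assumptions made for illustration, and none of this is part of hdftools.

```python
import json

SEP = "__"  # hypothetical separator between the type tag and the payload

def encode(value):
    """Encode an int, str or list as a type-tagged string."""
    if isinstance(value, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, int):
        return "int" + SEP + str(value)
    if isinstance(value, str):
        return "str" + SEP + value
    if isinstance(value, list):
        return "list" + SEP + json.dumps(value, separators=(",", ":"))
    raise TypeError("unsupported type: " + type(value).__name__)

def decode(text):
    """Invert encode(): split off the tag, then parse the payload."""
    tag, payload = text.split(SEP, 1)
    if tag == "int":
        return int(payload)
    if tag == "str":
        return payload
    if tag == "list":
        return json.loads(payload)
    raise ValueError("unknown type tag: " + tag)

encoded_list = encode([1, 2, 3])
print(encoded_list, decode(encoded_list), decode(encode(3)))
```

Splitting only on the first separator lets string payloads contain the separator themselves. Values with no textual form, such as lambdas, are still not covered by a scheme like this.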

rtavenar, Aug 17 '21 07:08

Hey @rtavenar, I'm afraid that the lambdas will never be serializable and deserializable. Why not just use pickle? It's an internal format anyway (in the sense that different versions of tslearn, as well as other software, are or can be incompatible with it).

felixdivo, Oct 09 '21 17:10