tslearn
Usage of to_hdf5 gives error
Describe the bug
When I use the code below, I get the error `TypeError: must be str, not int` at hdftools.py, line 88, in `_dicts_to_group`:

```
h5file[path + key] = item
```

To Reproduce

```python
import numpy as np
import tensorflow as tf
from tslearn.shapelets import LearningShapelets, grabocka_params_to_shapelet_size_dict
from tslearn.utils import to_time_series_dataset

# result_aray and total_y hold my own time series and labels
x_dataset = to_time_series_dataset(result_aray)
y_dataset = np.array(total_y)
n_ts, ts_sz = x_dataset.shape[:2]
n_classes = len(set(y_dataset))
shapelet_sizes = grabocka_params_to_shapelet_size_dict(n_ts=n_ts, ts_sz=ts_sz, n_classes=n_classes, l=0.1, r=1)
shp_clf = LearningShapelets(n_shapelets_per_size=shapelet_sizes, optimizer=tf.optimizers.Adam(.01), batch_size=16,
                            weight_regularizer=.01, max_iter=200, random_state=42, verbose=0)
shp_clf.fit(x_dataset, y_dataset)
shp_clf.to_hdf5("timeservice.hdf5")  # raises the TypeError
```

Expected behavior
Does anyone have an idea about this problem?

Environment:
- OS: Windows 10
- tslearn version: 0.5.0.5

Additional context
When I run the example plot_shapelets.py and then call the to_hdf5 method, I get the same error.
I can reproduce the bug as well by appending `shp_clf.to_hdf5("/some/path/test_to.hdf5")` to `tslearn/docs/examples/classification/plot_shapelets.py`. However, on Python 3.8.5 the exception reads slightly differently:
```
Traceback (most recent call last):
  File "[...] test.py", line 85, in <module>
    shp_clf.to_hdf5("test_to.hdf5")
  File "[...] lib/python3.8/site-packages/tslearn/bases/bases.py", line 183, in to_hdf5
    hdftools.save_dict(d, path, 'data')
  File "[...] lib/python3.8/site-packages/tslearn/hdftools/hdftools.py", line 45, in save_dict
    _dicts_to_group(h5file, "{}/".format(group), d,
  File "[...] lib/python3.8/site-packages/tslearn/hdftools/hdftools.py", line 91, in _dicts_to_group
    _dicts_to_group(
  File "[...] lib/python3.8/site-packages/tslearn/hdftools/hdftools.py", line 91, in _dicts_to_group
    _dicts_to_group(
  File "[...] lib/python3.8/site-packages/tslearn/hdftools/hdftools.py", line 88, in _dicts_to_group
    h5file[path + key] = item
TypeError: can only concatenate str (not "int") to str
```
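The failing statement is `h5file[path + key] = item`: the HDF5 path is built by plain string concatenation, so any non-string dictionary key breaks it. A minimal pure-Python sketch (no h5py) that reproduces the same TypeError; the `params` dict and the `build_paths` helper are hypothetical stand-ins for the nested model dict and for the path building in `_dicts_to_group`, not tslearn code:

```python
# Hypothetical stand-in for a nested model dict with int keys,
# e.g. a mapping from shapelet length to number of shapelets.
params = {"n_shapelets_per_size": {10: 3, 20: 3}}


def build_paths(d, path="data/"):
    """Mimic the 'path + key' concatenation done in _dicts_to_group."""
    paths = []
    for key, item in d.items():
        if isinstance(item, dict):
            paths.extend(build_paths(item, path + key + "/"))
        else:
            paths.append(path + key)  # fails as soon as key is not a str
    return paths


try:
    build_paths(params)
except TypeError as exc:
    print(exc)  # on Python 3.8: can only concatenate str (not "int") to str
```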
Apparently, `tslearn.hdftools` has trouble with non-string keys, as well as with lists and enums as values. This particular error can be fixed by converting non-string keys to strings, as in the snippet below (the other data types could probably be handled with similar heuristics). The real issue, however, is how to convert them back in `hdftools.load_dict`.
Overall, this feels like an unnecessary re-implementation of the `pickle` module (apart from HDF5 being cross-platform). However, pickle does not work here either, since `shp_clf._to_dict(output='hdf5')` contains some lambdas. Otherwise, pickle would probably be reasonably fast, especially with pickle protocol version 5.
```python
# appended this to tslearn/docs/examples/classification/plot_shapelets.py
from tslearn import hdftools

path = "test_to.hdf5"
d = shp_clf._to_dict(output='hdf5')


def convert_non_string_keys_recursively(d):
    """Recursively convert every dict key to str; non-dict values are returned unchanged."""
    if isinstance(d, dict):
        return {str(k): convert_non_string_keys_recursively(v) for k, v in d.items()}
    return d


# Avoids the key TypeError above, but the original key types are lost
hdftools.save_dict(convert_non_string_keys_recursively(d), path, 'data')
```
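Converting keys with `str()` makes saving possible, but the conversion is lossy: on load, `"10"` could have been either the int `10` or the string `"10"`. A small pure-Python sketch of that ambiguity; `stringify_keys` mirrors `convert_non_string_keys_recursively`, and `guess_int_keys` is a hypothetical heuristic inverse, not part of tslearn:

```python
def stringify_keys(d):
    """Same idea as convert_non_string_keys_recursively above."""
    if isinstance(d, dict):
        return {str(k): stringify_keys(v) for k, v in d.items()}
    return d


def guess_int_keys(d):
    """Hypothetical inverse: turn digit-only string keys back into ints."""
    if isinstance(d, dict):
        return {(int(k) if k.isdigit() else k): guess_int_keys(v) for k, v in d.items()}
    return d


int_keys = {10: 3, 20: 3}
str_keys = {"10": 3, "20": 3}

# Both dicts end up identical once their keys are stringified ...
assert stringify_keys(int_keys) == stringify_keys(str_keys)
# ... so any inverse heuristic restores one of them and corrupts the other.
print(guess_int_keys(stringify_keys(int_keys)) == int_keys)  # True
print(guess_int_keys(stringify_keys(str_keys)) == str_keys)  # False
```

Any fix in `load_dict` therefore needs the type information to be stored alongside the key, not guessed afterwards.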
Hi @felixdivo,
Shouldn't this be implemented in hdftools directly? Could you try to do it?
Could we use an ugly fix where we prepend or append the type to the string, like this: `"list__[1,2,3]"`?
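A minimal sketch of the type-prefix idea, assuming keys and values are restricted to `int`, `float`, `str` and lists of JSON-serializable items; the `TYPE__payload` format and the helpers `encode`, `decode`, `encode_keys` and `decode_keys` are hypothetical, not part of hdftools. Using `json` for list payloads avoids calling `eval` when decoding:

```python
import json

# Hypothetical tag format: "<type name>__<payload>", e.g. "list__[1, 2, 3]".
_DECODERS = {
    "int": int,
    "float": float,
    "str": str,
    "list": json.loads,
}


def encode(value):
    """Prefix the value's string form with its type name."""
    type_name = type(value).__name__
    if type_name not in _DECODERS:  # e.g. enums, bools, lambdas are rejected
        raise TypeError(f"unsupported type: {type_name}")
    payload = json.dumps(value) if isinstance(value, list) else str(value)
    return f"{type_name}__{payload}"


def decode(text):
    """Split off the tag at the first '__' and rebuild the original value."""
    type_name, _, payload = text.partition("__")
    return _DECODERS[type_name](payload)


def encode_keys(d):
    """Recursively tag every dict key so that its type survives a round trip."""
    if isinstance(d, dict):
        return {encode(k): encode_keys(v) for k, v in d.items()}
    return d


def decode_keys(d):
    """Inverse of encode_keys."""
    if isinstance(d, dict):
        return {decode(k): decode_keys(v) for k, v in d.items()}
    return d


print(encode([1, 2, 3]))  # list__[1, 2, 3]
original = {10: 3, "10": 4}
print(decode_keys(encode_keys(original)) == original)  # True
```

Unlike plain `str()` conversion, the tag keeps `10` and `"10"` apart; it still cannot represent lambdas.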
Hey @rtavenar, I'm afraid the lambdas will never be serializable and deserializable. Why not just use pickle? It's an internal format anyway, in the sense that different versions of tslearn, as well as other software, are or can be incompatible with it.
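For context on the pickle option: the standard pickler stores functions by reference (module plus qualified name), so a lambda cannot be pickled. A minimal sketch, assuming the lambdas could be dropped before pickling and rebuilt by the estimator on load; `model_dict` and `drop_callables` are hypothetical illustrations, not tslearn's actual `_to_dict` output:

```python
import pickle

# Hypothetical stand-in for a model dict that contains a lambda.
model_dict = {
    "params": {"max_iter": 200, "batch_size": 16},
    "activation": lambda x: max(x, 0.0),
}

try:
    pickle.dumps(model_dict, protocol=5)  # protocol 5 needs Python >= 3.8
except (pickle.PicklingError, AttributeError) as exc:
    print("cannot pickle:", type(exc).__name__)


def drop_callables(d):
    """Recursively remove callable values so the rest can be pickled."""
    if isinstance(d, dict):
        return {k: drop_callables(v) for k, v in d.items() if not callable(v)}
    return d


blob = pickle.dumps(drop_callables(model_dict), protocol=5)
print(pickle.loads(blob))  # {'params': {'max_iter': 200, 'batch_size': 16}}
```

The trade-off is that whatever the dropped functions did has to be reconstructed on load, for example from saved hyperparameters.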