[DOC] A little more detail in the documentation would help me understand how to check the results of the classifier
Describe the issue linked to the documentation
I am working on a problem of classifying multivariate time series: essentially, identifying a specific moment in the series and treating that moment as class 1 and everything else as class 0. I used this superb library to train an RDST classifier, which partially solves my problem. I used version 0.11.1, which permits different distance measures with the classifier. After training, I tried to check the DTW and Euclidean distance values between the identified shapelets and a random time series window from the validation set, and ran into the following problems:
- When a shapelet is shorter than the largest shapelet, its values are padded with inf, so the distance always returned inf.
- I then shortened the time series window to the actual shapelet length, ignored the shapelet indexes holding inf values, and calculated the distance measures.
- This is my major problem right now: the shapelet with the minimum distance to the time series window is not the one that looks like the best match. When plotted against the same example window, a different shapelet shows a visibly better fit than the shapelet with the lowest distance.
- The one-class shapelet visualization method raises the error below when trying to plot shapelets:
ValueError                                Traceback (most recent call last)
/tmp/ipython-input-3510015363.py in <cell line: 0>()
      5 id_class = 1 # Class 1 for event data
      6 # Visualize the top 10 most important shapelets for class 1
----> 7 fig = stc_vis.visualize_shapelets_one_class(
      8     X_Val,
      9     y_val,

3 frames
/usr/local/lib/python3.11/dist-packages/aeon/visualisation/estimator/_shapelets.py in plot_distance_vector(self, X, ax, show_legend, show_threshold, line_options, threshold_options, figure_options, rc_Params_options, matplotlib_style)
    357 X_means, X_stds = sliding_mean_std_one_series(X, self.length, self.dilation)
    358 X_subs = normalize_subsequences(X_subs, X_means, X_stds)
--> 359 _values = (self.values - self.values.mean(axis=-1)) / self.values.std(
    360     axis=1
    361 )

ValueError: operands could not be broadcast together with shapes (4,9) (4,)
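The shapes in the error message point to a NumPy broadcasting mismatch: reducing a (4, 9) array along its last axis gives a (4,) array, which cannot be subtracted from the original. The sketch below reproduces this on a synthetic array (not on aeon objects) and shows one possible fix using `keepdims=True`. It is only an illustration of the broadcasting rule, not necessarily the patch the maintainers would choose.

```python
import numpy as np

# Synthetic stand-in for a shapelet with 4 channels and 9 time points.
values = np.arange(36, dtype=float).reshape(4, 9)

# Reproduce the failure: the per-channel mean has shape (4,), which does not
# broadcast against (4, 9) because the trailing dimensions (9 and 4) differ.
try:
    values - values.mean(axis=-1)
    message = ""
except ValueError as err:
    message = str(err)

# keepdims=True keeps the reduced axis as size 1, giving shape (4, 1),
# which broadcasts row by row against (4, 9).
mean = values.mean(axis=-1, keepdims=True)
std = values.std(axis=-1, keepdims=True)
normalized = (values - mean) / std
```

After this change each channel is z-normalized independently, which appears to be the intent of the original line.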
Will be very grateful for your help
Suggest a potential alternative/fix
No response
Linked to https://github.com/aeon-toolkit/aeon/discussions/3027
Shapelets are considered useful for one primary reason: interpretability.
Please consider the following code: (i) rdstclf.get_fitted_params()["_transformer__shapelets"][0], where rdstclf is the trained RDST classifier. The _transformer__shapelets entry gave a lot of information about the extracted shapelets, their sizes, etc.
But the documentation did not say exactly what these pieces of information were. If this were documented, users could understand the details of the fitted parameters.
(ii) In the newer version, _transformer__shapelets is not accessible. Some information is available via rdstclf.transformer.shapelets[1] instead, but I am still unable to understand exactly what information this variable holds.
Is there any way to identify which class each discovered shapelet represents, or to read other information, such as the shapelet occurrence or the argmin of each shapelet, from somewhere?
Thanks for the comments! I'll look into improving the visualisation and docs for these methods in the following weeks when time allows.
The contents of the shapelets array from the transformation should be stated in the docs of the transformer:
Attributes
----------
shapelets : list
    The stored shapelets. Each item in the list is an array containing:
    - shapelet values
    - startpoint values
    - length parameter
    - dilation parameter
    - threshold parameter
    - normalization parameter
    - mean parameter
    - standard deviation parameter
    - class value
From there you can get the class the shapelet was extracted from, its startpoint (the timestamp it was sampled from, i.e. start:start+length if dilation is 1), etc.
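For dilation values other than 1, the sampled time points are no longer contiguous. A minimal sketch, assuming the usual definition of dilation where consecutive shapelet points are spaced `dilation` steps apart; the helper name `shapelet_indices` is hypothetical and not part of aeon:

```python
import numpy as np

def shapelet_indices(start, length, dilation=1):
    """Return the time indices covered by a shapelet sampled at `start`.

    Assumes consecutive shapelet points are `dilation` steps apart, so
    dilation=1 reduces to the contiguous range start:start+length.
    """
    return start + dilation * np.arange(length)

contiguous = shapelet_indices(start=5, length=4, dilation=1)
dilated = shapelet_indices(start=5, length=4, dilation=3)
```

With dilation 1 this gives the indices 5 to 8; with dilation 3 the same four points are spread over indices 5, 8, 11 and 14.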
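Note that the values array is padded to the length of the longest shapelet. To get the values of a single shapelet, slice it by its own length:
shp_id = 0
# values and lengths are the "shapelet values" and "length parameter" items listed above
values[shp_id, :, :lengths[shp_id]]
Otherwise you will get inf values.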
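The effect of the padding on a distance computation can be sketched with synthetic arrays. The layout (n_shapelets, n_channels, max_length) is an assumption inferred from the slicing expression above, and the arrays below are made up for illustration rather than taken from a fitted classifier:

```python
import numpy as np

# Synthetic stand-in: 2 shapelets, 1 channel, padded with inf to max length 5.
values = np.full((2, 1, 5), np.inf)
lengths = np.array([3, 5])
values[0, :, :3] = [1.0, 2.0, 1.0]
values[1, :, :5] = [0.0, 1.0, 2.0, 1.0, 0.0]

# A window from a series, shaped (n_channels, n_timepoints).
window = np.array([[1.0, 2.0, 1.0, 0.0, 1.0]])

shp_id = 0
padded = values[shp_id]                         # still contains inf padding
trimmed = values[shp_id, :, :lengths[shp_id]]   # only the real shapelet values

# Euclidean distance against the padded shapelet is inf ...
padded_distance = np.sqrt(((padded - window) ** 2).sum())
# ... while trimming both shapelet and window to the true length is finite.
trimmed_window = window[:, :lengths[shp_id]]
trimmed_distance = np.sqrt(((trimmed - trimmed_window) ** 2).sum())
```

Here the padded comparison returns inf, while the trimmed comparison returns 0.0 because the first three window values match the shapelet exactly.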
I notice that the sample id is not stored, which seems odd.
Thank you!
Can you please help me understand the purpose of the threshold parameter in the above list? Is it the shapelet occurrence (SO) threshold or a distance threshold? For a misclassified instance of the undesired class, I calculated the distance to each identified shapelet, and now I want to know how to use the threshold parameter to decide which class will be selected for that sample.
Hi,
Yes, the threshold is indeed used to compute the shapelet occurrence (SO) feature, see this function
I've been and will be very busy until the end of October, so I haven't had the chance to work on the issue, but I'm keeping it in mind.
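As a rough illustration of how a threshold can turn a distance vector into an occurrence count, here is a minimal sketch. It assumes a univariate series, dilation 1, no normalization, and Euclidean distance; the helper `shapelet_features` is hypothetical and is not aeon's implementation:

```python
import numpy as np

def shapelet_features(series, shapelet, threshold):
    """Return (min distance, argmin, shapelet occurrence) for a 1D series.

    The shapelet occurrence is taken here as the number of positions whose
    distance to the shapelet falls below `threshold`.
    """
    length = len(shapelet)
    # All contiguous windows of the series with the shapelet's length.
    windows = np.lib.stride_tricks.sliding_window_view(series, length)
    # Distance vector: one Euclidean distance per window position.
    distances = np.sqrt(((windows - shapelet) ** 2).sum(axis=1))
    occurrence = int((distances < threshold).sum())
    return float(distances.min()), int(distances.argmin()), occurrence

series = np.array([0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0, 1.0, 0.0])
shapelet = np.array([1.0, 2.0, 1.0])
min_dist, best_start, occurrence = shapelet_features(series, shapelet, threshold=0.5)
```

In this toy series the shapelet matches exactly at two positions, so the minimum distance is 0.0, the first best match starts at index 1, and the occurrence count is 2.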
Thank you
Hello @baraline, I would like to work on this issue.
I can help by:
- Expanding the documentation for the shapelet transformer and RDST classifier, including clearly describing all fields in the shapelets array (values, startpoint, length, dilation, threshold, normalization, mean/std, class).
- Adding small examples to show how to correctly extract and interpret shapelets.
- Clarifying the usage of the threshold parameter and shapelet occurrence (SO).
- Reviewing the visualization functions for the broadcasting issue and updating the docs accordingly.
Let me know if this direction looks good, and I’ll start working on it.