aeon
[BUG] ShapeletTransform: binary ig calculation problem
Describe the bug
The current _calc_binary_ig() evaluates split points that fall between data points with the same feature value but different labels. No threshold can actually separate such points, so the result can be misleading for datasets that contain a lot of them.
Steps/Code to reproduce the bug
from aeon.transformations.collection.shapelet_based._shapelet_transform import _calc_binary_ig

orderline = [(2, -1), (2, -1), (2, 1), (3, 1), (3, 1)]
c1, c2 = 3, 2
_calc_binary_ig(orderline, c1, c2)
Expected results
0.42
Actual results
0.97
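For reference, both numbers can be reproduced by hand. The sketch below is plain Python rather than aeon's code: the helper names binary_entropy and ig_per_split are made up for illustration, and it assumes the usual entropy-based information gain where the left side of split i holds items 0..i of the orderline.

```python
from math import log2


def binary_entropy(n_pos, n_neg):
    """Entropy (in bits) of a two-class set with the given class counts."""
    total = n_pos + n_neg
    ent = 0.0
    for n in (n_pos, n_neg):
        if n > 0:
            ent -= n / total * log2(n / total)
    return ent


def ig_per_split(orderline, c1, c2):
    """Information gain of every split 'after index i', ties included."""
    total = c1 + c2
    base = binary_entropy(c1, c2)
    gains = []
    pos = neg = 0
    for i, (_, label) in enumerate(orderline):
        # Move item i from the right-hand side to the left-hand side.
        if label > 0:
            pos += 1
        else:
            neg += 1
        left = (i + 1) / total
        gain = (
            base
            - left * binary_entropy(pos, neg)
            - (1 - left) * binary_entropy(c1 - pos, c2 - neg)
        )
        gains.append(gain)
    return gains


orderline = [(2, -1), (2, -1), (2, 1), (3, 1), (3, 1)]
gains = ig_per_split(orderline, c1=3, c2=2)
for i, g in enumerate(gains):
    print(i, round(g, 2))
```

Under these assumptions the split after index 1 (between two items that both have value 2) scores about 0.97, while the split after index 2 (between value 2 and value 3) scores about 0.42.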
Versions
System:
  python: 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)]
  executable: c:\xxx\python.exe
  machine: Windows-10-10.0.19041-SP0

Python dependencies:
  pip: 22.3.1
  setuptools: 57.4.0
  scikit-learn: 1.4.0
  aeon: 0.7.1
  statsmodels: None
  numpy: 1.24.0
  scipy: 1.10.1
  pandas: 2.0.3
  matplotlib: 3.5.0
  joblib: 1.3.2
  numba: 0.58.1
  pmdarima: None
  tsfresh: None
Thanks for this, we will take a look next week.
Next week became next month, sorry about that.
I don't think this really constitutes a bug; it's true to the algorithm.
I guess for the above you are recommending ignoring splits such as [(2,-1),(2,-1)] [(2,1),(3,1),(3,1)], where the distance is the same on both sides of the split. So we would then:
evaluate (default split) [ ] [(2,-1),(2,-1),(2,1),(3,1),(3,1)]
skip [(2,-1)] [(2,-1),(2,1),(3,1),(3,1)] (split == 0, I think, by the logic)
skip [(2,-1),(2,-1)] [(2,1),(3,1),(3,1)] (split == 1)
then continue with [(2,-1),(2,-1),(2,1)] [(3,1),(3,1)] (split == 2)
I can enforce this:
# evaluate each split point
for split in range(len(orderline)):
    next_class = orderline[split][1]  # +1 if this class, -1 if other
    # Check here that the distance is different to the next one
    if split == 0 and orderline[split][0] == orderline[split + 1][0]:
        continue
    elif orderline[split][0] == orderline[split - 1][0]:
        continue
I need to double check the logic, as it is a bit confusing around the first item, but this gives me an IG of 0.770950, not 0.42.
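One possible reading of the 0.77, offered as a guess rather than a diagnosis: the check compares each item with the previous one instead of the next, and if the continue runs before the class counts are updated, skipped items never reach the left-hand counts. A minimal sketch of the alternative is below. It is plain Python, not a patch against aeon; calc_binary_ig_skip_ties and binary_entropy are hypothetical names, and it assumes the orderline is already sorted by distance.

```python
from math import log2


def binary_entropy(n_pos, n_neg):
    """Entropy (in bits) of a two-class set with the given class counts."""
    total = n_pos + n_neg
    ent = 0.0
    for n in (n_pos, n_neg):
        if n > 0:
            ent -= n / total * log2(n / total)
    return ent


def calc_binary_ig_skip_ties(orderline, c1, c2):
    """Best information gain over splits that separate distinct distances."""
    total = c1 + c2
    base = binary_entropy(c1, c2)
    best = 0.0
    pos = neg = 0
    last = len(orderline) - 1
    for split, (dist, label) in enumerate(orderline):
        # Always update the left-hand counts, even for positions not scored.
        if label > 0:
            pos += 1
        else:
            neg += 1
        # A threshold cannot separate equal distances, so only score this
        # position when the next distance differs (or there is no next item).
        if split < last and dist == orderline[split + 1][0]:
            continue
        left = (split + 1) / total
        ig = (
            base
            - left * binary_entropy(pos, neg)
            - (1 - left) * binary_entropy(c1 - pos, c2 - neg)
        )
        best = max(best, ig)
    return best


orderline = [(2, -1), (2, -1), (2, 1), (3, 1), (3, 1)]
print(round(calc_binary_ig_skip_ties(orderline, 3, 2), 2))
```

On the orderline from the report this only scores the split after index 2 and the trivial split after the last item, so the best value comes out at about 0.42.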