[ENH] Implement Proximity Forest 2.0 classifier using aeon distances

Open itsdivya1309 opened this issue 1 year ago • 15 comments

Reference Issues/PRs

Closes: #428. Incorporates changes suggested in #1874 (maybe we can close PR #1876).

What does this implement/fix? Explain your changes.

  1. Created a private distance file to parameterise DTW and ADTW distances.
  2. Created a function to calculate the first_order_derivative of a time series.
  3. Implemented the ProximityTree2 and ProximityForest2 classes, as per the paper.
  4. Added the classes to API.
  5. Wrote unit tests.
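
For readers unfamiliar with item 2, here is a minimal sketch of one common first-order derivative estimate for a time series (the form used in derivative DTW). This is an assumption for illustration: the function in this PR may use a different formula or handle the endpoints differently.

```python
def first_order_derivative(series):
    """Derivative estimate for a 1D series (assumed form, not the PR's code).

    Each interior point averages the left difference and the centred
    difference, so the result has len(series) - 2 values.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 points to estimate a derivative")
    deriv = []
    for i in range(1, n - 1):
        left = series[i] - series[i - 1]
        centred = (series[i + 1] - series[i - 1]) / 2.0
        deriv.append((left + centred) / 2.0)
    return deriv
```

For a linear series such as `[0, 1, 2, 3]` the estimate is constant, which is a quick sanity check on any implementation.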

To do:

  • [ ] Improve computational efficiency by implementing the Early Abandoning and Pruning (EAP) algorithm for the elastic distance measures.

itsdivya1309 avatar Aug 15 '24 10:08 itsdivya1309

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ enhancement ]. I have added the following labels to this PR based on the changes made: [ classification ]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

  • [ ] Run pre-commit checks for all files
  • [ ] Run all pytest tests and configurations
  • [ ] Run all notebook example tests
  • [ ] Run numba-disabled codecov tests
  • [ ] Stop automatic pre-commit fixes (always disabled for drafts)
  • [ ] Push an empty commit to re-run CI checks

aeon-actions-bot[bot] avatar Aug 15 '24 10:08 aeon-actions-bot[bot]

Have we compared the results against the published ones?

TonyBagnall avatar Sep 11 '24 11:09 TonyBagnall

Not yet, I need to put it on the cluster. I do not have results for the original yet either.

MatthewMiddlehurst avatar Sep 16 '24 19:09 MatthewMiddlehurst

Have we compared the results against the published ones?

Actually, the algorithm isn't complete yet. We still need to work on computational efficiency, particularly integrating the EAP technique. I'll resume the work in a couple of days.

itsdivya1309 avatar Sep 17 '24 13:09 itsdivya1309

Want to run this soon. This is not blocking that, but some component seems to be non-deterministic.

MatthewMiddlehurst avatar Nov 10 '24 22:11 MatthewMiddlehurst

What are the remaining things to do here? In terms of testing, it seems that something non-deterministic is happening, which, given the algorithm, shouldn't be the case if I remember correctly.

baraline avatar Dec 04 '24 10:12 baraline

Need to compare to past results, and yeah, something is failing the non-deterministic test.
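
As a rough illustration of what such a check looks for, the sketch below fits the same seeded estimator several times and compares predictions. `ToyExemplarClassifier` and `is_deterministic` are hypothetical names invented for this example; they are not aeon classes or aeon's actual test.

```python
import random


class ToyExemplarClassifier:
    """Hypothetical stand-in: one random exemplar per class, nearest exemplar wins."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y):
        # All randomness flows from one seeded generator created inside fit.
        rng = random.Random(self.random_state)
        self.exemplars_ = {}
        # Sorted class order keeps the sequence of random draws reproducible.
        for label in sorted(set(y)):
            candidates = [x for x, lab in zip(X, y) if lab == label]
            self.exemplars_[label] = rng.choice(candidates)
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.exemplars_, key=lambda lab: sq_dist(x, self.exemplars_[lab]))
            for x in X
        ]


def is_deterministic(make_estimator, X_train, y_train, X_test, n_repeats=3):
    """Return True if repeated fits with the same seed give identical predictions."""
    reference = make_estimator().fit(X_train, y_train).predict(X_test)
    return all(
        make_estimator().fit(X_train, y_train).predict(X_test) == reference
        for _ in range(n_repeats)
    )
```

A typical source of failure is any random draw that bypasses the seeded generator, for example a call to a global random function inside tree construction.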

MatthewMiddlehurst avatar Dec 04 '24 10:12 MatthewMiddlehurst

What are the remaining things to do here? In terms of testing, it seems that something non-deterministic is happening, which, given the algorithm, shouldn't be the case if I remember correctly.

Hi, we need to integrate EAP for the distance measures to complete this algorithm. In the current implementation, I've used private distance functions, which use pruning to abandon a distance calculation once it exceeds a threshold.
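
To make the threshold idea concrete, here is a minimal sketch of DTW with simple row-wise early abandoning. It is an assumption-laden illustration, not the private functions in this PR and not the full EAP algorithm, which additionally prunes individual cells. The name `dtw_with_cutoff` and the squared pointwise cost are choices made for this example.

```python
import math


def dtw_with_cutoff(x, y, cutoff=math.inf):
    """Full-window DTW with squared costs; returns inf once the cutoff is exceeded.

    Costs are non-negative and every warping path crosses every row, so the
    minimum of a row is a lower bound on the final distance.
    """
    n, m = len(x), len(y)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        if min(curr[1:]) > cutoff:
            return math.inf  # no path through this row can beat the cutoff
        prev = curr
    return prev[m]
```

In a nearest-exemplar search, the cutoff would typically be the best distance found so far, so most comparisons stop after a few rows.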

itsdivya1309 avatar Dec 04 '24 11:12 itsdivya1309

Hi, @MatthewMiddlehurst, I've corrected the code. I think we are good to compare the results now.

itsdivya1309 avatar Dec 09 '24 16:12 itsdivya1309

That was not the issue; it will be internal to the fit tree function. I would revert the last change, as the previous version was how we do that elsewhere.

Does not stop me evaluating, just have to find the time to get it working with our setup and run it 🙂

MatthewMiddlehurst avatar Dec 12 '24 19:12 MatthewMiddlehurst

Hi, I ran this on 90+ UCR datasets with 5 resamples, and it appears to perform worse.

5 resamples

aeon average acc: 0.827681
paper average acc: 0.861343
average acc diff: -0.03366

Dataset aeon paper diff
SonyAIBORobotSurface1 0.941764 0.878869 0.062895
Wine 0.822222 0.781481 0.040741
Lightning7 0.775342 0.742466 0.032877
Lightning2 0.809836 0.783607 0.02623
ECG200 0.904 0.89 0.014
RefrigerationDevices 0.7008 0.693333 0.007467
DiatomSizeReduction 0.944444 0.937255 0.00719
HouseTwenty 0.941176 0.934454 0.006723
Ham 0.737143 0.733333 0.00381
DistalPhalanxOutlineCorrect 0.806522 0.803623 0.002899
FaceFour 0.956818 0.954545 0.002273
ItalyPowerDemand 0.957629 0.955879 0.001749
ECG5000 0.944267 0.943111 0.001156
Mallat 0.97177 0.971087 0.000682
Coffee 1 1 0
GunPointOldVersusYoung 1 1 0
InsectEPGRegularTrain 1 1 0
InsectEPGSmallTrain 1 1 0
SmoothSubspace 0.998667 0.998667 0
Trace 1 1 0
TwoPatterns 0.9997 0.99995 -0.00025
GunPointMaleVersusFemale 0.998101 0.999367 -0.00127
Wafer 0.99682 0.998799 -0.00198
SyntheticControl 0.992667 0.995333 -0.00267
Earthquakes 0.748201 0.751079 -0.00288
FaceAll 0.947456 0.951361 -0.00391
Plane 0.994286 1 -0.00571
FacesUCR 0.959024 0.964878 -0.00585
OliveOil 0.88 0.886667 -0.00667
DistalPhalanxOutlineAgeGroup 0.794245 0.801439 -0.00719
MixedShapesSmallTrain 0.92701 0.93468 -0.00767
MixedShapesRegularTrain 0.962392 0.971546 -0.00915
PhalangesOutlinesCorrect 0.812121 0.821678 -0.00956
Chinatown 0.956851 0.96793 -0.01108
MoteStrain 0.916294 0.927476 -0.01118
CBF 0.975778 0.988222 -0.01244
Meat 0.963333 0.976667 -0.01333
CricketZ 0.804103 0.818974 -0.01487
CinCECGTorso 0.973768 0.988696 -0.01493
InsectWingbeatSound 0.615253 0.630202 -0.01495
Worms 0.693506 0.709091 -0.01558
Crop 0.745595 0.762512 -0.01692
GunPointAgeSpan 0.978481 0.99557 -0.01709
CricketX 0.781538 0.798974 -0.01744
PowerCons 0.97 0.987778 -0.01778
ShapeletSim 0.89 0.907778 -0.01778
Strawberry 0.942162 0.963784 -0.02162
Computers 0.768 0.7904 -0.0224
BME 0.973333 1 -0.02667
Herring 0.56875 0.596875 -0.02813
ACSF1 0.712 0.742 -0.03
SwedishLeaf 0.93312 0.96384 -0.03072
ToeSegmentation2 0.887692 0.918462 -0.03077
Symbols 0.934271 0.96804 -0.03377
CricketY 0.767692 0.802051 -0.03436
Haptics 0.446104 0.480519 -0.03442
UWaveGestureLibraryY 0.75321 0.78794 -0.03473
UWaveGestureLibraryX 0.815131 0.851591 -0.03646
ArrowHead 0.861714 0.898286 -0.03657
MiddlePhalanxOutlineCorrect 0.798625 0.837113 -0.03849
UWaveGestureLibraryZ 0.750307 0.789336 -0.03903
MedicalImages 0.749211 0.791316 -0.04211
WordSynonyms 0.748903 0.79185 -0.04295
ECGFiveDays 0.926365 0.973519 -0.04715
Rock 0.764 0.812 -0.048
SonyAIBORobotSurface2 0.843861 0.89192 -0.04806
ShapesAll 0.85 0.898667 -0.04867
WormsTwoClass 0.737662 0.787013 -0.04935
BirdChicken 0.88 0.93 -0.05
GunPoint 0.949333 1 -0.05067
Yoga 0.8544 0.9054 -0.051
ProximalPhalanxTW 0.747317 0.8 -0.05268
FiftyWords 0.793846 0.847473 -0.05363
ProximalPhalanxOutlineAgeGroup 0.794146 0.847805 -0.05366
ProximalPhalanxOutlineCorrect 0.817182 0.876976 -0.05979
Fish 0.908571 0.969143 -0.06057
MiddlePhalanxTW 0.497403 0.558442 -0.06104
DistalPhalanxTW 0.647482 0.709353 -0.06187
Adiac 0.690537 0.75601 -0.06547
MiddlePhalanxOutlineAgeGroup 0.607792 0.675325 -0.06753
ChlorineConcentration 0.584635 0.652344 -0.06771
ScreenType 0.5696 0.6416 -0.072
OSULeaf 0.800826 0.88843 -0.0876
UMD 0.884722 0.975 -0.09028
SmallKitchenAppliances 0.685333 0.780267 -0.09493
BeetleFly 0.76 0.86 -0.1
Beef 0.52 0.62 -0.1
EOGHorizontalSignal 0.668508 0.768508 -0.1
InlineSkate 0.402182 0.505455 -0.10327
LargeKitchenAppliances 0.716267 0.819733 -0.10347
EOGVerticalSignal 0.656354 0.760221 -0.10387
FreezerRegularTrain 0.882456 0.999228 -0.11677
Car 0.763333 0.883333 -0.12
ToeSegmentation1 0.769298 0.89386 -0.12456
FreezerSmallTrain 0.754386 0.892912 -0.13853
TwoLeadECG 0.830378 0.997191 -0.16681

train/test

aeon average acc: 0.815098
paper average acc: 0.848603
average acc diff: -0.0335

Dataset aeon paper diff
SonyAIBORobotSurface1 0.908486 0.816972 0.091514
Wine 0.574074 0.518519 0.055556
Ham 0.685714 0.657143 0.028571
ArrowHead 0.897143 0.874286 0.022857
DiatomSizeReduction 0.973856 0.954248 0.019608
HouseTwenty 0.94958 0.932773 0.016807
Lightning2 0.836066 0.819672 0.016393
RefrigerationDevices 0.557333 0.546667 0.010667
DistalPhalanxOutlineCorrect 0.789855 0.786232 0.003623
Coffee 1 1 0
DistalPhalanxOutlineAgeGroup 0.726619 0.726619 0
FaceFour 0.965909 0.965909 0
GunPointMaleVersusFemale 1 1 0
GunPointOldVersusYoung 1 1 0
Haptics 0.457792 0.457792 0
InsectEPGRegularTrain 1 1 0
InsectEPGSmallTrain 1 1 0
OliveOil 0.9 0.9 0
SmoothSubspace 1 1 0
SyntheticControl 0.99 0.99 0
Trace 1 1 0
FacesUCR 0.957073 0.957561 -0.00049
TwoPatterns 0.999 0.99975 -0.00075
CricketY 0.807692 0.810256 -0.00256
CricketZ 0.802564 0.805128 -0.00256
Wafer 0.99562 0.998378 -0.00276
Chinatown 0.973761 0.976676 -0.00292
ECG5000 0.941556 0.945778 -0.00422
ItalyPowerDemand 0.962099 0.96793 -0.00583
FaceAll 0.800592 0.808284 -0.00769
MixedShapesRegularTrain 0.960825 0.969485 -0.00866
CinCECGTorso 0.960145 0.968841 -0.0087
Plane 0.990476 1 -0.00952
MiddlePhalanxTW 0.519481 0.532468 -0.01299
GunPoint 0.986667 1 -0.01333
Lightning7 0.780822 0.794521 -0.0137
Earthquakes 0.748201 0.76259 -0.01439
MixedShapesSmallTrain 0.922887 0.938144 -0.01526
Meat 0.916667 0.933333 -0.01667
GunPointAgeSpan 0.981013 1 -0.01899
SwedishLeaf 0.9312 0.9504 -0.0192
BME 0.98 1 -0.02
CricketX 0.776923 0.797436 -0.02051
Crop 0.746786 0.767321 -0.02054
ProximalPhalanxOutlineCorrect 0.859107 0.879725 -0.02062
MoteStrain 0.904153 0.92492 -0.02077
DistalPhalanxTW 0.640288 0.661871 -0.02158
Mallat 0.955224 0.978252 -0.02303
ToeSegmentation2 0.884615 0.907692 -0.02308
MedicalImages 0.759211 0.782895 -0.02368
EOGHorizontalSignal 0.546961 0.571823 -0.02486
PowerCons 0.961111 0.988889 -0.02778
Symbols 0.948744 0.976884 -0.02814
UWaveGestureLibraryY 0.754886 0.784478 -0.02959
Strawberry 0.935135 0.964865 -0.02973
Herring 0.59375 0.625 -0.03125
Computers 0.712 0.744 -0.032
InsectWingbeatSound 0.611111 0.643434 -0.03232
Beef 0.7 0.733333 -0.03333
UMD 0.951389 0.986111 -0.03472
PhalangesOutlinesCorrect 0.794872 0.829837 -0.03497
Yoga 0.854333 0.889333 -0.035
FiftyWords 0.804396 0.841758 -0.03736
CBF 0.957778 0.996667 -0.03889
Worms 0.662338 0.701299 -0.03896
UWaveGestureLibraryX 0.807929 0.847292 -0.03936
ACSF1 0.79 0.83 -0.04
ECG200 0.9 0.94 -0.04
UWaveGestureLibraryZ 0.745114 0.785874 -0.04076
ShapesAll 0.848333 0.891667 -0.04333
ProximalPhalanxTW 0.746341 0.790244 -0.0439
MiddlePhalanxOutlineAgeGroup 0.551948 0.597403 -0.04545
WordSynonyms 0.731975 0.782132 -0.05016
WormsTwoClass 0.727273 0.779221 -0.05195
ProximalPhalanxOutlineAgeGroup 0.795122 0.84878 -0.05366
Fish 0.925714 0.982857 -0.05714
SonyAIBORobotSurface2 0.828961 0.886674 -0.05771
ChlorineConcentration 0.580469 0.642969 -0.0625
FreezerSmallTrain 0.682456 0.74807 -0.06561
ShapeletSim 0.883333 0.95 -0.06667
ECGFiveDays 0.894309 0.962834 -0.06852
ScreenType 0.466667 0.536 -0.06933
MiddlePhalanxOutlineCorrect 0.776632 0.85567 -0.07904
Rock 0.72 0.8 -0.08
FreezerRegularTrain 0.910175 0.998246 -0.08807
EOGVerticalSignal 0.475138 0.569061 -0.09392
Adiac 0.680307 0.780051 -0.09974
InlineSkate 0.374545 0.494545 -0.12
LargeKitchenAppliances 0.645333 0.765333 -0.12
ToeSegmentation1 0.846491 0.969298 -0.12281
SmallKitchenAppliances 0.658667 0.784 -0.12533
OSULeaf 0.743802 0.871901 -0.1281
BirdChicken 0.8 0.95 -0.15
Car 0.783333 0.933333 -0.15
TwoLeadECG 0.833187 0.998244 -0.16506
BeetleFly 0.65 0.85 -0.2
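
The summary figures above can be reproduced from the per-dataset rows with a small helper. The sketch below is illustrative only: `summarise` is a hypothetical name, and the three rows are a subset copied from the train/test table rather than the full result set.

```python
def summarise(rows):
    """rows: list of (dataset, aeon_acc, paper_acc) tuples."""
    n = len(rows)
    mean_aeon = sum(aeon for _, aeon, _ in rows) / n
    mean_paper = sum(paper for _, _, paper in rows) / n
    return {
        "mean_aeon": mean_aeon,
        "mean_paper": mean_paper,
        "mean_diff": mean_aeon - mean_paper,
        # Win/tie/loss counts from aeon's point of view.
        "wins": sum(1 for _, aeon, paper in rows if aeon > paper),
        "ties": sum(1 for _, aeon, paper in rows if aeon == paper),
        "losses": sum(1 for _, aeon, paper in rows if aeon < paper),
    }


# Subset of the train/test rows above, for illustration only.
subset = [
    ("SonyAIBORobotSurface1", 0.908486, 0.816972),
    ("Coffee", 1.0, 1.0),
    ("BeetleFly", 0.65, 0.85),
]
summary = summarise(subset)
```

Win/tie/loss counts are a useful complement to the mean difference here, since a handful of small datasets with large gaps can dominate the average.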

MatthewMiddlehurst avatar Jun 02 '25 17:06 MatthewMiddlehurst

The results are not what I expected :( Apart from EAP, everything else was adopted per the paper, so accuracy should have been the same. Will look into the code once more.

itsdivya1309 avatar Jun 03 '25 05:06 itsdivya1309

I have updated the results to also include just the default train/test split. The paper seems to have changed a bit from the arXiv version. We can try asking the authors if necessary.

MatthewMiddlehurst avatar Jun 03 '25 23:06 MatthewMiddlehurst

I have updated the results to also include just the default train/test split. The paper seems to have changed a bit from the arXiv version. We can try asking the authors if necessary.

Yes, you are right. In the published version, they've added HYDRA as well. It wasn't there in the arXiv version.

itsdivya1309 avatar Jun 04 '25 04:06 itsdivya1309

Hi @MatthewMiddlehurst, could you rerun this on the UCR datasets? I have added Minkowski+HYDRA, so it may be slow compared to ProximityForest.

itsdivya1309 avatar Jun 06 '25 13:06 itsdivya1309