desbordante-core

Refactor TANE-based algorithms

Open • iliya-b opened this issue 3 months ago • 1 comment

Generalize Tane and PFDTane, add additional tests.

To check whether the refactoring caused any performance loss, the following experiments were performed. The discovery task was run as cli.py --task=afd --algo=tane --error=0.05 --table=... with both the new and the original versions of the TANE implementation. The following heavy datasets were used: EpicMeds.csv, adult.csv, EpicVitals.csv.
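For anyone who wants to repeat the measurement, here is a minimal sketch of a timing harness. It is not the script used for the numbers in this issue: the helper names `time_command` and `mean_and_half_width` are hypothetical, the interval is assumed to be a Student's t interval (critical value 2.262 for 10 runs), and the command in the `__main__` block is a placeholder that should be swapped for the real cli.py invocation with an actual table path.

```python
import statistics
import subprocess
import sys
import time

# Two-sided 95% Student's t critical value for 9 degrees of freedom (10 runs).
# Assumption: the issue does not say which interval formula was used.
T_CRIT_95_DF9 = 2.262


def time_command(cmd, iterations=10):
    """Run cmd repeatedly and return the wall-clock duration of each run."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def mean_and_half_width(samples, t_crit=T_CRIT_95_DF9):
    """Return (mean, half-width) of a t-based confidence interval."""
    mean = statistics.mean(samples)
    std_err = statistics.stdev(samples) / len(samples) ** 0.5
    return mean, t_crit * std_err


if __name__ == "__main__":
    # Placeholder command; replace with the real cli.py call and table path.
    cmd = [sys.executable, "-c", "pass"]
    mean, half = mean_and_half_width(time_command(cmd))
    print(f"{mean} +- {half}")
```

Running the same harness against a checkout of the old version and a checkout of the new one gives two mean/half-width pairs per dataset that can be compared directly.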

The following list shows the measured running times of the old and new algorithms, respectively (95% confidence intervals, 10 iterations):

  1. EpicMeds.csv (old) 59.715925465099986 +- 0.1869874511220996
  2. EpicMeds.csv (new) 59.5840122977 +- 0.06763601341304505
  3. adult.csv (old) 24.654166058699996 +- 0.06323832294394492
  4. adult.csv (new) 24.76226707977778 +- 0.09297212157319155
  5. EpicVitals.csv (old) 10.6707755998 +- 0.11612311140862534
  6. EpicVitals.csv (new) 10.7569084586 +- 0.0103879548810794
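As a quick sanity check on the numbers above, the sketch below tests whether the old and new confidence intervals overlap for each dataset and computes the relative change of the mean. The values are copied from the list; the helper names are hypothetical, and overlapping intervals are only a rough heuristic, not a formal significance test.

```python
# (mean, half-width) pairs copied from the list above.
results = {
    "EpicMeds.csv": {
        "old": (59.715925465099986, 0.1869874511220996),
        "new": (59.5840122977, 0.06763601341304505),
    },
    "adult.csv": {
        "old": (24.654166058699996, 0.06323832294394492),
        "new": (24.76226707977778, 0.09297212157319155),
    },
    "EpicVitals.csv": {
        "old": (10.6707755998, 0.11612311140862534),
        "new": (10.7569084586, 0.0103879548810794),
    },
}


def interval(mean, half_width):
    """Turn a mean and half-width into (low, high) bounds."""
    return mean - half_width, mean + half_width


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def relative_change(old_mean, new_mean):
    """Change of the mean relative to the old version (positive = slower)."""
    return (new_mean - old_mean) / old_mean


for name, runs in results.items():
    old, new = runs["old"], runs["new"]
    overlap = intervals_overlap(interval(*old), interval(*new))
    change = relative_change(old[0], new[0])
    print(f"{name}: overlap={overlap}, relative change={change:+.2%}")
```

For all three datasets the intervals overlap and the mean changes by less than 1% in either direction, which is consistent with the refactoring causing no measurable performance loss.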

iliya-b • Mar 22 '24 22:03