desbordante-core
Refactor TANE-based algorithms
Generalize Tane and PFDTane, add additional tests.
To check whether the refactoring caused any performance loss, the following experiment was performed.
The discovery task was run as cli.py --task=afd --algo=tane --error=0.05 --table=...
with both the new and the original TANE implementations. The following heavy datasets were used: EpicMeds.csv, adult.csv, EpicVitals.csv.
The list below shows the measured running times of the old and new implementations, respectively (95% confidence intervals over 10 iterations):
- EpicMeds.csv (old) 59.715925465099986 +- 0.1869874511220996
- EpicMeds.csv (new) 59.5840122977 +- 0.06763601341304505
- adult.csv (old) 24.654166058699996 +- 0.06323832294394492
- adult.csv (new) 24.76226707977778 +- 0.09297212157319155
- EpicVitals.csv (old) 10.6707755998 +- 0.11612311140862534
- EpicVitals.csv (new) 10.7569084586 +- 0.0103879548810794
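
For anyone who wants to repeat the comparison, here is a minimal sketch of how such intervals can be computed and compared. The PR does not say how its intervals were derived, so the Student's t interval used here is an assumption, and the helper names `confidence_interval` and `intervals_overlap` are hypothetical. The figures in `reported` are copied from the list above.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for 9 degrees of freedom,
# i.e. for 10 runs. Assumption: the PR does not state which interval it used.
T_95_DF9 = 2.262


def confidence_interval(samples, t_crit=T_95_DF9):
    """Return (mean, half_width) for a list of timing samples."""
    mean = statistics.mean(samples)
    # Standard error of the mean, based on the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, t_crit * sem


def intervals_overlap(a, b):
    """True if the intervals mean_a +- hw_a and mean_b +- hw_b intersect."""
    (mean_a, hw_a), (mean_b, hw_b) = a, b
    return abs(mean_a - mean_b) <= hw_a + hw_b


# (mean, half_width) pairs for the old and new implementations, as reported.
reported = {
    "EpicMeds.csv": ((59.715925465099986, 0.1869874511220996),
                     (59.5840122977, 0.06763601341304505)),
    "adult.csv": ((24.654166058699996, 0.06323832294394492),
                  (24.76226707977778, 0.09297212157319155)),
    "EpicVitals.csv": ((10.6707755998, 0.11612311140862534),
                       (10.7569084586, 0.0103879548810794)),
}

for name, (old, new) in reported.items():
    # Relative change of the mean running time, in percent of the old mean.
    change = (new[0] - old[0]) / old[0] * 100
    print(f"{name}: change {change:+.2f}%, "
          f"intervals overlap: {intervals_overlap(old, new)}")
```

With the reported figures, the old and new intervals overlap for all three datasets and the mean running time changes by less than 1% in each case. Overlap is only a rough check and not a formal significance test.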