Fix for speeding up the execution

Open BouncyButton opened this issue 2 years ago • 0 comments

Hi! Thank you for sharing your implementation of C4.5 (over 10 years ago!!)

I just wanted to point out a simple fix that can improve the runtime of the algorithm by orders of magnitude on some datasets. I hope this will be useful for people like me who find this repo in the future.

return {k: [v[i] for i in range(len(v)) if i in ind] for k, v in t.items()}

Here, there is no need to loop over the whole range and check every index for membership in ind. You can iterate over ind directly:

return {k: [v[i] for i in ind] for k, v in t.items()}

On my machine, running the updated version on the UCI mushroom dataset takes under 1 second, while before it took about 50 seconds.
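For anyone who wants to check the change before applying it, here is a minimal sketch comparing the two expressions. The function names and the sample table are made up for illustration, and it assumes `t` maps column names to lists of values and `ind` is a collection of row indices, which is what the expression suggests. The two versions return the same result when `ind` is in ascending order with no duplicates; otherwise the direct version follows the order of `ind`.

```python
def subset_by_scan(t, ind):
    # Original expression: visits every row index and tests membership in ind.
    # If ind is a list, each `i in ind` test is itself a linear scan.
    return {k: [v[i] for i in range(len(v)) if i in ind] for k, v in t.items()}


def subset_direct(t, ind):
    # Simplified expression: visits only the requested row indices.
    return {k: [v[i] for i in ind] for k, v in t.items()}


# Hypothetical toy table: column name -> list of values, one entry per row.
table = {"cap": ["x", "b", "s", "f"], "odor": ["p", "a", "l", "n"]}
rows = [0, 2, 3]  # ascending, no duplicates

# Same result for sorted, unique indices.
assert subset_by_scan(table, rows) == subset_direct(table, rows)

# With unsorted indices the direct version keeps the order given in ind,
# while the scanning version always returns rows in table order.
print(subset_by_scan(table, [3, 0]))
print(subset_direct(table, [3, 0]))
```

If the calling code ever passes indices out of order or with repeats, sorting and de-duplicating them first (for example with `sorted(set(ind))`) keeps the behaviour of the original expression.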

I plan to release a repository that will include the updated version of your code (hoping that's ok!). Thanks!

BouncyButton, Apr 24 '23 11:04