
Major Refactor: Code Modernization, New Features, and Documentation

Open · jesper-olsen opened this issue 3 months ago · 0 comments

Hi Michael,

Thank you for creating this excellent educational repository for decision trees.

As I was working with the code, I made some updates to modernise it, improve its usability, and add proper documentation. I wanted to offer these changes back to the original project in case they are useful to you or to future learners who find this repo.

Here is a summary of the improvements:

Code Modernization & Refactoring:

  • Refactored uniqueCounts, entropy, and gini to use collections.Counter and math.log2, which are more efficient and more Pythonic than the previous hand-written counting and logarithm code.
  • Added guard clauses to entropy and gini to prevent ZeroDivisionError.
  • Improved overall code formatting and readability, and added docstrings to all major functions.
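For readers curious what this kind of refactor looks like, here is a minimal sketch. It is not the code from the PR. It assumes, hypothetically, that each row is a list whose last element is the class label. It shows uniqueCounts built on collections.Counter, entropy and gini computed from the resulting label frequencies (gini written in the common 1 - Σp² form), and the guard clauses that return 0.0 for an empty set instead of dividing by zero.

```python
import math
from collections import Counter


def uniqueCounts(rows):
    """Count how often each class label occurs.

    Assumption for this sketch: the label is the last element of each row.
    """
    return Counter(row[-1] for row in rows)


def entropy(rows):
    """Shannon entropy, in bits, of the label distribution in rows."""
    total = len(rows)
    if total == 0:  # guard clause: an empty set has no impurity; avoids ZeroDivisionError
        return 0.0
    counts = uniqueCounts(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def gini(rows):
    """Gini impurity (1 - sum of squared label probabilities) of rows."""
    total = len(rows)
    if total == 0:  # same guard clause as in entropy
        return 0.0
    counts = uniqueCounts(rows)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())
```

For a balanced two-class set such as `[[1, 'a'], [2, 'a'], [3, 'b'], [4, 'b']]`, entropy returns 1.0 and gini returns 0.5. A set containing a single label scores 0.0 on both.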

New Features & Usability:

  • Command-Line Interface: Integrated argparse to allow users to select the dataset and criterion (gini/entropy) from the command line.
  • Graphviz Plotting: Added a --plot option that generates a tree visualisation with Graphviz and saves it to a file (e.g. in .png format).
  • Improved Classification Output: Created a dedicated function to print classification results in a clear, human-readable, and deterministic format.
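The command-line and output features can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation. Only the --plot flag and the gini/entropy choice come from the description above. The positional dataset argument, the metavar, and the helper names build_parser and format_classification are hypothetical. It also assumes a classification result is a {label: count} dict. Sorting by label is one simple way to get the deterministic output the PR mentions. Rendering the Graphviz image itself would need the graphviz package and is not shown.

```python
import argparse


def build_parser():
    """Hypothetical CLI: dataset path, split criterion, optional plot file."""
    parser = argparse.ArgumentParser(description="Train and inspect a decision tree.")
    # Positional dataset path (hypothetical); e.g. a CSV file under data/.
    parser.add_argument("dataset", help="path to a CSV file")
    # Criterion restricted to the two impurity measures named in the PR.
    parser.add_argument("--criterion", choices=["gini", "entropy"], default="gini",
                        help="impurity measure used to choose splits")
    # --plot is named in the PR; taking an output filename is an assumption.
    parser.add_argument("--plot", metavar="FILE",
                        help="save a Graphviz rendering of the tree to FILE, e.g. tree.png")
    return parser


def format_classification(result):
    """Render a {label: count} leaf result as stable, human-readable text.

    Sorting by label (as strings) makes the output identical across runs,
    independent of dict insertion order.
    """
    total = sum(result.values())
    parts = [f"{label}: {count} ({count / total:.0%})"
             for label, count in sorted(result.items(), key=lambda kv: str(kv[0]))]
    return ", ".join(parts)
```

For example, `build_parser().parse_args(["data/example.csv", "--criterion", "entropy", "--plot", "tree.png"])` yields `criterion == "entropy"` and `plot == "tree.png"`, and `format_classification({"b": 1, "a": 3})` returns `"a: 3 (75%), b: 1 (25%)"`.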

Project Structure & Documentation:

  • New README: Added a comprehensive README.md with instructions on prerequisites, installation, and usage.
  • Better File Naming: Renamed the main script from implementation.py to decision_tree.py to follow standard Python conventions.
  • Organized Directory Structure: Moved data files (.csv) into a data/ directory to separate code from data.

No pressure to merge this, as I know the project is not actively maintained, but I wanted to offer these improvements back. Thanks again for the great resource!

jesper-olsen · Sep 05 '25 12:09