DecisionTrees
Major Refactor: Code Modernization, New Features, and Documentation
Hi Michael,
Thank you for creating this excellent educational repository for decision trees.
As I was working with the code, I made some updates to modernise it, improve its usability, and add proper documentation. I wanted to offer these changes back to the original project in case they are useful to you or to future learners who find this repo.
Here is a summary of the improvements:
Code Modernization & Refactoring:
- Refactored uniqueCounts, entropy, and gini to use the more efficient and Pythonic collections.Counter and math.log2.
- Added guard clauses to entropy and gini to prevent ZeroDivisionError.
- Improved overall code formatting and readability, and added docstrings to all major functions.
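For illustration, the refactored impurity helpers could look roughly like the sketch below. This is not the exact code in the PR: it assumes each row is a list whose last element is the class label, and that an empty input should score 0.0 rather than raise.

```python
import math
from collections import Counter


def uniqueCounts(rows):
    """Count how often each class label occurs.

    Assumption: the label is the last element of each row.
    """
    return Counter(row[-1] for row in rows)


def entropy(rows):
    """Shannon entropy of the class labels, in bits."""
    if not rows:  # guard clause: avoids dividing by len(rows) == 0
        return 0.0
    total = len(rows)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in uniqueCounts(rows).values()
    )


def gini(rows):
    """Gini impurity of the class labels."""
    if not rows:  # guard clause: avoids dividing by len(rows) == 0
        return 0.0
    total = len(rows)
    return 1.0 - sum(
        (count / total) ** 2 for count in uniqueCounts(rows).values()
    )
```

Counter only stores labels that actually occur, so every probability passed to math.log2 is strictly positive and no extra check for log2(0) is needed.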
New Features & Usability:
- Command-Line Interface: Integrated argparse to allow users to select the dataset and criterion (gini/entropy) from the command line.
- Graphviz Plotting: Added a --plot option to generate a tree visualisation and save it to a file (e.g. in .png format).
- Improved Classification Output: Created a dedicated function to print classification results in a clear, human-readable, and deterministic format.
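As a rough sketch of how the command-line options might be wired up with argparse: only --plot is named above, so the --dataset and --criterion flag names, the choice to make --plot a boolean switch, and the sample path are assumptions made for illustration and may differ from the actual script.

```python
import argparse


def build_parser():
    """Build the argument parser (flag names other than --plot are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Train a decision tree on a CSV dataset."
    )
    # Hypothetical flag: path to the CSV file to train on.
    parser.add_argument("--dataset", required=True, help="path to a CSV data file")
    # Hypothetical flag: the two criteria mentioned in the summary.
    parser.add_argument(
        "--criterion",
        choices=["gini", "entropy"],
        default="entropy",
        help="impurity measure used to choose splits",
    )
    # Assumed here to be a simple on/off switch.
    parser.add_argument(
        "--plot",
        action="store_true",
        help="render the tree with Graphviz and save it to a file",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
# "data/sample.csv" is a placeholder path, not a file from the repository.
args = build_parser().parse_args(
    ["--dataset", "data/sample.csv", "--criterion", "gini", "--plot"]
)
print(args.dataset, args.criterion, args.plot)
```

Restricting --criterion with choices means argparse rejects an unsupported value with a usage message before any training code runs.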
Project Structure & Documentation:
- New README: Added a comprehensive README.md with instructions on prerequisites, installation, and usage.
- Better File Naming: Renamed the main script from implementation.py to decision_tree.py to follow standard Python conventions.
- Organized Directory Structure: Moved data files (.csv) into a data/ directory to separate code from data.
No pressure to merge this, as I know the project is not actively maintained, but I wanted to offer these improvements back. Thanks again for the great resource!