decision-tree-id3
differentiate empty leaves from heterogeneous leaves
Given the following new data in text_plot_examples.py:
```python
import numpy as np

X = np.array([[45, "male", "private", "m"],
              [61, "other", "public", "b"],
              [60, "other", "public", "b"],
              [40, "male", "private", "none"],
              [34, "female", "private", "none"],
              [43, "other", "private", "m"],
              [35, "male", "private", "m"],
              [35, "male", "public", "m"],
              [34, "other", "public", "m"],
              [34, "female", "public", "b"],
              [34, "male", "public", "b"],
              [34, "female", "private", "b"],
              [34, "male", "private", "b"],
              [34, "other", "private", "b"]])
y = np.array(["(30k,38k)",
              "(30k,38k)",
              "(13k,15k)",
              "(13k,15k)",
              "(13k,15k)",
              "(23k,30k)",
              "(23k,30k)",
              "(15k,23k)",
              "(23k,30k)",
              "(15k,23k)",
              "(15k,23k)",
              "(23k,30k)",
              "(23k,30k)",
              "(23k,30k)"])
```
Without the fix, the output is:
```
| degree b
| | sector private: (23k,30k) (3)
| | sector public: (15k,23k) (2)
| degree m
| | gender female: (23k,30k) (3/1)
| | gender male
| | | sector private: (23k,30k) (1)
| | | sector public: (15k,23k) (1)
| | gender other: (23k,30k) (2)
| degree none: (13k,15k) (2)
age >44.00
| gender female: (30k,38k) (2/1)
| gender male: (30k,38k) (1)
| gender other: (13k,15k) (1/1)
```
For the branch age>44.00 and gender female there should be no leaf at all, since no sample in the data has both age > 44 and gender female. Showing it as (30k,38k) (2/1) is misleading, because the empty leaf then looks exactly like a heterogeneous leaf, e.g. the branch age>44.00 and gender other.
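One possible way for the text plot to make the distinction is to count the training samples that actually reach each leaf and flag leaves that receive none. The sketch below rests on a few assumptions: the numbers in parentheses are read as majority-class count / misclassified count (this matches the output above but is inferred, not taken from decision-tree-id3's documentation), an empty leaf keeps its parent's majority class as its prediction, and ties are broken alphabetically. `majority_label` and `format_leaf` are hypothetical helpers, not part of the package's API.

```python
from collections import Counter

import numpy as np

# Only the columns needed here, copied from the example data above.
age = np.array([45, 61, 60, 40, 34, 43, 35, 35, 34, 34, 34, 34, 34, 34])
gender = np.array(["male", "other", "other", "male", "female", "other", "male",
                   "male", "other", "female", "male", "female", "male", "other"])
y = np.array(["(30k,38k)", "(30k,38k)", "(13k,15k)", "(13k,15k)", "(13k,15k)",
              "(23k,30k)", "(23k,30k)", "(15k,23k)", "(23k,30k)", "(15k,23k)",
              "(15k,23k)", "(23k,30k)", "(23k,30k)", "(23k,30k)"])


def majority_label(labels):
    """Most frequent label; ties broken alphabetically (an assumption)."""
    counts = Counter(labels)
    return min(counts, key=lambda c: (-counts[c], c))


def format_leaf(labels, fallback_label):
    """Render a leaf as 'label (majority/errors)', or flag it as empty."""
    if len(labels) == 0:
        # No training sample reaches this branch: keep the fallback prediction
        # but mark it, so it cannot be mistaken for a heterogeneous leaf.
        return f"{fallback_label} (empty)"
    label = majority_label(labels)
    majority = sum(1 for v in labels if v == label)
    errors = len(labels) - majority
    return f"{label} ({majority}/{errors})" if errors else f"{label} ({majority})"


older = age > 44
fallback = majority_label(y[older])  # parent node's majority class (assumed fallback)
for g in ("female", "male", "other"):
    print(f"| gender {g}: {format_leaf(y[older & (gender == g)], fallback)}")
```

For the age>44.00 branch this prints:

```
| gender female: (30k,38k) (empty)
| gender male: (30k,38k) (1)
| gender other: (13k,15k) (1/1)
```

The empty branch still reports a prediction, so the tree remains usable, but it is no longer formatted like the genuinely mixed gender other leaf.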