decision-tree-id3

differentiate empty leaves from heterogeneous leaves

Open marcduda opened this issue 6 years ago • 0 comments

For some new data in text_plot_examples.py:

X = np.array([[45, "male", "private", "m"],
              [61, "other", "public", "b"],
              [60, "other", "public", "b"],
              [40, "male", "private", "none"],
              [34, "female", "private", "none"],
              [43, "other", "private", "m"],
              [35, "male", "private", "m"],
              [35, "male", "public", "m"],
              [34, "other", "public", "m"],
              [34, "female", "public", "b"],
              [34, "male", "public", "b"],
              [34, "female", "private", "b"],
              [34, "male", "private", "b"],
              [34, "other", "private", "b"]])

y = np.array(["(30k,38k)",
              "(30k,38k)",
              "(13k,15k)",
              "(13k,15k)",
              "(13k,15k)",
              "(23k,30k)",
              "(23k,30k)",
              "(15k,23k)",
              "(23k,30k)",
              "(15k,23k)",
              "(15k,23k)",
              "(23k,30k)",
              "(23k,30k)",
              "(23k,30k)"])

The output without the fix is:

|   degree b
|   |   sector private: (23k,30k) (3) 
|   |   sector public: (15k,23k) (2) 
|   degree m
|   |   gender female: (23k,30k) (3/1) 
|   |   gender male
|   |   |   sector private: (23k,30k) (1) 
|   |   |   sector public: (15k,23k) (1) 
|   |   gender other: (23k,30k) (2) 
|   degree none: (13k,15k) (2) 
age >44.00
|   gender female: (30k,38k) (2/1) 
|   gender male: (30k,38k) (1) 
|   gender other: (13k,15k) (1/1)

For the branch age > 44.00 and gender female, nothing should be printed, since this combination doesn't exist in the data. The current representation is misleading because it looks the same as a heterogeneous leaf (e.g. the branch age > 44.00 and gender other).
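To make the distinction concrete, here is a minimal sketch that checks the three gender branches under age > 44.00 directly against the data above. It uses plain NumPy rather than the library's own tree code, and `leaf_kind` is a hypothetical helper introduced only for illustration, not part of decision-tree-id3.

```python
import numpy as np

# Age and gender columns and the targets, copied from the example data above.
age = np.array([45, 61, 60, 40, 34, 43, 35, 35, 34, 34, 34, 34, 34, 34])
gender = np.array(["male", "other", "other", "male", "female", "other", "male",
                   "male", "other", "female", "male", "female", "male", "other"])
y = np.array(["(30k,38k)", "(30k,38k)", "(13k,15k)", "(13k,15k)", "(13k,15k)",
              "(23k,30k)", "(23k,30k)", "(15k,23k)", "(23k,30k)", "(15k,23k)",
              "(15k,23k)", "(23k,30k)", "(23k,30k)", "(23k,30k)"])


def leaf_kind(mask):
    """Hypothetical helper: classify the training samples selected by a boolean mask."""
    labels = y[mask]
    if labels.size == 0:
        return "empty"          # no training sample reaches this branch
    if np.unique(labels).size == 1:
        return "pure"           # every sample has the same class
    return "heterogeneous"      # samples with different classes


older = age > 44
kinds = {g: leaf_kind(older & (gender == g)) for g in ("female", "male", "other")}
print(kinds)
# {'female': 'empty', 'male': 'pure', 'other': 'heterogeneous'}
```

The female branch has no samples at all, while the other branch holds two samples with different classes, yet both are printed in the same `(n/m)` style in the output above.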

marcduda, Apr 11 '20 15:04