dtreeviz icon indicating copy to clipboard operation
dtreeviz copied to clipboard

labels too close to the axes overlap for classifiers

Open parrt opened this issue 5 years ago • 3 comments

We have this working for regression trees but not classifiers.

parrt avatar Dec 02 '20 20:12 parrt

@parrt , is this the kind of thing you're talking about? Cut points are overlapping with axis bounds. image

JohnM-H2O avatar Feb 17 '21 23:02 JohnM-H2O

hahah. yep. that's a good example. thanks!!! Any chance you can send a tiny bit of code we can run as a test for that?

parrt avatar Feb 17 '21 23:02 parrt

Yes, sure. Thanks for testing!

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from dtreeviz import trees

X = np.tile((0, 99, 100), 100).reshape(-1, 1)
y = np.tile((0, 0, 1), 100)

clf = DecisionTreeClassifier(random_state=22)
clf.fit(X, y)
trees.dtreeviz(clf, X, y, class_names=['0', '1'])

Screenshot 2021-02-18 at 12 56 30

JohnM-H2O avatar Feb 18 '21 18:02 JohnM-H2O

@parrt we could try moving the wedge down, but we'd also need to move the feature name / x-axis label.

mepland avatar Dec 30 '22 02:12 mepland

I wonder what we do for regression. I think I simply move to the label away from the wedge. Ah:

if node.split() > xmin + .10 * xr and node.split() < xmax - .1 * xr:  # don't show split if too close to axis ends
    xticks += [node.split()]

parrt avatar Dec 30 '22 22:12 parrt

@parrt Hmm that just turns the wedge off, not an ideal solution. I think we should give the user options:

  1. The current behavior, you may have overlaps.
  2. Hide wedge if split is too close to axis limits as in the regression.
  3. Move all of the wedges and x-axis titles down by enough vertical distance that overlaps are not possible.

Let me know if you'd like help coding any of this up.

mepland avatar Dec 31 '22 18:12 mepland

Hmm...i'd like to keep vertical space small as possible so (3) not great solution, unless we decide to only do that when we'd overlap.

Options are nice, but they double or triple the testing space. :(

I think for now we can just do (2).

parrt avatar Dec 31 '22 23:12 parrt

I wonder what we do for regression. I think I simply move to the label away from the wedge. Ah:

if node.split() > xmin + .10 * xr and node.split() < xmax - .1 * xr:  # don't show split if too close to axis ends
    xticks += [node.split()]

@parrt actually that code just controls an extra tick on the x-axis at the split location. The wedge at the split and ticks at the beginning and end of the axis are drawn regardless, see xticks = list(overall_feature_range) and wedge(ax, node.split(), color=colors['wedge']).

    if not node.is_categorical_split():
        **xticks = list(overall_feature_range)**
        if node.split() > xmin + .10 * xr and node.split() < xmax - .1 * xr:  # don't show split if too close to axis ends
            xticks += [node.split()]
        ax.set_xticks(xticks)

        ax.scatter(X_feature, y_train, s=5, c=colors['scatter_marker'], alpha=colors['scatter_marker_alpha'], lw=.3)
        left, right = node.split_samples()
        left = y_train[left]
        right = y_train[right]
        split = node.split()

        ax.plot([overall_feature_range[0], split], [np.mean(left), np.mean(left)], '--', color=colors['split_line'],
                linewidth=1)
        ax.plot([split, split], [*y_range], '--', color=colors['split_line'], linewidth=1)
        ax.plot([split, overall_feature_range[1]], [np.mean(right), np.mean(right)], '--', color=colors['split_line'],
                linewidth=1)
        **wedge(ax, node.split(), color=colors['wedge'])**

Would you like me to make a PR addressing this for both classification and regression? I can turn off the offending tick at the beginning / end of the axis if the split is too close. It may also be a good chance to combine both wedge() functions into a single _draw_wedge() utility function in utils.py, with a flag for classification or regression.

mepland avatar Jan 04 '23 23:01 mepland

  1. It definitely makes sense to combine that functionality to generate a wedge.
  2. What if we moved the wedges below the axis but only if they would collide or get too close to the left or right edge?

parrt avatar Jan 05 '23 22:01 parrt

  1. It definitely makes sense to combine that functionality to generate a wedge.

Sounds good, will do.

  1. What if we moved the wedges below the axis but only if they would collide or get too close to the left or right edge?

hmm then the heights / aspect ratios of the subplots included in the graph would probably be off for those nodes. I think it is okay ditching one of the axis endpoint labels if the wedge is within 10% of an end, as the wedge also has a numerical value. Plus that keeps everything else on the plot consistent, and is easy to implement :smile:

mepland avatar Jan 05 '23 22:01 mepland

Yes I think your approach is good. We drop the axis label and keep the label on the wedge itself if too close. Seems like the simplest solution

parrt avatar Jan 05 '23 22:01 parrt

@parrt actually, looking at the wedge plotting code, only classification tree non-categorical splits get a text label under the wedge. I've started refactoring the wedge plotting code in https://github.com/parrt/dtreeviz/pull/236 here, which makes it clearer.

Would you like to add the text label to the other wedge types as well? Or keep the text label behavior as is, and still blank out one of the axis end point tick labels?

mepland avatar Jan 05 '23 23:01 mepland

It seems we should also be doing the text label under the split for regression trees. I think we do Screen Shot 2023-01-05 at 4 02 49 PM

parrt avatar Jan 06 '23 00:01 parrt

@parrt I think that is a tick label, not an extra text object like is done for the classification trees:

            ax.text(tip_x, tip_y -2 * tri_height,
                    f"{myround(x, precision)}",
                    horizontalalignment='center',
                    fontsize=ticks_fontsize,
                    fontname=fontname,
                    color=color)

mepland avatar Jan 06 '23 02:01 mepland

@parrt see my comments on #236

mepland avatar Jan 06 '23 02:01 mepland

Resolved by https://github.com/parrt/dtreeviz/pull/236

mepland avatar Jan 14 '23 23:01 mepland