
get-started: bar plot for feature importance

Open • dberenbaum opened this issue 3 years ago • 6 comments

Bar charts were added in https://github.com/iterative/dvc-render/issues/8. Should we switch the feature importance plot from an image to a bar plot? I'm not sure it's worth it, since then we would have no static image plots left.
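If the feature importance plot does become a bar plot, the training stage would need to write the importances to a data file instead of rendering an image. Below is a minimal sketch, assuming the importances are available as a plain list of floats (for example, copied from a scikit-learn estimator's `feature_importances_` attribute) and that a CSV with one row per feature is what the bar plot will read. The function names and the `feature`/`importance` column names are hypothetical.

```python
import csv
import io


def feature_importance_rows(names, importances):
    """Pair feature names with importance scores, largest first (hypothetical helper)."""
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")
    # Sort descending by importance so the biggest bar comes first.
    pairs = sorted(zip(names, importances), key=lambda p: p[1], reverse=True)
    return [{"feature": n, "importance": float(v)} for n, v in pairs]


def write_csv(rows, fh):
    """Write rows as CSV with a header line: one row per bar."""
    writer = csv.DictWriter(
        fh, fieldnames=["feature", "importance"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical values standing in for a real model's importances.
    names = ["f0", "f1", "f2"]
    importances = [0.2, 0.5, 0.3]
    buf = io.StringIO()
    write_csv(feature_importance_rows(names, importances), buf)
    print(buf.getvalue())
```

The resulting file could then be registered as a plot in `dvc.yaml` with a bar template; the exact template and axis settings are left out here rather than guessed.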

dberenbaum • Sep 15 '22 14:09

Yes, maybe we can come up with some other plots that would make sense for this workflow, instead of removing the feature importance plot. Does anything come to mind, @dberenbaum @daavoo?

In the end, I think it would be nice to have more plots.

shcheklein • Sep 15 '22 21:09

We could plot the distribution of samples across target labels (0, 1) and/or splits (train, test). Those are usually represented as bar plots and would be associated with a different stage (prepare?).
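The data for such a plot is just a count per (split, label) pair. A minimal sketch, assuming each sample is available as a hypothetical `(split, label)` tuple such as `("train", 1)`; the output is a JSON list of records, one per bar, which is one of the data formats DVC plots can read:

```python
import json
from collections import Counter


def label_split_counts(samples):
    """Count samples per (split, label) pair (hypothetical helper).

    Returns one dict per bar, sorted by split and then by label.
    """
    counts = Counter(samples)
    return [
        {"split": split, "label": label, "count": n}
        for (split, label), n in sorted(counts.items())
    ]


if __name__ == "__main__":
    # Hypothetical toy data: (split, target label) for each sample.
    samples = [("train", 0), ("train", 0), ("train", 1), ("test", 0), ("test", 1)]
    print(json.dumps(label_split_counts(samples), indent=2))
```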

daavoo • Sep 16 '22 06:09

I would prefer to convert feature importance to a bar plot since we have support for it, and then add another image plot.

One idea is a SHAP summary plot, which is a more robust feature importance method:

(image: example SHAP summary plot)

It doesn't hurt to also keep the traditional feature importance as a bar plot, since all of these methods have pros and cons, and it can help to look at more than one method.
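SHAP attributes each individual prediction to the features using Shapley values, and a summary plot shows one such value per feature per sample. To make that concrete, here is a brute-force sketch that computes exact Shapley values for a tiny model. It assumes that "absent" features are replaced by a single baseline value, which is one common simplification; the `shap` package handles missing features differently and uses much faster algorithms (for example, `TreeExplainer` for tree models). The function name is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model `f` at point `x` (brute force).

    Features outside a coalition take their `baseline` value. Every
    coalition is enumerated, so this only scales to a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Evaluate f with coalition features from x and the rest from baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to coalition s.
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


if __name__ == "__main__":
    # Hypothetical linear model with weights 2, 3 and -1.
    model = lambda z: 2 * z[0] + 3 * z[1] - z[2]
    print(shapley_values(model, [1, 2, 3], [0, 0, 0]))
```

For a linear model with a zero baseline, each value reduces to weight times feature value, and the values always sum to `f(x) - f(baseline)`, which makes for handy sanity checks.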

dberenbaum • Sep 16 '22 14:09

Okay, sounds good, we can try both. I like @daavoo's suggestion since it's way simpler. I would add another image plot too, though; I think it's good to have more images.

Let's take this on once we are done with the global/flexible plots iteration and https://github.com/iterative/example-repos-dev/pull/117 is merged?

shcheklein • Sep 16 '22 17:09

We can prioritize this in the docs planning that @jorgeorpinel is preparing, as a task that someone from the bigger "docs" group can take on (including me; I would be happy to do this).

shcheklein • Sep 16 '22 17:09

I started on the SHAP one in https://github.com/iterative/example-repos-dev/pull/136, so anyone can feel free to pick up from there. There's a SHAP package, so it's not difficult to add.

Having a sample distribution plot is a good idea, although I have a couple of concerns about the suggested bar plot:

  1. It won't change between experiments.
  2. Since the data is binary, a bar plot is likely not as interesting or realistic as the others.

I don't think these are blockers, but maybe whoever works on them can try to find ways to make these plots more interesting.

A histogram of predictions from training and test data might be another good bar plot.
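A prediction histogram would also be expected to change between experiments, since it depends on the trained model. A minimal sketch of the data behind it, assuming the model outputs a positive-class probability in [0, 1] for each sample and that predictions are grouped in a dict keyed by split name; the function name, `bin_start` field and bin count are hypothetical:

```python
from collections import Counter


def prediction_histogram(predictions, bins=10):
    """Bin predicted probabilities into `bins` equal-width bins over [0, 1].

    `predictions` maps a split name (e.g. "train") to a list of probabilities.
    Returns one row per (split, bin), so each split can be drawn as bars.
    """
    rows = []
    for split, probs in predictions.items():
        # A probability of exactly 1.0 is folded into the last bin.
        counts = Counter(min(int(p * bins), bins - 1) for p in probs)
        for b in range(bins):
            rows.append({"split": split, "bin_start": b / bins, "count": counts.get(b, 0)})
    return rows


if __name__ == "__main__":
    # Hypothetical predicted probabilities for each split.
    preds = {"train": [0.05, 0.15, 0.95, 1.0], "test": [0.5]}
    for row in prediction_histogram(preds):
        print(row)
```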

dberenbaum • Sep 18 '22 18:09