
Adding plot_shapley_projection function to the plot_evaluation_metrics.py file.

Open · Yh-Cherif opened this issue 8 months ago · 6 comments

The original issue for this Pull Request is this one.

This Pull Request adds the plot_shapley_projection function, which corresponds to the first objective of the issue (1. Shapley Projection Plot).

Test plan.

To test this code, I tried it on multiple datasets. The following example uses this dataset. The following code:

[screenshot: example code]

Produces the following result:

[screenshot: resulting projection plot]

Description

Generates a two-dimensional scatter plot of the Shapley values of the data and model used in the explainer. This plot gives the user a visual interpretation of the impact of variables on predictions. The only additional library needed is UMAP.
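The core idea can be sketched as follows. This is a minimal, hypothetical sketch (the function name is illustrative, not shapash's API); PCA stands in for UMAP so the snippet carries no extra dependency, but `umap.UMAP` exposes the same `fit_transform` interface and would be swapped in the same way.

```python
import numpy as np
from sklearn.decomposition import PCA  # stand-in; umap.UMAP has the same fit_transform API


def project_contributions_2d(contributions, random_state=0):
    """Reduce an (n_samples, n_features) matrix of contribution values to 2D
    so each sample becomes one point in a scatter plot."""
    reducer = PCA(n_components=2, random_state=random_state)
    return reducer.fit_transform(np.asarray(contributions))


# toy contribution matrix: 10 samples, 4 features
rng = np.random.default_rng(0)
coords = project_contributions_2d(rng.normal(size=(10, 4)))
print(coords.shape)  # (10, 2)
```

Each row of `coords` is then plotted as one point, typically colored by the model's prediction for that sample.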

Type of change

New feature (non-breaking change which adds functionality or feature that would cause existing functionality to not work as expected)

Test Configuration:

  • OS: Windows
  • Python version: 3.11.9
  • Shapash version: 2.7.9

Yh-Cherif avatar Apr 15 '25 10:04 Yh-Cherif

Thank you so much @Yh-Cherif for this great contribution 🙌
You've achieved a really nice and clean visualization — well done!

I’ve reviewed your work, and here are a few suggestions to help align it more closely with the rest of the library’s design and ensure maximum usability and flexibility:

  • Axes Titles and Values: Since the data is projected into a 2D space (e.g., via UMAP), the axes themselves don’t carry interpretable meaning. For clarity, it might be best to remove the axis titles and tick values, as they could be misleading.

  • Color Bar and Plot Titles: It would be great to make the color bar title (e.g., predictions, targets, errors) and the overall plot title configurable parameters of the function. That way, users can adapt the visualization to suit different use cases.

  • Function Parameters: In plot_evaluation_metrics.py, the functions should be kept as generic as possible. Rather than passing the entire explainer object, it would be better to pass only the specific data needed for the visualization — but with names that reflect their role in the plot. For example, instead of y_pred or contributions, use something like values_to_project (for the 2D projection) and color_values (for the color scale). This helps clarify their purpose, keeps the function flexible regardless of whether it’s classification or regression, and makes it easier to reuse in other contexts.

  • UMAP Dependency: Since UMAP is an external dependency, we’ll need to add it explicitly to the pyproject.toml file so that it's properly tracked and installed.

  • Function Naming: Could you please rename the function from plot_shapley_projection to plot_contributions_projection? This will make it clearer that it can be used with different types of contribution values, not just Shapley (e.g., LIME, etc.).
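Taken together, the suggestions above might shape the function roughly like this. This is a hedged sketch, not the actual shapash implementation: matplotlib is used here for brevity (shapash itself builds plotly figures), PCA stands in for UMAP, and all names follow the generic-parameter suggestion (values_to_project, color_values, configurable titles, hidden axes).

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA  # stand-in for umap.UMAP


def plot_contributions_projection(values_to_project, color_values,
                                  title="Contributions projection",
                                  colorbar_title="Prediction",
                                  random_state=0):
    """Project contribution values to 2D and scatter them, colored by color_values."""
    coords = PCA(n_components=2, random_state=random_state).fit_transform(
        np.asarray(values_to_project))
    fig, ax = plt.subplots()
    sc = ax.scatter(coords[:, 0], coords[:, 1], c=np.asarray(color_values))
    fig.colorbar(sc, ax=ax, label=colorbar_title)
    ax.set_title(title)
    # the projected axes carry no interpretable meaning -> hide ticks and labels
    ax.set_xticks([])
    ax.set_yticks([])
    return fig


rng = np.random.default_rng(1)
fig = plot_contributions_projection(rng.normal(size=(20, 5)), rng.normal(size=20))
```

Because the function only sees generic arrays, the same code serves regression and classification alike; the caller decides what to project and what to color by.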


Next steps:

Once these changes are in place, the next step would be to integrate the function into shapash/explainer/smart_plotter.py, so that users can call it directly from the explainer like:

xpl.contributions_projection_plot()

Eventually, the goal would be to include this visualization in the web app, allowing users to interactively click on each point and inspect the local contributions. That would be incredibly helpful for exploring and understanding individual predictions 👏

Thanks again for your great work and all the time you're putting into this — we really appreciate it!

guillaume-vignal avatar Apr 15 '25 12:04 guillaume-vignal

Hello @guillaume-vignal, thanks for your review.

I've taken each of your suggestions into account and updated the files accordingly.

I've also run simulation tests for the classification and regression cases to ensure that the function is flexible:

  • For regression, I used this dataset and passed it to the function.

[screenshot: code]

Here is the output:

[screenshot: output plot]

[screenshot: code]

Here is the output:

[screenshot: output plot]

I've added "title" and "colorbar_title" plot options to let the user customize the output a little more, as requested.

I hope these changes are what you expected. If not, please let me know and I'll change them.

Thanks again for your patience!

Yh-Cherif avatar Apr 20 '25 19:04 Yh-Cherif

Thanks again for the great work on this feature—it’s a really valuable addition to the library!

The natural next step in this evolution would be to integrate the function into shapash/explainer/smart_plotter.py, so it can be called directly from the explainer like:

xpl.contributions_projection_plot()

This method would simply act as a wrapper around the existing plot_contributions_projection function. It would significantly improve usability by allowing users to access the projection plot with minimal setup.

Embedding it into the explainer would also allow us to handle the logic internally—for example, adapting automatically to regression or classification cases, and selecting color_value based on predictions, targets, or prediction errors, depending on the context or a user-specified argument.

Of course, the current standalone function would still be available for advanced or customized usage:

plot_contributions_projection(
    values_to_project=xpl.contributions,
    color_value=xpl.y_pred,
    random_state=100
)

But exposing it directly through the explainer would really streamline the experience and make it much more user-friendly.
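The color-selection logic the wrapper could handle internally might look like this sketch. All names are hypothetical (the real method would live on the explainer/SmartPlotter), and for brevity it returns the selected color array instead of calling plot_contributions_projection.

```python
import numpy as np


class ProjectionPlotterMixin:
    """Hypothetical sketch of the wrapper suggested above; the real method
    would delegate to plot_contributions_projection with these colors."""

    def contributions_projection_plot(self, color_value="prediction", random_state=100):
        """Pick the color scale from the explainer's own data, then delegate."""
        y_pred = np.asarray(self.y_pred)
        y_target = np.asarray(self.y_target)
        if color_value == "prediction":
            colors = y_pred
        elif color_value == "target":
            colors = y_target
        elif color_value == "error":
            colors = y_pred - y_target  # regression case; classification would differ
        else:
            raise ValueError(f"unknown color_value: {color_value!r}")
        return colors  # real wrapper: plot_contributions_projection(..., color_values=colors)
```

Keeping this branching inside the explainer means users only pick a semantic option ("prediction", "target", "error") while the standalone function stays agnostic.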

guillaume-vignal avatar Apr 22 '25 08:04 guillaume-vignal

Hello Guillaume. I've added the plot method to the explainer object as you can see.

  • Regression case :

[screenshots: code and output]

  • Classification case :

[screenshots: code and output]

PS: Note that I've truncated the code picture since it's the same as before.

Yh-Cherif avatar Apr 22 '25 13:04 Yh-Cherif

Thanks for the quick update—the integration is going in a great direction and having the method available directly on the explainer definitely improves usability.

That said, one part of the original suggestion is missing: the ability to choose how the points are colored via a color_value parameter, with options like "prediction", "target", or "error". This makes the plot more informative and adapts it to different analysis contexts (e.g., spotting outliers via prediction error).

Also, regarding the use of **kwargs: I don’t think it’s ideal here. Having clearly defined parameters makes the method much more user-friendly and easier to understand—especially for people less familiar with the internal implementation. With explicit arguments, users can quickly see what can be customized and benefit from autocompletion and documentation. **kwargs tends to hide those options and can make things harder to grasp.
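The point about **kwargs can be illustrated with a toy contrast (function names are hypothetical, not shapash's API):

```python
# Opaque: the caller cannot discover the options from the signature,
# and IDEs offer no autocompletion or type hints for them.
def projection_plot_opaque(values, **kwargs):
    title = kwargs.get("title", "Contributions projection")
    random_state = kwargs.get("random_state", 100)
    return title, random_state


# Explicit: options are visible in the signature, documented, and autocompletable.
def projection_plot_explicit(values, title="Contributions projection",
                             colorbar_title="Prediction", random_state=100):
    return title, random_state


# A typo in a **kwargs key fails silently instead of raising a TypeError:
projection_plot_opaque([], titel="oops")      # typo ignored, default title used
# projection_plot_explicit([], titel="oops")  # would raise TypeError immediately
```

The silent-typo behavior is the practical cost: with explicit arguments the mistake surfaces at the call site rather than as a mysteriously unchanged plot.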

guillaume-vignal avatar Apr 22 '25 15:04 guillaume-vignal

Hello @guillaume-vignal, thanks again for your review. I've incorporated your suggestions into the explainer method; here is an example of the changes:

  • I've incorporated the predictions/targets/errors option for both classification and regression (these options are passed via the 'color_value' parameter): [screenshot]

Note that title and colorbar title are both filled automatically.

  • Made sure that each function's keyword arguments are compatible with autocompletion:
    [screenshot]

  • I've also corrected the 'Example' section of the function docstring (which still showed an example using the explainer).

Yh-Cherif avatar Apr 29 '25 12:04 Yh-Cherif