Sort individual feature importance by residual for regression

Open tongyu-microsoft opened this issue 3 years ago • 8 comments

This PR orders the samples in individual feature importance by abs(true_y - predicted_y) for regression. Samples with more accurate predictions are shown first, followed by those where the model's prediction is far off.
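The ordering described above can be illustrated with a minimal sketch. This is not the PR's actual implementation (the change itself lives in the dashboard UI code); the function name `sort_by_abs_residual` and the sample values are hypothetical, and ascending order is assumed from the description that better predictions come first.

```python
def sort_by_abs_residual(true_y, predicted_y):
    """Return sample indices ordered by abs(true_y - predicted_y), smallest first.

    Hypothetical helper for illustration only; not part of responsible-ai-toolbox.
    """
    # Absolute residual for each sample.
    residuals = [abs(t - p) for t, p in zip(true_y, predicted_y)]
    # sorted() is stable, so samples with equal residuals keep their index order.
    return sorted(range(len(residuals)), key=lambda i: residuals[i])


# Made-up example values.
true_y = [3.0, 10.0, 5.0, 8.0]
predicted_y = [2.5, 4.0, 5.0, 9.5]
print(sort_by_abs_residual(true_y, predicted_y))  # [2, 0, 3, 1]
```

Because the sort is stable, ties fall back to the original index order, so the result stays predictable when several samples have the same residual.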

Description

Before: sort by index (screenshot)

After: sort by abs(true_y - predicted_y) (screenshot)

Checklist

  • [x] I have added screenshots above for all UI changes.
  • [ ] I have added e2e tests for all UI changes.
  • [ ] Documentation was updated if it was needed.

tongyu-microsoft avatar Jun 10 '22 23:06 tongyu-microsoft

Codecov Report

Merging #1487 (9342c46) into main (9342c46) will not change coverage. The diff coverage is n/a.

:exclamation: Current head 9342c46 differs from pull request most recent head 45455bf. Consider uploading reports for the commit 45455bf to get more accurate results

@@           Coverage Diff           @@
##             main    #1487   +/-   ##
=======================================
  Coverage   89.30%   89.30%           
=======================================
  Files          38       38           
  Lines        1617     1617           
=======================================
  Hits         1444     1444           
  Misses        173      173           

| Flag | Coverage Δ |
|---|---|
| unittests | 89.30% <0.00%> (ø) |


codecov-commenter avatar Jun 11 '22 00:06 codecov-commenter

https://responsibleai.blob.core.windows.net/pullrequest/microsoft/responsible-ai-toolbox/tongy/sortByResidual/dashboard/index.html

github-actions[bot] avatar Jun 11 '22 00:06 github-actions[bot]

@tongyu-microsoft I'm curious what prompted this change. Was this a feature request from somebody? As a user, I'd be a bit surprised about the ordering since we had it by index so far and that's immediately obvious. The residual ordering is not immediately obvious unless we point it out in writing. Perhaps that should be added? If you haven't yet you may want to consult with a designer 🙂

romanlutz avatar Jun 13 '22 14:06 romanlutz

@romanlutz, I think sorting by a regression metric could help users understand why the model predicts some samples more accurately than others. In the classification scenario, too, we distinguish between samples the model predicted correctly and those it did not. I agree that we should document this in the dashboard and should use a standard regression metric like r2_score to order these.

gaugup avatar Jun 14 '22 01:06 gaugup

I think it would also be nice to see the values by which these are sorted as a separate column. If they are sorted by abs(true_y - predicted_y), it might be nice to have a column named something like "Abs difference" right after the index, so it's clear that this is what the data is sorted by. It doesn't necessarily have to be part of this PR though.
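As a rough sketch of that suggestion, the sort key could be carried in each row right after the index. The function `build_rows` and the column names "True Y" and "Predicted Y" are hypothetical and chosen only for illustration; only "Abs difference" comes from the comment above, and none of this reflects the dashboard's actual table code.

```python
def build_rows(true_y, predicted_y):
    """Build table rows with an "Abs difference" column right after the index.

    Hypothetical helper for illustration only; not the dashboard's real code.
    """
    rows = [
        {
            "Index": i,
            "Abs difference": abs(t - p),  # the value the table is sorted by
            "True Y": t,
            "Predicted Y": p,
        }
        for i, (t, p) in enumerate(zip(true_y, predicted_y))
    ]
    # Smallest absolute difference first, matching the ordering in this PR.
    rows.sort(key=lambda row: row["Abs difference"])
    return rows


# Made-up example values.
for row in build_rows([1.0, 2.0, 3.0], [1.5, 2.0, 0.0]):
    print(row)
```

Keeping the sort key as a visible column means the ordering explains itself without a separate note in the UI.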

imatiach-msft avatar Jun 24 '22 15:06 imatiach-msft

@imatiach-msft Thanks for the great comment! Yeah this PR is on hold and we are waiting for Owen's design on this, so that we can have different sorting options for users, not limited to Abs difference :)

tongyu-microsoft avatar Jun 24 '22 18:06 tongyu-microsoft