Scan: Add a robustness detector to the scan that perturbs categorical values

Open kevinmessiaen opened this issue 3 months ago • 7 comments

🚀 Feature Request

Add a robustness detector to the scan that perturbs categorical values.

The scan should be able to produce a set of issues that capture the perturbations needed on a single categorical feature to:

(a) change the predicted label (classification)
(b) change the prediction by an amount that exceeds a certain threshold (regression)
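
A minimal sketch of how criteria (a) and (b) could be checked, not an existing Giskard API. The function name `changed_rows`, the `task` strings, and the default threshold are hypothetical, and the regression check assumes the threshold applies to the absolute change, which the request does not specify.

```python
def changed_rows(original, perturbed, task, threshold=0.05):
    """Return one boolean per row: True if the prediction counts as changed.

    Hypothetical helper for illustration only.
    `original` and `perturbed` are equal-length sequences of predictions.
    """
    if len(original) != len(perturbed):
        raise ValueError("prediction sequences must have the same length")
    if task == "classification":
        # (a) the predicted label differs after the perturbation
        return [before != after for before, after in zip(original, perturbed)]
    if task == "regression":
        # (b) the prediction moved by more than the threshold
        # (assumption: absolute difference, not relative)
        return [abs(after - before) > threshold
                for before, after in zip(original, perturbed)]
    raise ValueError(f"unsupported task: {task!r}")


# Share of rows affected by one perturbation of a single feature
flags = changed_rows(["cat", "dog", "dog"], ["cat", "cat", "dog"], "classification")
fail_rate = sum(flags) / len(flags)
```

An issue could then be raised for a feature when the share of changed rows is judged too high; that cut-off is left open here.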

🔈 Motivation

Currently the scan does not have any categorical perturbation.

kevinmessiaen avatar Mar 14 '24 08:03 kevinmessiaen

Is this issue still active? I would like to contribute to it.

ChatBear avatar Apr 10 '24 15:04 ChatBear

@kevinmessiaen I'll let you guide here. This seems easy to add, and a great idea for a contribution!

alexcombessie avatar Apr 10 '24 15:04 alexcombessie

Hello @ChatBear

Yes, this is still an active issue, and I can assign you to it. We would be grateful to have your contribution; let me know if you have any questions about this.

kevinmessiaen avatar Apr 11 '24 03:04 kevinmessiaen

Thanks, I'll try to contribute. I'll need a bit of time to understand the repo; after that I'll try to post a PR.

ChatBear avatar Apr 11 '24 07:04 ChatBear

Hello, I have a few questions about the issue.

What kind of perturbations do you expect? I was thinking of changing the feature column with a probability of 0.1 (chosen arbitrarily).

Do I need to create another detector from scratch, or can I build one on BaseTextPerturbationDetector?

Also, I tried to create a branch, but I can't push to it. I forked the repo, but I am having trouble creating the pull request. I am kind of new to open source, so I apologize in advance if this question is inappropriate.

ChatBear avatar Apr 14 '24 17:04 ChatBear

Hello,

The perturbation should be on a categorical feature. It should only perturb one column of the dataset; the goal is to ensure that the model isn't too sensitive to noise. In this case the probability is not necessary, since we want to test that the result isn't impacted when the value changes (a probability makes sense for text, where we have a typo rate, for example).

An example is a breed category with potential values ['Labrador', 'Husky', 'Beagle', ...]. The idea is to switch all `Labrador` values to any other breed, and so on.

Reusing BaseTextPerturbationDetector won't work, since it casts the column to str, and we can have numerical categories, for example. But you can take inspiration from it.
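
A minimal sketch of the swap described above, not Giskard's implementation. The names `swap_category` and `categorical_perturbations` are hypothetical, and the column is a plain Python list for brevity; a real detector would work on the dataset's own column type.

```python
from itertools import permutations


def swap_category(column, source, target):
    """Return a copy of `column` with every `source` value replaced by `target`."""
    return [target if value == source else value for value in column]


def categorical_perturbations(column):
    """Yield (source, target, perturbed_column) for each ordered pair of
    distinct categories observed in `column`."""
    # dict.fromkeys keeps first-seen order and works for any hashable value,
    # so numerical categories are never cast to str.
    categories = list(dict.fromkeys(column))
    for source, target in permutations(categories, 2):
        yield source, target, swap_category(column, source, target)


breeds = ["Labrador", "Husky", "Beagle", "Labrador"]
for source, target, perturbed in categorical_perturbations(breeds):
    print(f"{source} -> {target}: {perturbed}")
```

Each perturbed column would replace the original one in a copy of the dataset before the model is run again. Only the perturbed feature changes, and no probability is involved because every occurrence of the source value is switched.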

kevinmessiaen avatar Apr 15 '24 07:04 kevinmessiaen

OK, thanks, I can continue.

ChatBear avatar Apr 15 '24 13:04 ChatBear