detoxify: Pinpoint the parts of the speech that trigger high values


Open nicobao opened this issue 11 months ago • 2 comments

Hi,

Thanks for the work on this library, it's quite accurate!

It'd be awesome if the model could pinpoint the part of the input text that triggered a high score (for toxicity or any other measured label).

Is there any easy way to do it already, maybe not for all cases, but for the obvious ones?

Mar 07 '24 23:03 nicobao

Given that this is SENTENCE classification, you can't really "highlight" the one part that makes a piece of text "toxic". The only thing I can remotely think of is to run each word of a submission through the model individually to find a "toxic" word, but this is really inefficient and not what the model is suited for: it's not just looking at a single word or phrase.

Aug 01 '24 22:08 voarsh2

You can do what I originally did with the DeepMoji model (also sentence classification, for emotion/sentiment): run the sentence prediction once with each word removed and look at how the predicted probabilities change compared to the full sentence. More details here: https://huggingface.co/spaces/Pendrokar/DeepMoji/discussions/1#65eb375cdf813b9c15308c3c
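A minimal Python sketch of that leave-one-word-out idea, assuming the classifier can be wrapped as a function that takes a batch of texts and returns one score per text. This is not part of detoxify's API: `occlusion_attribution` and `toy_scorer` are hypothetical names, the toy scorer only stands in for a real model, and splitting on whitespace is a simplifying assumption (it ignores punctuation and sub-word tokenization). Scoring a sentence of n words this way costs n + 1 predictions.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer type: maps a batch of texts to one score per text
# (e.g. the "toxicity" probability). Batching lets a model score every
# variant of the sentence in a single call.
Scorer = Callable[[Sequence[str]], Sequence[float]]


def occlusion_attribution(text: str, score_fn: Scorer) -> List[Tuple[str, float]]:
    """Leave-one-word-out attribution for a sentence-level classifier.

    For each word, remove it, re-score the shortened text and report how much
    the score drops relative to the full sentence. A large positive value
    means the word pushed the prediction up; a negative value means it
    pulled the prediction down.
    """
    words = text.split()  # naive whitespace tokenization (an assumption)
    if not words:
        return []
    # One variant per word, each with exactly that word left out.
    variants = [" ".join(words[:i] + words[i + 1:]) for i in range(len(words))]
    # Score the full sentence plus all n variants: n + 1 predictions in total.
    scores = list(score_fn([text, *variants]))
    base, occluded = scores[0], scores[1:]
    return [(word, base - s) for word, s in zip(words, occluded)]


def toy_scorer(texts: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for a real model: 0.9 if any listed insult is present, else 0.1."""
    insults = {"idiot", "moron"}
    return [0.9 if insults & set(t.lower().split()) else 0.1 for t in texts]


if __name__ == "__main__":
    for word, delta in occlusion_attribution("you are an idiot", toy_scorer):
        print(f"{word:>6}  {delta:+.2f}")
    # To use Detoxify instead (assumes `pip install detoxify` and that
    # predict() on a list returns {label: [score, ...]}, as in its README):
    #   from detoxify import Detoxify
    #   model = Detoxify("original")
    #   occlusion_attribution(text, lambda b: model.predict(list(b))["toxicity"])
```

Removing one word at a time keeps the rest of the sentence as context, which is closer to how the model actually reads text than scoring each word in isolation. It can still miss cases where the toxicity comes from a combination of words that each look harmless on their own, since dropping any single one of them may barely move the score.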

Aug 02 '24 08:08 bfelbo
