detoxify
Pinpoint the parts of the speech that trigger high values
Hi,
Thanks for the work on this library, it's quite accurate!
It'd be awesome if the model could pinpoint the part of the input text that triggered a high score (for toxicity or any other measured label).
Is there an easy way to do this already, perhaps not for every case but at least for the obvious ones?
Given that this is SENTENCE classification, you can't really "highlight" one part that makes a piece of text "toxic". The only thing I can remotely think of is to score each word in a submission individually to find a "toxic" word, but that is really inefficient and not what the model is suited for: it isn't just looking at a single word or phrase.
You can do what I originally did with the DeepMoji model (also sentence classification, for emotion/sentiment): run the sentence prediction without each word in turn and look at the difference in predicted probabilities. More details here: https://huggingface.co/spaces/Pendrokar/DeepMoji/discussions/1#65eb375cdf813b9c15308c3c
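A minimal sketch of that leave-one-word-out idea, assuming whitespace tokenization and a scorer that takes a batch of strings and returns one float per string. The function names `occlusion_attributions` and `toy_score_fn` are hypothetical, and `toy_score_fn` is only a stand-in so the example runs without a model; it is not how Detoxify scores text. Each word's attribution is the full-sentence score minus the score with that word deleted, so a large positive value means removing the word lowered the score the most.

```python
from typing import Callable, List, Tuple


def occlusion_attributions(
    text: str,
    score_fn: Callable[[List[str]], List[float]],
) -> List[Tuple[str, float]]:
    """Leave-one-word-out attribution (hypothetical helper).

    Returns (word, delta) pairs, where delta = score(full text) minus
    score(text with that word removed). Positive delta means the word
    pushed the score up.
    """
    words = text.split()  # assumption: plain whitespace tokenization
    # Index 0 is the full text; index i + 1 is the text without word i.
    variants = [text] + [
        " ".join(words[:i] + words[i + 1:]) for i in range(len(words))
    ]
    scores = score_fn(variants)  # score every variant in one batch
    base = scores[0]
    return [(w, base - s) for w, s in zip(words, scores[1:])]


def toy_score_fn(texts: List[str]) -> List[float]:
    # Hypothetical stand-in for a real classifier: 0.9 if any "bad"
    # word is present, otherwise 0.1.
    bad = {"idiot", "stupid"}
    out = []
    for t in texts:
        toks = [tok.strip(".,!?").lower() for tok in t.split()]
        out.append(0.9 if any(tok in bad for tok in toks) else 0.1)
    return out


if __name__ == "__main__":
    pairs = occlusion_attributions("you are an idiot", toy_score_fn)
    # Print words ranked by how much removing them lowers the score.
    for word, delta in sorted(pairs, key=lambda p: -p[1]):
        print(word, round(delta, 3))
```

To try this with Detoxify, something like `model = Detoxify('original')` with `score_fn = lambda texts: model.predict(texts)['toxicity']` should work, assuming `predict` returns one list of scores per label when given a list of strings. Batching all the variants into a single call keeps the cost to one forward pass over N + 1 sentences for an N-word input. Deletion can also change the sentence's grammar, so treat the deltas as a rough hint rather than an exact explanation.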