
Weird behavior of Smaller and Larger Models for same Text

Open · LaxmanSinghTomar opened this issue 2 years ago · 1 comment

Hey! Thanks for this easy-to-get-started-with package. I was testing both the original and unbiased models on the following sentences:

doc_1 = "I don't know why people don't support Muslims and call them terrorists often. They are not." doc_2 = "There is nothing wrong being in a lesbian. Everyone has feelings."

Following are the toxicity scores they produced:

[screenshot: model_testing]

The original model, which is supposed to be biased, predicts doc_1 to be non-toxic, as it should, while the smaller unbiased model predicts it to be toxic.

Likewise, for doc_2, the prediction should ideally be non-toxic, and the original model (both smaller and larger), being biased, should predict it as toxic. This is what it does:

[screenshot: model_testing_2]

The smaller original model predicts toxic while the larger one does not. Can you explain what might be causing this different behavior for the same text between the smaller and larger variants of both the original and unbiased models?

LaxmanSinghTomar · Feb 04 '22

Hello, sorry for the late reply and thank you for this observation!

It is hard to draw meaningful conclusions from a few examples, but I would imagine the difference between the smaller and larger models is due to the reduced capacity of the smaller models to learn more difficult examples, such as sentences involving negation.
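
One quick way to probe this negation hypothesis (a hedged sketch with illustrative sentences of my own, using the same assumed small-checkpoint name as above) is to compare how much a negation lowers each model's toxicity score:

from detoxify import Detoxify

# Illustrative probe: if a smaller model handles negation poorly, adding
# "not" should lower its toxicity score less than the full-size model's.
plain = "They are terrorists."
negated = "They are not terrorists."

for model_type in ["original", "original-small"]:
    model = Detoxify(model_type)
    drop = model.predict(plain)["toxicity"] - model.predict(negated)["toxicity"]
    print(f"{model_type}: toxicity drop from negation = {drop:.4f}")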

laurahanu · Apr 12 '22