
Process of Detoxify Model Classification

Open nil-andreu opened this issue 2 years ago • 5 comments

We are currently obtaining the inference for a message from the Hugging Face Detoxify RoBERTa model, but I am wondering how this will be included in the workflow:

  • Is this inference call made in the backend when we want to insert a new message?
  • If the new message that we want to insert is 'toxic', do we want to reply with a certain response?
  • Should this classification of the message be stored in the Messages table?

I am open to discussing and including these features in the code if needed.
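
For reference, a minimal sketch of that inference call against the Hugging Face Inference API. The model id (`unitary/unbiased-toxic-roberta`), the environment variable, and the helper name are assumptions, not the current backend code:

```python
import os

import requests

# Assumed model id for the RoBERTa-based Detoxify checkpoint on the HF hub
API_URL = "https://api-inference.huggingface.co/models/unitary/unbiased-toxic-roberta"
HEADERS = {"Authorization": f"Bearer {os.environ['HF_API_TOKEN']}"}


def classify_toxicity(message: str) -> dict[str, float]:
    """Query the hosted model and return a label -> score dict for one message."""
    response = requests.post(API_URL, headers=HEADERS, json={"inputs": message})
    response.raise_for_status()
    # The API returns a nested list of {"label": ..., "score": ...} entries
    return {entry["label"]: entry["score"] for entry in response.json()[0]}
```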

nil-andreu avatar Jan 05 '23 18:01 nil-andreu

Are you thinking this is something we could use at runtime to redirect the message to a different pipeline if it's toxic? A longer-term use might be to use the formulaic pathway to generate training data on how to respond to toxic inputs. So we would have the short-term goal of building the formulaic approach, and the long-term goal of using it to bake non-toxicity into the model.

smytjf11 avatar Jan 05 '23 20:01 smytjf11

We are working on some zero-shot classifiers, and on Detoxify as well. But all of this has to be based on a policy for what we will do with the content we detect and what safety measures we put in place, and where: at data collection, training, or inference. Please ping me on Discord @Nil-Andreu and I will add you to the safety discussion.
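
For anyone unfamiliar with the zero-shot approach mentioned here, a minimal sketch using the `transformers` pipeline; the model choice and candidate labels are illustrative assumptions:

```python
from transformers import pipeline

# Zero-shot classification scores a message against arbitrary candidate labels
# without task-specific fine-tuning. The model choice here is an assumption.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "I will find out where you live.",
    candidate_labels=["toxic", "harmless"],
)
print(result["labels"][0], result["scores"][0])  # labels are sorted by score
```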

huu4ontocord avatar Jan 05 '23 23:01 huu4ontocord

Is this inference call made in the backend when we want to insert a new message?

Yes, we want to query the model & store the Detoxify results in the DB for each user-submitted message.

If the new message that we want to insert is 'toxic', do we want to reply with a certain response?

It would be a nice-to-have feature to give immediate feedback to the user. The protocol will have to be updated for this, or the front-ends could post the message to the backend in a step before the actual submission for a quick check.
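
Such a pre-submission check could look roughly like the sketch below, assuming a FastAPI backend and reusing the `classify_toxicity` helper sketched earlier; the route, schema, and threshold are all hypothetical, not the actual Open-Assistant API:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Hypothetical threshold above which the front-end warns the user
TOXICITY_THRESHOLD = 0.8


class DraftMessage(BaseModel):
    text: str


@app.post("/messages/check")  # hypothetical route, not the real OA endpoint
def check_message(draft: DraftMessage) -> dict:
    # classify_toxicity is the HF Inference API helper sketched earlier
    scores = classify_toxicity(draft.text)
    return {
        "toxic": scores.get("toxicity", 0.0) >= TOXICITY_THRESHOLD,
        "scores": scores,
    }
```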

Should this classification of the message be stored in the Messages table?

Yes.

(It is on my todo list to write an issue for this; if you want to work on it, please consider joining the OA Discord server and pinging me: https://discord.gg/HFCPfugy)
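
One possible shape for storing the scores on the message row, assuming a SQLModel/Postgres setup with a JSONB column; the table and column names are assumptions, not the actual schema:

```python
from typing import Optional

from sqlalchemy import Column
from sqlalchemy.dialects.postgresql import JSONB
from sqlmodel import Field, SQLModel


class Message(SQLModel, table=True):  # simplified, hypothetical schema
    id: Optional[int] = Field(default=None, primary_key=True)
    text: str
    # Full Detoxify output stored as JSONB, e.g.
    # {"toxicity": 0.0008, "severe_toxicity": 0.0001, ...}
    detoxify: Optional[dict] = Field(default=None, sa_column=Column(JSONB))
```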

andreaskoepf avatar Jan 06 '23 07:01 andreaskoepf

We are working on some zero-shot classifiers, and on Detoxify as well. But all of this has to be based on a policy for what we will do with the content we detect and what safety measures we put in place, and where: at data collection, training, or inference. Please ping me on Discord @Nil-Andreu and I will add you to the safety discussion.

Okay, thanks! I cannot find your profile on Discord; could you please add me instead? My profile is: Nilandreug.

nil-andreu avatar Jan 06 '23 08:01 nil-andreu

Is this inference call made in the backend when we want to insert a new message?

Yes, we want to query the model & store the Detoxify results in the DB for each user-submitted message.

If the new message that we want to insert is 'toxic', do we want to reply with a certain response?

It would be a nice-to-have feature to give immediate feedback to the user. The protocol will have to be updated for this, or the front-ends could post the message to the backend in a step before the actual submission for a quick check.

Should this classification of the message be stored in the Messages table?

Yes.

(It is on my todo list to write an issue for this; if you want to work on it, please consider joining the OA Discord server and pinging me: https://discord.gg/HFCPfugy)

Okay, seems interesting! I have pinged you! I am now working on creating the mock data and displaying the messages, but it should not take too long to implement points 1 and 3, which seem the clearest to me. About point 2, we should definitely discuss what should be done when we find a toxic message.

nil-andreu avatar Jan 06 '23 08:01 nil-andreu