Open-Assistant
Store Message Toxicity in database
Implements the calculation of message toxicity in the workflow and stores its value in the database.
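A minimal sketch of what this could look like, assuming the toxicity score comes from a text-classification model behind the Hugging Face Inference API and is stored in a dedicated table. The model choice, table, and field names below are illustrative assumptions, not the actual Open-Assistant schema:

```python
# Sketch only: model choice, table, and field names are assumptions,
# not the actual Open-Assistant implementation.
from uuid import UUID

import aiohttp
from sqlmodel import JSON, Column, Field, Session, SQLModel

HF_API_URL = (
    "https://api-inference.huggingface.co/models/"
    "citizenlab/distilbert-base-multilingual-cased-toxicity"  # example model
)


class MessageToxicity(SQLModel, table=True):
    """Hypothetical table keyed by message id, storing the raw label/score output."""

    message_id: UUID = Field(primary_key=True)
    labels: dict = Field(sa_column=Column(JSON))


async def fetch_toxicity(text: str, api_token: str) -> dict:
    """Call the Hugging Face Inference API and return a {label: score} mapping."""
    async with aiohttp.ClientSession() as session:
        async with session.post(
            HF_API_URL,
            headers={"Authorization": f"Bearer {api_token}"},
            json={"inputs": text},
        ) as response:
            response.raise_for_status()
            result = await response.json()  # [[{"label": ..., "score": ...}, ...]]
            return {item["label"]: item["score"] for item in result[0]}


def store_toxicity(session: Session, message_id: UUID, labels: dict) -> None:
    """Persist the classifier output alongside the message."""
    session.add(MessageToxicity(message_id=message_id, labels=labels))
    session.commit()
```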
Cool stuff! Am curious and have some questions:
- Do we have plans to use the toxicity or message embeddings within the app such that we need them right away?
- Do they add much in terms of resource overhead on the backend?
- Do they add any latency or complexity that could affect the user experience and flow?
- Any cost considerations with the Hugging Face API in terms of scale and streaming vs. batch usage of their API?
Mainly I am wondering if/why this needs to happen within the app rather than as some sort of regular batch job, so we can have more separation of concerns.
I am not super familiar with the backend, so I'm asking out of ignorance and curiosity, and a little bit as devil's advocate, but with the best intentions :)
- Do we have plans to use the toxicity or message embeddings within the app such that we need them right away?
No concrete plans, but the idea is that a (trusted) frontend could check dynamically whether some input violates the classifier (see the sketch after this list).
- Do they add much in terms of resource overhead on the backend?
Not really, beyond an open socket.
- Do they add any latency or complexity that could affect the user experience and flow?
Maybe, we'll have to see.
- Any cost considerations with the Hugging Face API in terms of scale and streaming vs. batch usage of their API?
This would only be for real-time inference. I think we could still do batch computation for all stored things.
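For reference, a real-time check from a trusted frontend could look roughly like the endpoint below. The route, request/response shapes, and threshold are hypothetical, and it reuses the `fetch_toxicity` helper sketched above:

```python
# Hypothetical endpoint; route name, schemas, and threshold are assumptions.
import os

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
TOXICITY_THRESHOLD = 0.8  # assumed cut-off for flagging input


class ToxicityCheckRequest(BaseModel):
    text: str


class ToxicityCheckResponse(BaseModel):
    toxic: bool
    score: float


@app.post("/api/v1/text/toxicity", response_model=ToxicityCheckResponse)
async def check_toxicity(request: ToxicityCheckRequest) -> ToxicityCheckResponse:
    # fetch_toxicity is the Hugging Face Inference API helper sketched above.
    labels = await fetch_toxicity(request.text, api_token=os.environ["HF_API_TOKEN"])
    score = labels.get("toxic", 0.0)  # label name depends on the chosen model
    return ToxicityCheckResponse(toxic=score >= TOXICITY_THRESHOLD, score=score)
```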
Maybe batch processing for the messages for which we were not able to obtain a toxicity score. This is something I could work on after this PR.
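A rough sketch of what such a backfill job might look like, assuming the `Message` model exposes `id` and `text` fields and reusing the hypothetical `MessageToxicity` table and `fetch_toxicity` helper from above:

```python
# Sketch of a backfill job; Message fields and batch size are assumptions.
import asyncio

from sqlmodel import Session, create_engine, select


def backfill_toxicity(database_url: str, api_token: str, batch_size: int = 100) -> None:
    """Score messages that do not yet have a MessageToxicity row, in batches."""
    engine = create_engine(database_url)
    with Session(engine) as session:
        while True:
            already_scored = select(MessageToxicity.message_id)
            missing = session.exec(
                select(Message)
                .where(Message.id.not_in(already_scored))
                .limit(batch_size)
            ).all()
            if not missing:
                break
            for message in missing:
                labels = asyncio.run(fetch_toxicity(message.text, api_token))
                session.add(MessageToxicity(message_id=message.id, labels=labels))
            session.commit()
```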
Have made a couple of final changes; tomorrow I will test them and make sure everything works correctly.
Have changed the code based on the feedback. If anything else needs to be changed, let me know! @yk