
Store Message Toxicity in database

Open · nil-andreu opened this issue · 1 comment

Implement the calculation of message toxicity in the workflow and store its value in the database.
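As a rough illustration of the idea, a helper could take a HuggingFace text-classification response and turn it into a row ready for storage. This is a hedged sketch: the model output shape shown here is the typical nested-list format of HF classification endpoints, and the `MessageToxicity` record with its fields is a made-up illustration, not the actual Open-Assistant schema.

```python
# Hypothetical sketch: turning a HuggingFace text-classification response
# into a record we could persist. The MessageToxicity fields are assumptions.
from dataclasses import dataclass


@dataclass
class MessageToxicity:
    message_id: str
    label: str
    score: float


def parse_toxicity_response(message_id: str, response) -> MessageToxicity:
    """Pick the highest-scoring label from a classification response.

    HF text-classification endpoints typically return a nested list like
    [[{"label": "toxic", "score": 0.97}, {"label": "non-toxic", "score": 0.03}]].
    """
    candidates = response[0]
    best = max(candidates, key=lambda item: item["score"])
    return MessageToxicity(message_id=message_id,
                           label=best["label"],
                           score=best["score"])


# Example with a mocked API response:
sample = [[{"label": "non-toxic", "score": 0.91},
           {"label": "toxic", "score": 0.09}]]
row = parse_toxicity_response("msg-123", sample)
# row.label == "non-toxic", row.score == 0.91
```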

nil-andreu avatar Jan 08 '23 20:01 nil-andreu

Cool stuff! Am curious and have some questions:

  • Do we have plans to use the toxicity or message embeddings within the app such that we need them right away?
  • Do they add much in terms of resource overhead on the backend?
  • Do they add any latency or complexity that could affect the user experience and flow?
  • Any cost considerations with the HuggingFace API in terms of scale and streaming vs. batch usage of their API?

Mainly I am wondering if/why this needs to happen within the app and not as some sort of regular batch job so we can have more separation of concerns.
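To make the suggestion concrete, a periodic batch job might just scan for messages that are missing a score and fill them in, keeping inference out of the request path. Everything in this sketch is a placeholder: `backfill_toxicity`, the message dicts, and the injected `classify` function are illustrative, not Open-Assistant APIs.

```python
# Hedged sketch of the batch-job alternative: a periodic task that finds
# messages with no toxicity score and scores them in batches.
# All names here are placeholders, not Open-Assistant code.
def backfill_toxicity(messages, classify, batch_size=32):
    """Score unscored messages in batches; return the number updated."""
    pending = [m for m in messages if m.get("toxicity") is None]
    updated = 0
    for i in range(0, len(pending), batch_size):
        batch = pending[i:i + batch_size]
        # One batched call to the classifier instead of one call per message.
        scores = classify([m["text"] for m in batch])
        for msg, score in zip(batch, scores):
            msg["toxicity"] = score
            updated += 1
    return updated


# Example with a stubbed classifier:
msgs = [{"text": "hello", "toxicity": None},
        {"text": "hi", "toxicity": 0.1}]
n = backfill_toxicity(msgs, classify=lambda texts: [0.02] * len(texts))
# n == 1; only the unscored message was updated
```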

I am not super familiar with the backend or anything so asking out of ignorance and curiosity and a little bit as devil's advocate but with the best intentions :)

andrewm4894 avatar Jan 09 '23 21:01 andrewm4894

  • Do we have plans to use the toxicity or message embeddings within the app such that we need them right away?

Not concrete plans, but the idea is that a (trusted) frontend could check dynamically whether some input violates the classifier.

  • Do they add much in terms of resource overhead on the backend?

Not really, beyond an open socket.

  • Do they add any latency or complexity that could affect to user experience and flow?

Maybe, we'll have to see.

  • Any cost considerations with the HuggingFace API in terms of scale and streaming vs. batch usage of their API?

This would only be for real-time inference. I think we could still do batch computation for everything that is stored.
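The dynamic check mentioned above could look roughly like the following: the backend scores the input and a trusted frontend compares the score against a cutoff. The threshold value and function name are assumptions for illustration, not actual Open-Assistant constants.

```python
# Minimal sketch of the real-time check: compare a classifier score
# against a rejection threshold. The cutoff here is an assumed value.
TOXICITY_THRESHOLD = 0.8  # illustrative, not an Open-Assistant constant


def violates_policy(toxicity_score: float,
                    threshold: float = TOXICITY_THRESHOLD) -> bool:
    """Return True when the classifier score crosses the threshold."""
    return toxicity_score >= threshold


flagged = violates_policy(0.95)   # would be rejected
allowed = violates_policy(0.10)   # would be accepted
```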

yk avatar Jan 10 '23 21:01 yk


Maybe we could do batch processing for the messages for which we were not able to obtain a toxicity score. This is something I could work on after this PR.

nil-andreu avatar Jan 10 '23 21:01 nil-andreu

I have made a couple of final changes; tomorrow I will test them and make sure everything works correctly.

nil-andreu avatar Jan 10 '23 21:01 nil-andreu

I have changed the code based on the feedback. If anything needs to be changed, let me know! @yk

nil-andreu avatar Jan 11 '23 20:01 nil-andreu