llm-guard
Toxicity Scanner to return the type of content
When using the input or output Toxicity scanner, it would be preferable to return the type of label that was matched (e.g. `sexual_explicit`) instead of only the offensive content. This would enable applications to communicate the issue to their users.
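For illustration, here is a minimal sketch of what the request amounts to, assuming the current `scanner.scan(prompt)` call that returns a `(sanitized_prompt, is_valid, risk_score)` tuple. The `ScanResult` dataclass and the `scan_with_details` wrapper are hypothetical names, used only to show the idea of surfacing the matched label alongside the score:

```python
# Hypothetical sketch of the requested behaviour; not the current llm-guard API.
from dataclasses import dataclass
from typing import Optional

from llm_guard.input_scanners import Toxicity


@dataclass
class ScanResult:
    sanitized_prompt: str
    is_valid: bool
    risk_score: float
    label: Optional[str] = None  # e.g. "sexual_explicit", "insult", "threat"


def scan_with_details(prompt: str) -> ScanResult:
    """Wrap the current tuple-based API in a richer result object."""
    scanner = Toxicity(threshold=0.5)
    sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)
    # Today the matched toxicity label is only visible in the logs; the
    # request is for the scanner itself to surface it in the return value.
    return ScanResult(sanitized_prompt, is_valid, risk_score, label=None)
```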
Hey @RQledotai, thanks for reaching out. Apologies for the delay.
I agree, and such a refactoring is in the works to return an object with more context about the reason behind the blocking. Currently, the only way to monitor this is through the logs.
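Until that refactoring lands, here is a minimal sketch of the log-based workaround from an application's side, assuming llm-guard reports scanner results through Python's standard `logging` module; the `llm_guard` logger name and the contents of the messages are assumptions:

```python
# Capture llm-guard log records so the application can inspect the reason
# behind a block. Logger name "llm_guard" is an assumption.
import logging

captured_records: list[str] = []


class CaptureHandler(logging.Handler):
    def emit(self, record: logging.LogRecord) -> None:
        # Store each formatted record for later inspection after a scan.
        captured_records.append(self.format(record))


logger = logging.getLogger("llm_guard")
logger.setLevel(logging.DEBUG)
logger.addHandler(CaptureHandler())
```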