What are the original NSFW concepts used in the safety checker?
It looks like the safety checker uses NSFW concept embeddings generated from CLIP to filter out unsafe content. I wonder if we can get the original concepts as text instead of as CLIP embeddings?
I want to know whether those concepts already cover what I want to filter; if not, I can generate CLIP embeddings for my extra NSFW concepts and use them in the safety checker.
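For concreteness, here is a rough sketch of what I have in mind. It assumes the checker is built on the same CLIP backbone as Stable Diffusion v1 (`openai/clip-vit-large-patch14`); the concept strings, the image path, and the 0.3 threshold are all placeholders of mine, not values from the actual checker:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed backbone; adjust if the safety checker uses a different CLIP variant.
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

# Hypothetical placeholder concepts; replace with whatever you need to filter.
extra_concepts = ["my extra concept 1", "my extra concept 2"]

# Embed the concept texts and L2-normalize them.
text_inputs = processor(text=extra_concepts, return_tensors="pt", padding=True)
with torch.no_grad():
    concept_embeds = model.get_text_features(**text_inputs)
concept_embeds = concept_embeds / concept_embeds.norm(dim=-1, keepdim=True)

# Embed a generated image the same way.
image = Image.open("generated.png")  # placeholder path
image_inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    image_embeds = model.get_image_features(**image_inputs)
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)

# Cosine similarity of the image against each extra concept. The 0.3 cutoff is
# purely illustrative; the real checker tunes a separate threshold per concept.
similarity = image_embeds @ concept_embeds.T
print(similarity, "flagged:", bool((similarity > 0.3).any()))
```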
Thanks in advance. This is a very very cool project! 😃
I'm not sure if this is what you're looking for, but here's how to disable CompVis's Stable Diffusion safety checker.
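For quick reference, a minimal sketch (assuming a recent diffusers version; older releases may instead need `pipe.safety_checker = None` after loading):

```python
from diffusers import StableDiffusionPipeline

# Load the pipeline without the safety checker component.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    safety_checker=None,
)
```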
@TomPham97 thanks for your suggestion. I took a look at your link, but I'm not searching for how to disable it. What I want instead is to know the original NSFW concepts (as text) used in the safety checker, so I can add more concepts to match based on my needs.
Personally, I'd be fine with exposing the names by now, but we'd have to sync with Stability AI on this one. Will ask! cc @mmitchellai @yjernite @apolinario here
Agree this would be good to know; inquiring with StabilityAI/CompVis if it makes sense to share.
Also cc @natolambert here :-)
Hi! We have just released a paper in which we analyse the filter implementation, reverse-engineer the unsafe concepts, and describe important limitations. You can read it here.
After releasing it, someone pointed out (see here) that the concepts were actually disclosed in an unreferenced repository from LAION. However, the two sets are not exactly equivalent: LAION uses 5 special concepts, but SD only considers 3 of those.
Additionally, we released a Colab notebook to test the safety filter on any given image. This can hopefully help the community identify failure modes that might be useful for future filters.
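If you'd rather run it locally, here is roughly the pattern the notebook follows, using the released checker weights (a sketch; `test.png` is a placeholder path):

```python
import numpy as np
from PIL import Image
from transformers import AutoFeatureExtractor
from diffusers.pipelines.stable_diffusion.safety_checker import StableDiffusionSafetyChecker

checker = StableDiffusionSafetyChecker.from_pretrained("CompVis/stable-diffusion-safety-checker")
extractor = AutoFeatureExtractor.from_pretrained("CompVis/stable-diffusion-safety-checker")

image = Image.open("test.png").convert("RGB")
# CLIP-preprocessed input for the checker's vision model.
clip_input = extractor([image], return_tensors="pt").pixel_values
# The checker also takes the raw images (batch, values in [0, 1]) so it can
# black out any that get flagged.
images = np.array(image)[None].astype(np.float32) / 255.0

checked_images, has_nsfw = checker(images=images, clip_input=clip_input)
print("NSFW flagged:", has_nsfw)  # one bool per image in the batch
```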
@javirandor Wow, that's quite an effort. Thank you for sharing this information! It will help when I need to design my own filters (especially useful for the false-negative cases!).
BTW the concepts have also been published here: https://github.com/LAION-AI/CLIP-based-NSFW-Detector/blob/main/safety_settings.yml
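If it helps, you can pull those concept strings straight from the repo for inspection (raw URL derived from the link above, assuming the file stays at that path):

```python
import requests
import yaml

url = "https://raw.githubusercontent.com/LAION-AI/CLIP-based-NSFW-Detector/main/safety_settings.yml"
settings = yaml.safe_load(requests.get(url, timeout=10).text)
print(settings)  # inspect the concept strings before building your own embeddings
```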
Sorry we've been a bit late here with replying. In general, we're open to any PRs that add more descriptions of the safety filter. We've added a description of the safety checker here: https://huggingface.co/CompVis/stable-diffusion-v1-4#safety-module - please open a PR or issue if you'd like to put more information there :-)
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.