What are the original NSFW concepts used in the safety checker?

Open Nash2325138 opened this issue 2 years ago • 8 comments

It looks like the safety checker uses NSFW concept embeddings generated with CLIP to filter out unsafe content. I wonder if we could get the original concepts as text instead of as CLIP embeddings?

I want to know whether those concepts already cover what I need to filter; if not, I can generate CLIP embeddings for extra NSFW concepts and use them in the safety checker.

Thanks in advance. This is a very very cool project! 😃

Nash2325138 avatar Sep 12 '22 10:09 Nash2325138

I'm not sure if this is what you're looking for, but here's how to disable CompVis's stable diffusion safety models.

TomPham97 avatar Sep 12 '22 21:09 TomPham97

@TomPham97 thanks for your suggestion. I took a look at your link, but I'm not looking for how to disable the checker. What I want is to know the original NSFW concepts (in text) used in the safety checker, so I can add more concepts to match based on my needs.

Nash2325138 avatar Sep 13 '22 07:09 Nash2325138

Personally, I'd be fine with exposing the names by now, but we'd have to sync with Stability AI on this one. Will ask! cc @mmitchellai @yjernite @apolinario here

patrickvonplaten avatar Sep 13 '22 16:09 patrickvonplaten

Agree this would be good to know; inquiring with StabilityAI/CompVis if it makes sense to share.

meg-huggingface avatar Sep 13 '22 18:09 meg-huggingface

Also cc @natolambert here :-)

patrickvonplaten avatar Sep 17 '22 12:09 patrickvonplaten

Hi! We have just released a paper in which we analyse the filter implementation, reverse-engineer the unsafe concepts, and describe important limitations. You can read it here.

After we released it, someone pointed out (see here) that the concepts had actually been disclosed in an unreferenced repository from LAION. However, the two sets are not exactly equivalent: LAION uses 5 special concepts, but SD only considers 3 of those.

Additionally, we released a Colab notebook to test the safety filter on any given image. This will hopefully help the community identify failure modes that could inform future filters.

javirandor avatar Oct 11 '22 09:10 javirandor

@javirandor Wow, that's quite an effort. Thank you for sharing this information! It will help when I design my own filters (the false-negative cases are especially useful!).

Nash2325138 avatar Oct 11 '22 10:10 Nash2325138

BTW the concepts have also been published here: https://github.com/LAION-AI/CLIP-based-NSFW-Detector/blob/main/safety_settings.yml

Sorry we've been a bit late with replying here. In general, we're open to any PRs that add more description of the safety filter. We've added a description of the safety checker here: https://huggingface.co/CompVis/stable-diffusion-v1-4#safety-module - please open a PR or issue if you'd like to put more information there :-)

patrickvonplaten avatar Oct 11 '22 18:10 patrickvonplaten

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Nov 05 '22 15:11 github-actions[bot]