What are the original NSFW concepts used in the safety checker?

Open Nash2325138 opened this issue 2 years ago • 8 comments

It looks like the safety checker uses NSFW concept embeddings generated with CLIP to filter out unsafe content. I wonder if we could get the original concepts as text instead of as CLIP embeddings?

I want to know whether those concepts already cover what I need to filter; if not, I can generate CLIP embeddings for extra NSFW concepts and use them in the safety checker.

Thanks in advance. This is a very very cool project! 😃

Nash2325138 avatar Sep 12 '22 10:09 Nash2325138

I'm not sure if this is what you're looking for, but here's how to disable CompVis's stable diffusion safety models.

TomPham97 avatar Sep 12 '22 21:09 TomPham97

@TomPham97 thanks for your suggestion. I took a look at your link, but I'm not looking for how to disable the checker. What I want is to know the original NSFW concepts (in text) used in the safety checker, so I can add more concepts to match based on my needs.

Nash2325138 avatar Sep 13 '22 07:09 Nash2325138

Personally, I'd be fine with exposing the names by now, but we'd have to sync with Stability AI on this one. Will ask! cc @mmitchellai @yjernite @apolinario here

patrickvonplaten avatar Sep 13 '22 16:09 patrickvonplaten

Agree this would be good to know; inquiring with StabilityAI/CompVis if it makes sense to share.

meg-huggingface avatar Sep 13 '22 18:09 meg-huggingface

Also cc @natolambert here :-)

patrickvonplaten avatar Sep 17 '22 12:09 patrickvonplaten

Hi! We have just released a paper in which we analyse the filter implementation, reverse-engineer the unsafe concepts, and describe important limitations. You can read it here.

After we released it, someone pointed out (see here) that the concepts had actually been disclosed in an unreferenced repository from LAION. However, the two sets are not exactly equivalent: LAION uses 5 special concepts, but SD only considers 3 of those.

Additionally, we released a Colab notebook to test the safety filter on any given image. This will hopefully help the community identify failure modes that could inform future filters.

javirandor avatar Oct 11 '22 09:10 javirandor

@javirandor Wow, that's quite an effort. Thank you for sharing this information! It will help when I design my own filters (the false-negative cases are especially useful!).

Nash2325138 avatar Oct 11 '22 10:10 Nash2325138

BTW the concepts have also been published here: https://github.com/LAION-AI/CLIP-based-NSFW-Detector/blob/main/safety_settings.yml

Sorry we've been a bit late with replying here. In general, we're open to any PRs that add more description of the safety filter. We've added a description of the safety checker here: https://huggingface.co/CompVis/stable-diffusion-v1-4#safety-module - please open a PR or issue if you'd like to put more information there :-)

patrickvonplaten avatar Oct 11 '22 18:10 patrickvonplaten

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Nov 05 '22 15:11 github-actions[bot]