SW-CV-ModelZoo

What is the role of the Python files in the SFW_cleanup folder?

Open syna-nalinear opened this issue 1 year ago • 1 comment

image

I would like to inquire about the SFW_cleanup folder and whether its contents would be helpful in identifying which tags correspond to NSFW.

Specifically, I am curious as to what function the SFW_cleanup folder serves and what kind of content is included in the two files found in the capture, namely danboorufiles.txt and purged.csv.

Thank you for your time and assistance.

syna-nalinear, Apr 18 '23 02:04

The name is mostly legacy. Back when I was using the 512px SFW subset of Danbooru 2021 by Gwern, those scripts were used to iteratively trim the tag and image lists. danboorufiles.txt was the list of downloaded images, and purged.csv was the full list of tags with counts. The reasoning was that NSFW tags would have fewer images downloaded than expected (generally 0 or very close to it), so they had to be removed for having too few, and very likely misleading, examples.
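To make that reasoning concrete, here is a minimal sketch of one trimming pass. It is not the code from the SFW_cleanup folder: the function name `trim_tags`, the `min_ratio` and `min_images` thresholds, and the in-memory dictionaries standing in for danboorufiles.txt and purged.csv are all assumptions made for illustration.

```python
from collections import Counter


def trim_tags(expected_counts, image_tags, downloaded_ids,
              min_ratio=0.1, min_images=1):
    """Keep tags that still have enough downloaded examples.

    expected_counts: tag -> image count in the full dataset
                     (hypothetical stand-in for purged.csv)
    image_tags:      image ID -> list of tags (hypothetical lookup)
    downloaded_ids:  IDs actually present on disk
                     (hypothetical stand-in for danboorufiles.txt)
    """
    # Count how many downloaded images carry each tag.
    observed = Counter()
    for image_id in downloaded_ids:
        for tag in image_tags.get(image_id, ()):
            observed[tag] += 1

    # Drop tags with no examples, or far fewer examples than expected.
    kept = {}
    for tag, expected in expected_counts.items():
        have = observed[tag]
        if have >= min_images and have >= min_ratio * expected:
            kept[tag] = have
    return kept


# Toy data: image 3 carries the NSFW tag and was never downloaded.
expected_counts = {"1girl": 1000, "smile": 400, "some_nsfw_tag": 300}
image_tags = {
    1: ["1girl", "smile"],
    2: ["1girl"],
    3: ["1girl", "some_nsfw_tag"],
}
downloaded_ids = [1, 2]

kept = trim_tags(expected_counts, image_tags, downloaded_ids, min_ratio=0.001)
print(kept)
```

This is a single pass. An iterative version would presumably also drop images left without any kept tag and repeat until both lists stop changing.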

Today you can probably use it in a similar way if you remove image IDs associated with rating:explicit (and possibly rating:questionable) from a list of all danbooru image IDs.
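A rough sketch of that filtering step, assuming the ratings are already available as a mapping from image ID to a single-letter rating code ("s", "q", "e"). The function name `sfw_ids`, the mapping, and the letter codes are illustrative assumptions, not the schema of any particular database.

```python
def sfw_ids(ratings, excluded=("e",)):
    """Return the sorted image IDs whose rating is not in `excluded`.

    ratings:  image ID -> rating code (hypothetical in-memory lookup)
    excluded: rating codes to drop; "e" = explicit, "q" = questionable
              (assumed letter codes)
    """
    blocked = set(excluded)
    return sorted(image_id for image_id, rating in ratings.items()
                  if rating not in blocked)


# Toy data standing in for a full list of Danbooru image IDs.
ratings = {101: "s", 102: "q", 103: "e", 104: "s"}

lenient = sfw_ids(ratings)                        # drops explicit only
strict = sfw_ids(ratings, excluded=("e", "q"))    # also drops questionable
print(lenient, strict)
```

The resulting ID list could then play the role of the downloaded-images list when trimming tags.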

The danbooru sqlite DB used for image data lookup was generated using fire-eggs' scripts: https://github.com/fire-eggs/Danbooru2021

SmilingWolf, Aug 12 '23 10:08