SW-CV-ModelZoo
What is the role of the Python files in the SFW_cleanup folder?
I would like to ask about the SFW_cleanup folder and whether its contents could help identify which tags correspond to NSFW content.
Specifically, what purpose does the SFW_cleanup folder serve, and what do the two files shown in the screen capture, danboorufiles.txt and purged.csv, contain?
Thank you for your time and assistance.
The name is mostly legacy. Back when I was using the 512px SFW subset of Gwern's Danbooru2021 dataset, those scripts were used to iteratively trim the tag and image lists.
danboorufiles.txt was the list of downloaded images, and purged.csv was the full list of tags with their counts.
The reasoning was that NSFW tags would have far fewer downloaded images than their counts suggested (generally zero or very close to it), so they had to be removed for having too few, and very likely misleading, examples.
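A minimal sketch of that purge step, assuming purged.csv is a plain two-column tag,count file with no header and that each downloaded image's tags can be looked up from the metadata. The function names and the 5% cutoff are hypothetical, not what the original scripts used:

```python
import csv
import io
from collections import Counter


def load_tag_counts(csv_text):
    """Parse 'tag,count' rows (assumed layout, no header) into a dict."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0]: int(row[1]) for row in reader if row}


def find_purge_candidates(tag_counts, downloaded_tags, min_ratio=0.05):
    """Return tags whose downloaded-image count is far below their total count.

    tag_counts: tag -> number of images carrying the tag in the full dataset.
    downloaded_tags: one list of tags per image that was actually downloaded.
    min_ratio: hypothetical cutoff; tags keeping a smaller fraction are flagged.
    """
    seen = Counter()
    for tags in downloaded_tags:
        seen.update(set(tags))  # count each tag at most once per image
    return sorted(
        tag for tag, total in tag_counts.items()
        if total > 0 and seen[tag] / total < min_ratio
    )


counts = load_tag_counts("tag_a,10\ntag_b,8\ntag_c,5\n")
downloaded = [["tag_a", "tag_b"]] * 8 + [["tag_a"]]
print(find_purge_candidates(counts, downloaded))  # ['tag_c']
```

The original scripts trimmed iteratively; this shows a single pass, which you would repeat after dropping the flagged tags and any images left without useful tags.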
Today you could probably use it in a similar way by removing the image IDs rated rating:explicit (and possibly rating:questionable) from a list of all Danbooru image IDs.
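A rough sketch of that filtering, assuming you already have a mapping from image ID to Danbooru's single-letter rating code (e for explicit, q for questionable). sfw_image_ids is a hypothetical helper, not part of the repo:

```python
def sfw_image_ids(all_ids, rating_of, excluded=frozenset({"e", "q"})):
    """Keep IDs whose rating is not in `excluded`.

    rating_of: dict mapping image ID -> single-letter rating code (assumed shape).
    IDs with no known rating are dropped to stay conservative.
    """
    return [i for i in all_ids if i in rating_of and rating_of[i] not in excluded]


ids = [1, 2, 3, 4, 5]
ratings = {1: "s", 2: "e", 3: "q", 4: "s"}
print(sfw_image_ids(ids, ratings))                  # [1, 4]
print(sfw_image_ids(ids, ratings, excluded={"e"}))  # [1, 3, 4]
```

Passing excluded={"e"} corresponds to the "possibly questionable" judgment call: it keeps questionable images in the SFW list.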
The Danbooru SQLite database used for image data lookup was generated with fire-eggs' scripts: https://github.com/fire-eggs/Danbooru2021
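For a lookup against such a database, something like the following could compute, per tag, what fraction of its images survive the rating filter. The schema here (posts and post_tags tables) is a made-up stand-in; the real database built by those scripts will likely use different table and column names:

```python
import sqlite3


def nsfw_leaning_tags(conn, min_ratio=0.05):
    """Return tags whose share of non-'e'/'q' images is below min_ratio.

    Assumes a hypothetical schema: posts(id, rating), post_tags(post_id, tag).
    """
    query = """
        SELECT pt.tag,
               COUNT(*) AS total,
               SUM(CASE WHEN p.rating IN ('e', 'q') THEN 0 ELSE 1 END) AS sfw
        FROM post_tags AS pt
        JOIN posts AS p ON p.id = pt.post_id
        GROUP BY pt.tag
        ORDER BY pt.tag
    """
    return [tag for tag, total, sfw in conn.execute(query) if sfw / total < min_ratio]


# Tiny in-memory demo using the hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, rating TEXT);
    CREATE TABLE post_tags (post_id INTEGER, tag TEXT);
""")
conn.executemany("INSERT INTO posts VALUES (?, ?)", [(1, "s"), (2, "s"), (3, "e")])
conn.executemany(
    "INSERT INTO post_tags VALUES (?, ?)",
    [(1, "tag_a"), (2, "tag_a"), (3, "tag_a"), (3, "tag_b")],
)
print(nsfw_leaning_tags(conn))  # ['tag_b']
```

Doing the aggregation in SQL avoids loading every post into memory, which matters at Danbooru scale.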