nsfw_model

Unreproducible: accuracy for porn class in nsfw_data_scraper dataset

Open jiangyurong609 opened this issue 6 years ago • 7 comments

I tested the model against all the data from https://github.com/alexkimxyz/nsfw_data_scraper. The model's accuracy for the porn class turned out to be fairly low (0.84). I checked several of the false negatives, and they do belong in the porn class. Any suggestions? Thanks

Below are detailed results for porn images:

porn: 0.8378566785677277
sexy: 0.09817904345614349
neutral: 0.028440081768767722
hentai: 0.033282149350465834
drawings: 0.0022420468568952363
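
For readers who want to reproduce a breakdown like this, here is a minimal sketch. It assumes each number is the share of porn-labelled images whose top-scoring class was the listed one, so the "porn" share is the recall for that class. The thread does not confirm that this is how the figures were computed. The function name `predicted_class_shares`, the class order in `CLASS_NAMES`, and the sample scores are all hypothetical.

```python
from collections import Counter

# Assumed label set: the five classes named in the results above.
# The order is an assumption and must match the model's output order.
CLASS_NAMES = ["drawings", "hentai", "neutral", "porn", "sexy"]


def predicted_class_shares(probabilities, class_names=CLASS_NAMES):
    """Return the fraction of images whose top-scoring class is each class.

    `probabilities` is a list of per-image score lists, one score per entry
    in `class_names` (hypothetical input; in practice it would come from
    running the model on every image in one labelled folder).
    """
    if not probabilities:
        raise ValueError("no predictions given")
    counts = Counter()
    for scores in probabilities:
        if len(scores) != len(class_names):
            raise ValueError("score count does not match class count")
        # The index of the highest score is the predicted class.
        top = max(range(len(scores)), key=scores.__getitem__)
        counts[class_names[top]] += 1
    total = len(probabilities)
    return {name: counts[name] / total for name in class_names}


# Made-up scores for four images from a "porn" folder, for illustration only.
example = [
    [0.01, 0.02, 0.02, 0.90, 0.05],
    [0.00, 0.05, 0.05, 0.70, 0.20],
    [0.02, 0.03, 0.10, 0.25, 0.60],
    [0.01, 0.01, 0.03, 0.85, 0.10],
]
shares = predicted_class_shares(example)
print(shares)
```

Running the same function once per labelled folder gives one row of a confusion matrix per class, which makes it easier to see whether the misses lean toward "sexy" or are spread across all classes.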

jiangyurong609, Apr 29 '19 04:04

I haven't had such poor results. How clean/sure are you of your data?

Also which model did you use? Keras or TF?

I do not have such poor results on my machine, so I'm trying to gather the details.

GantMan, Apr 29 '19 04:04

I used this Keras model: https://s3.amazonaws.com/nsfwdetector/nsfw.299x299.h5. I ran the scripts in https://github.com/alexkimxyz/nsfw_data_scraper to generate the train dataset, then used that train dataset to test the model. 106153 porn images were tested. What are your suggestions for this test? Thanks

jiangyurong609, Apr 29 '19 05:04

Wow. That result sucks! It sounds like I need to re-run the scraper, get some more data, and see if I can retrain the model. It's doing amazingly on my local dataset, which was pulled not long ago.

Feel free to see if training on your dataset improves performance.

GantMan, May 01 '19 15:05

@GantMan thanks, I will try it when I have time. Meanwhile, if you get a chance to retrain, please let me know. Thanks

jiangyurong609, May 01 '19 15:05

Also, if you run the "self clense" script, can you let me know whether you have a lot of errors in your dataset?

Mine was pretty clean. From what I recall, a fresh pull on NSFW data scraper can be pretty off.

GantMan, May 01 '19 15:05

@TechnikEmpire what's the advantage of SSD for this simple classification?

jiangyurong609, May 18 '19 05:05

Hrmmm. That came off kinda harsh. This project is about everything and everyone. Would you like to try again? I think this is a great chance for you to practice sharing your research in a friendly yet challenging way.

GantMan, May 18 '19 13:05