nsfw_model

How many images in the training set for each class?

Open misterDDF opened this issue 6 years ago • 6 comments

Hi, thanks for sharing the code and model; it helps me a lot. Can you tell me how many images are in the training set for each of the 5 classes? I'm not familiar with Keras. Does the code below mean that only 500 * batch_size images are trained on in each epoch, rather than every image in the nsfw_data_scraper training set? [image]

By the way, I tested the model on a test set (2,000 images per class). The accuracy for the neutral class is 0.1 lower than the confusion matrix in the results shows, while the other classes perform well.
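A comparison like this comes down to per-class accuracy, that is, the fraction of each class's test images that were predicted correctly. The following is only a minimal sketch in Python: the function name `per_class_accuracy` is illustrative, and the labels and predictions are made-up toy values, not results from this model.

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred):
    """Return {label: fraction of that label's samples predicted correctly}."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}

# Toy data for illustration only (three of the five classes, ten samples).
y_true = ["neutral"] * 4 + ["sexy"] * 4 + ["drawings"] * 2
y_pred = ["neutral", "neutral", "neutral", "sexy",
          "sexy", "sexy", "sexy", "sexy",
          "drawings", "drawings"]

accuracy = per_class_accuracy(y_true, y_pred)
for label, value in accuracy.items():
    print(f"{label}: {value:.2f}")
```

Each value corresponds to one diagonal entry of a row-normalized confusion matrix, so it can be compared directly against a published matrix, provided both were computed on comparable test data.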

misterDDF • Mar 04 '19 16:03

Hi!

You are correct. Training on the entire dataset would be most impressive, as I currently have 30,000+ images per class. I've also increased the batch size to 32, which means 16,000 images (500 steps × 32) are pulled in each epoch. Since I'm batching and using Stochastic Gradient Descent, I've found this to be a powerful method for continuously refining the model without overfitting.
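To make the arithmetic concrete, here is a small sketch in Python. The 500 steps and the batch size of 32 come from this thread; the dataset size of 5 classes × 30,000 images is a round figure based on the numbers above, and the function names are purely illustrative.

```python
import math

def images_per_epoch(steps_per_epoch: int, batch_size: int) -> int:
    """Images drawn in one epoch when the epoch length is fixed in steps."""
    return steps_per_epoch * batch_size

def epochs_to_cover(dataset_size: int, steps_per_epoch: int, batch_size: int) -> int:
    """Minimum number of epochs needed to draw as many samples as the dataset holds."""
    return math.ceil(dataset_size / images_per_epoch(steps_per_epoch, batch_size))

drawn = images_per_epoch(500, 32)
total = 5 * 30_000  # assumed round figure: 5 classes, 30,000 images each
epochs = epochs_to_cover(total, 500, 32)

print(f"{drawn} images per epoch, {total} images total, at least {epochs} epochs")
```

Under these assumptions each epoch touches roughly a tenth of the data. Whether every image is actually seen after that many epochs depends on how the data generator shuffles and samples.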

Additionally, I apply perturbations to the images, so noise, rotation, and cropping are added randomly, which makes it practically impossible for the exact same image to be used twice.
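The idea can be illustrated with a small, dependency-free sketch in Python. This is not the training code from the repository: the function names are hypothetical, images are modeled as nested lists of grayscale values in [0, 1], and rotation is simplified to random multiples of 90 degrees as a stand-in for arbitrary-angle rotation.

```python
import random

def random_crop(img, size, rng):
    """Cut a random size x size window out of the image."""
    top = rng.randint(0, len(img) - size)
    left = rng.randint(0, len(img[0]) - size)
    return [row[left:left + size] for row in img[top:top + size]]

def random_rot90(img, rng):
    """Rotate clockwise by a random multiple of 90 degrees."""
    for _ in range(rng.randrange(4)):
        img = [list(row) for row in zip(*img[::-1])]  # one quarter turn
    return img

def add_noise(img, sigma, rng):
    """Add Gaussian noise to every pixel, clamped to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]

def augment(img, size, sigma, rng):
    """Crop, rotate, then add noise, each with fresh random parameters."""
    return add_noise(random_rot90(random_crop(img, size, rng), rng), sigma, rng)

rng = random.Random(0)
image = [[(r * 8 + c) / 63 for c in range(8)] for r in range(8)]  # toy 8x8 gradient
first = augment(image, 6, 0.05, rng)
second = augment(image, 6, 0.05, rng)
print(first != second)  # two passes over the same source image differ
```

Because the noise is drawn independently for every pixel, two augmented copies of one source image will almost never match exactly, even when the crop and rotation happen to coincide.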

After some serious re-training/refining I'd love for you to re-test my latest model. I'm getting around 93% accuracy. This was trained longer on an even larger dataset.

Side note:

You say you're not familiar with Keras. If you use some other framework, I'd love for you to contribute. I'm planning on writing a TensorFlow.js training version. It would be entertaining to see which ML framework performs best.

GantMan • Mar 05 '19 18:03

Thanks for your reply.

Yes, I'm trying to reimplement this model in PyTorch, but the accuracy only reaches about 83% for now. I think I should retrain it more seriously.

misterDDF • Mar 06 '19 09:03

Here's a blog post I'm working on for how I trained the model: https://medium.com/@gantlaborde/howto-ai-nsfw-detection-229a9725829c

GantMan • Mar 12 '19 14:03

Hi! I retrained this model with Keras, but the accuracy only reaches 89% for now. I guess something might be wrong with my dataset: I cannot get enough data for the sexy and drawings classes. Where did you get the data for these two classes?

devinhee • Jul 04 '19 02:07

@devinhee - what's your data categorization error rate? If you did a basic pull from Reddit etc., you might have some significant misclassifications that are holding your model back.

GantMan • Jul 04 '19 16:07

> @devinhee - what's your data categorization error rate? If you did a basic pull from Reddit etc., you might have some significant misclassifications that are holding your model back.

The categorization error rate is 20% to 25%. I did some basic data cleaning: I deleted bad images and removed duplicates. But I did not check every single image in each category.
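For reference, one common way to remove exact duplicates is to hash each file's bytes and keep only the first file per hash. The sketch below is an assumption about how such a step could look, not the script actually used here; the function names and the folder path in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of the file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def find_exact_duplicates(folder: Path) -> list:
    """Return files whose bytes match an earlier file (in sorted path order)."""
    seen = {}
    duplicates = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        digest = file_digest(path)
        if digest in seen:
            duplicates.append(path)  # same content as seen[digest]
        else:
            seen[digest] = path
    return duplicates
```

It could be called as `find_exact_duplicates(Path("data/train"))`, where the path is a placeholder. This only catches byte-identical files: resized or re-encoded copies need a perceptual hash, and mislabeled images still require manual review of each category.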

devinhee • Jul 05 '19 02:07