age-gender-estimation
Ages in the IMDB and WIKI datasets
Hello, I found something strange in the dataset: the maximum age is 2014 and the minimum age is -31. I ran your code from https://github.com/yu4u/age-gender-estimation/blob/master/create_db.py (around line 34) and added a few lines:
import numpy as np
from utils import get_meta  # helper from this repository

mat_path = "../imdb.mat"
db = "imdb"
full_path, dob, gender, photo_taken, face_score, second_face_score, age = get_meta(mat_path, db)
path_list = [str(item[0]) for item in full_path.tolist()]
max_age = np.max(age)                 # result: 2014
max_age_idx = np.argmax(age)          # result: 181492
path_photo = path_list[max_age_idx]   # result: '64/nm1002664_rm1109196544_0-7-31_2015.jpg'
The dataset I downloaded is https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/static/imdb_crop.tar
As described on the website, the dataset was created automatically via crawling. The ages in the dataset are estimated values calculated from dob (date of birth) and photo_taken (from Exif data), so the resulting metadata may not be very accurate.
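The age computation can be sketched roughly as follows. This is a hedged reconstruction of the logic in create_db.py, not a verbatim copy; it assumes dob is a MATLAB serial date number (days since year 0), which is 366 days ahead of Python's date ordinal:

```python
from datetime import datetime

def calc_age(photo_taken, dob):
    """Approximate age from the year a photo was taken and a
    MATLAB serial date number for the date of birth.
    MATLAB datenums start 366 days before Python ordinals, so
    subtract 366; clamp at 1 to survive corrupt dob values."""
    birth = datetime.fromordinal(max(int(dob) - 366, 1))
    # Round to the nearest birthday: for photos taken in the first
    # half of the year, assume the birthday has already passed.
    if birth.month < 7:
        return photo_taken - birth.year
    else:
        return photo_taken - birth.year - 1
```

This also explains the outliers: a corrupt dob near zero (clamped to year 1) combined with a photo taken in 2015 yields an age of 2014, and a dob recorded as later than photo_taken yields a negative age.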
Of course, we cannot vouch for the accuracy of the assigned age information. Besides wrong timestamps, many images are stills from movies, and movies can have extended production times.
So, if I want to train on images with exact ages, I must filter the data manually, right?
Yes. Since it is impossible to know the exact ages, what we can do is filter out unreliable ages, either manually as you mentioned or automatically (though imperfectly) as done in create_db.py.
You can use this dataset for pre-training and then fine-tune the model on a more reliable (but smaller) dataset (I have not tried this myself):
http://chalearnlap.cvc.uab.es/dataset/19/description/
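The automatic filtering mentioned above can be sketched as follows. Here face_score, second_face_score, and age are the arrays returned by get_meta; the specific threshold and the exact set of conditions are assumptions, not necessarily identical to create_db.py:

```python
import numpy as np

def filter_samples(face_score, second_face_score, age, min_score=1.0):
    """Boolean mask of plausibly labeled samples: a confident face
    detection, no second face (a second face makes the identity,
    and hence the age label, ambiguous), and an age in [0, 100]."""
    mask = face_score >= min_score        # detector found a face
    mask &= np.isnan(second_face_score)   # only one face in the image
    mask &= (age >= 0) & (age <= 100)     # drop impossible ages
    return mask
```

Applying such a mask to path_list and age before training removes the 2014 and -31 outliers automatically, though it cannot catch labels that are merely wrong rather than impossible.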
For training age estimation, are you using categorical cross entropy or regression?
Cross entropy is used, as proposed in the original paper:
predictions_a = Dense(units=101, kernel_initializer=self._weight_init, use_bias=self._use_bias,
kernel_regularizer=l2(self._weight_decay), activation="softmax")(flatten)
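With the 101-way softmax, each integer age label is treated as a class index over the bins 0..100 and turned into a one-hot target before cross entropy is applied. A minimal sketch in plain NumPy (the helper name is mine, not from the repository):

```python
import numpy as np

def ages_to_one_hot(ages, num_bins=101):
    """Map integer ages (0..100) to one-hot target rows for the
    101-way softmax classifier."""
    ages = np.asarray(ages, dtype=int)
    return np.eye(num_bins)[ages]
```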
Interesting discussion, but have you manually inspected the data? Doing so, I see that most of the photos make no sense for the age they are assigned. To me it seems impossible for the network to learn anything reliable from such data. Maybe I am missing something, but, for example, none of the images labeled under 10 years old in the IMDB dataset actually shows a person under 10.
You are right. The labels in the IMDB-WIKI dataset are noisy because it was created automatically from web sites. It was originally intended for pre-training; the assumption is that you then train on cleaner datasets such as the APPA-REAL dataset (https://github.com/yu4u/age-gender-estimation/tree/master/appa-real).
I see. Good to know about the APPA-REAL dataset; I did not know about it. Thanks for the info!
Thanks for the repo and all the effort! I get that you achieve below 4 MAE on other datasets, but I am curious what MAE you get for training/validation on the noisy IMDB-WIKI with this implementation. In my own implementation, training on 14k filtered images from WIKI with a balanced distribution and balanced batches and no augmentation, VGG-16 with regression converges to 5.2 MAE (validation), AlexNet (used for multi-task purposes with another network) with regression to 5.6, and AlexNet with classification to 5.8. For my use case, having the lowest score is not important, but it still seems alarmingly high and I would like to compare if you have the numbers. Thanks!
As I don't know the differences between your implementation and this project, I can't say for certain, but: 1) the IMDB dataset is relatively cleaner than the WIKI dataset, and 2) classification plus expectation works better than direct regression when solving age estimation as a regression problem.
@yu4u Why don't you compute the loss between the label and the softmax expected value, as in demo.py:
import numpy as np

results = model.predict(sub_test_images)
predicted_genders = results[0]
ages = np.arange(0, 101).reshape(101, 1)
predicted_ages = results[1].dot(ages).flatten()
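The expectation step above can be demonstrated standalone: the softmax output is treated as a probability distribution over the bins 0..100, and the predicted age is its expected value (the DEX-style formulation; the toy distribution below is made up for illustration):

```python
import numpy as np

# Toy softmax output: probability mass centered around age 30.
probs = np.zeros(101)
probs[[29, 30, 31]] = [0.25, 0.5, 0.25]

ages = np.arange(0, 101).reshape(101, 1)
predicted_age = probs.reshape(1, -1).dot(ages).flatten()[0]
# Expected value: 29*0.25 + 30*0.5 + 31*0.25 = 30.0
```

This averaging over bins is why classification plus expectation can beat both plain regression and a hard argmax over the softmax: it exploits the full distribution rather than a single point estimate.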