Training with custom data gives very poor results.
I have trained a custom object detection model with relevant data (~750 annotated images × 3 augmentations). There are 4 choices of pretrained model, and I tested all 4 of them with different learning rates (the default is 0.3). I also tried different combinations of 200-300 epochs and learning rates from 0.01 to 0.3, roughly the setup sketched below. However, the results are always very poor: AP50 is at most 0.35.
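For reference, a minimal sketch of the training setup, assuming the standard mediapipe_model_maker object detector API; the dataset paths and the choice of MOBILENET_V2 stand in for my actual configuration:

```python
from mediapipe_model_maker import object_detector

# Dataset in COCO layout (images/ + labels.json); the paths here are placeholders.
train_data = object_detector.Dataset.from_coco_folder('dataset/train')
validation_data = object_detector.Dataset.from_coco_folder('dataset/validation')

spec = object_detector.SupportedModels.MOBILENET_V2  # one of the 4 pretrained choices
hparams = object_detector.HParams(
    learning_rate=0.3,  # default; I also tried values down to 0.01
    epochs=300,         # I also tried 200
    export_dir='exported_model',
)
options = object_detector.ObjectDetectorOptions(
    supported_model=spec,
    hparams=hparams,
)

model = object_detector.ObjectDetector.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options,
)

# Evaluate on the validation set and export the TFLite model.
loss, coco_metrics = model.evaluate(validation_data)
print('Validation loss:', loss)
print('Validation coco metrics:', coco_metrics)
model.export_model()
```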
Here is the reproduced result. What am I doing wrong?
Note: with the same data I got ~86% for object detection using the EfficientDet pretrained models, following the documentation.
```
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=1.95s).
Accumulating evaluation results...
DONE (t=0.17s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.082
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.318
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.008
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.070
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.351
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.118
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.182
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.184
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.167
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.422
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Validation loss: [1.0166717767715454, 0.42808738350868225, 0.010559567250311375, 0.9560657143592834]
Validation coco metrics: {'AP': 0.08189376, 'AP50': 0.31780246, 'AP75': 0.00782904, 'APs': 0.069966085, 'APm': 0.3512227, 'APl': -1.0, 'ARmax1': 0.11827957, 'ARmax10': 0.18172044, 'ARmax100': 0.18387097, 'ARs': 0.16743295, 'ARm': 0.42222223, 'ARl': -1.0}
```
I have the same experience. I created a folder with 115 subfolders of cropped dog images and uploaded it to Google Drive. I ran the MediaPipe Model Maker Image Classifier demo with all defaults except for image_path and spec = image_classifier.SupportedModels.EFFICIENTNET_LITE0. I expected to get a TFLite model that is smaller, faster, and more accurate at classifying dog images than EfficientNet-Lite0. What I got is indeed smaller, but the accuracy is lousy. I suspect this has to do with the metadata created by Model Maker: if I display the metadata of the new model, it shows mean 0.0 and std 255.0, while EfficientNet-Lite0 has mean 127.0 and std 128.0. See below for details.
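The dumps below were produced with something like this sketch, assuming the tflite_support package; the model file names are placeholders:

```python
from tflite_support import metadata

# Print the embedded metadata of both models for comparison.
for model_file in ('model_maker_output.tflite', 'efficientnet_lite0.tflite'):
    displayer = metadata.MetadataDisplayer.with_model_file(model_file)
    print(displayer.get_metadata_json())
```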
Metadata of the Model Maker model:

```json
{
  "name": "ImageClassifier",
  "description": "Identify the most prominent object in the image from a known set of categories.",
  "subgraph_metadata": [
    {
      "input_tensor_metadata": [
        {
          "name": "image",
          "description": "Input image to be processed.",
          "content": {
            "content_properties_type": "ImageProperties",
            "content_properties": {
              "color_space": "RGB"
            }
          },
          "process_units": [
            {
              "options_type": "NormalizationOptions",
              "options": {
                "mean": [
                  0.0
                ],
                "std": [
                  255.0
                ]
              }
            }
          ],
          "stats": {
            "max": [
              1.0
            ],
            "min": [
              0.0
            ]
          }
        }
      ],
      "output_tensor_metadata": [
        {
          "name": "score",
          "description": "Score of the labels respectively.",
          "content": {
            "content_properties_type": "FeatureProperties",
            "content_properties": {
            }
          },
          "stats": {
            "max": [
              1.0
            ],
            "min": [
              0.0
            ]
          },
          "associated_files": [
            {
              "name": "labels.txt",
              "description": "Labels for categories that the model can recognize.",
              "type": "TENSOR_AXIS_LABELS"
            }
          ]
        }
      ]
    }
  ]
}
```
Metadata of EfficientNet-Lite0:

```json
{
  "name": "EfficientNet-lite image classifier",
  "description": "Identify the most prominent object in the image from a set of 1,000 categories such as trees, animals, food, vehicles, person etc.",
  "version": "1",
  "subgraph_metadata": [
    {
      "input_tensor_metadata": [
        {
          "name": "image",
          "description": "Input image to be classified. The expected image is 224 x 224, with three channels (red, blue, and green) per pixel. Each element in the tensor is a value between min and max, where (per-channel) min is [-0.9921875] and max is [1.0].",
          "content": {
            "content_properties_type": "ImageProperties",
            "content_properties": {
              "color_space": "RGB"
            }
          },
          "process_units": [
            {
              "options_type": "NormalizationOptions",
              "options": {
                "mean": [
                  127.0
                ],
                "std": [
                  128.0
                ]
              }
            }
          ],
          "stats": {
            "max": [
              1.0
            ],
            "min": [
              -0.992188
            ]
          }
        }
      ],
      "output_tensor_metadata": [
        {
          "name": "probability",
          "description": "Probabilities of the 1000 labels respectively.",
          "content": {
            "content_properties_type": "FeatureProperties"
          },
          "stats": {
            "max": [
              1.0
            ],
            "min": [
              0.0
            ]
          },
          "associated_files": [
            {
              "name": "labels_without_background.txt",
              "description": "Labels for objects that the model can recognize.",
              "type": "TENSOR_AXIS_LABELS"
            }
          ]
        }
      ]
    }
  ],
  "author": "MediaPipe",
  "license": "Apache License. Version 2.0 http://www.apache.org/licenses/LICENSE-2.0."
}
```
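For what it's worth, NormalizationOptions maps each input pixel x to (x - mean) / std, so the two settings put the same pixel in different ranges. A quick plain-Python check (the pixel value is chosen only for illustration):

```python
def normalize(x, mean, std):
    # TFLite NormalizationOptions: each channel value is mapped to (x - mean) / std.
    return (x - mean) / std

pixel = 200.0
print(normalize(pixel, mean=0.0, std=255.0))    # 0.784... -> inputs land in [0.0, 1.0]
print(normalize(pixel, mean=127.0, std=128.0))  # 0.570... -> inputs land in [-0.9921875, 1.0]
```

This matches the stats blocks above (min/max of 0.0/1.0 versus -0.992188/1.0), so a pipeline that normalizes inputs for one of these models would feed shifted values to the other.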