
Training with custom data produces very poor results.

Open · cappittall opened this issue 2 years ago · 1 comment

I have trained a custom object detection model with relevant data (~750 annotated images × 3 augmentations). There are 4 choices of pretrained models, and I tested all 4 of them with different learning rates (default 0.3). I also tried different combinations of 200-300 epochs and learning rates from 0.01 to 0.3. However, the results are always very poor: AP50 is at most 0.35.

Here is the reproduced result. What am I doing wrong?

Note: With the same data I got ~86% for object detection with EfficientDet pretrained models, as described.

```
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=1.95s).
Accumulating evaluation results...
DONE (t=0.17s).
Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] =  0.082
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] =  0.318
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] =  0.008
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] =  0.070
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] =  0.351
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] =  0.118
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] =  0.182
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] =  0.184
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] =  0.167
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] =  0.422
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Validation loss: [1.0166717767715454, 0.42808738350868225, 0.010559567250311375, 0.9560657143592834]
Validation coco metrics: {'AP': 0.08189376, 'AP50': 0.31780246, 'AP75': 0.00782904, 'APs': 0.069966085, 'APm': 0.3512227, 'APl': -1.0, 'ARmax1': 0.11827957, 'ARmax10': 0.18172044, 'ARmax100': 0.18387097, 'ARs': 0.16743295, 'ARm': 0.42222223, 'ARl': -1.0}
```
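For reference, this is roughly the training setup used (a minimal sketch of the mediapipe-model-maker object detector API; the dataset paths, model choice, and hyperparameter values here are placeholders, not the exact configuration):

```python
from mediapipe_model_maker import object_detector

# Placeholder paths: COCO-format folders with images and labels.json.
train_data = object_detector.Dataset.from_coco_folder('data/train')
validation_data = object_detector.Dataset.from_coco_folder('data/validation')

# One of the combinations tried: default learning rate 0.3, ~300 epochs.
hparams = object_detector.HParams(
    learning_rate=0.3,
    epochs=300,
    batch_size=8,
    export_dir='exported_model',
)
options = object_detector.ObjectDetectorOptions(
    supported_model=object_detector.SupportedModels.MOBILENET_MULTI_AVG,
    hparams=hparams,
)

model = object_detector.ObjectDetector.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options,
)

# Produces the validation loss and COCO metrics shown in the log above.
loss, coco_metrics = model.evaluate(validation_data)
print(coco_metrics)
```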

cappittall avatar Oct 05 '23 05:10 cappittall

I have the same experience. I created a folder containing 115 folders of cropped dog images and uploaded it to Google Drive. I ran the MediaPipe Model Maker Image Classifier Demo with all defaults except for image_path and spec = image_classifier.SupportedModels (EfficientNet-Lite0). I expected to get a tflite model that is smaller, faster, and more accurate at classifying dog images than EfficientNet-Lite0. What I got is indeed smaller, but the accuracy is lousy. I suspect this has to do with the metadata created by the model maker: if I display the metadata of the new model, it shows mean 0.0 / std 255.0, while EfficientNet-Lite0 has mean 127.0 / std 128.0. See below for details.

{
  "name": "ImageClassifier",
  "description": "Identify the most prominent object in the image from a known set of categories.",
  "subgraph_metadata": [
    {
      "input_tensor_metadata": [
        {
          "name": "image",
          "description": "Input image to be processed.",
          "content": {
            "content_properties_type": "ImageProperties",
            "content_properties": {
              "color_space": "RGB"
            }
          },
          "process_units": [
            {
              "options_type": "NormalizationOptions",
              "options": {
                "mean": [
                  0.0
                ],
                "std": [
                  255.0
                ]
              }
            }
          ],
          "stats": {
            "max": [
              1.0
            ],
            "min": [
              0.0
            ]
          }
        }
      ],
      "output_tensor_metadata": [
        {
          "name": "score",
          "description": "Score of the labels respectively.",
          "content": {
            "content_properties_type": "FeatureProperties",
            "content_properties": {
            }
          },
          "stats": {
            "max": [
              1.0
            ],
            "min": [
              0.0
            ]
          },
          "associated_files": [
            {
              "name": "labels.txt",
              "description": "Labels for categories that the model can recognize.",
              "type": "TENSOR_AXIS_LABELS"
            }
          ]
        }
      ]
    }
  ]
}

{
  "name": "EfficientNet-lite image classifier",
  "description": "Identify the most prominent object in the image from a set of 1,000 categories such as trees, animals, food, vehicles, person etc.",
  "version": "1",
  "subgraph_metadata": [
    {
      "input_tensor_metadata": [
        {
          "name": "image",
          "description": "Input image to be classified. The expected image is 224 x 224, with three channels (red, blue, and green) per pixel. Each element in the tensor is a value between min and max, where (per-channel) min is [-0.9921875] and max is [1.0].",
          "content": {
            "content_properties_type": "ImageProperties",
            "content_properties": {
              "color_space": "RGB"
            }
          },
          "process_units": [
            {
              "options_type": "NormalizationOptions",
              "options": {
                "mean": [
                  127.0
                ],
                "std": [
                  128.0
                ]
              }
            }
          ],
          "stats": {
            "max": [
              1.0
            ],
            "min": [
              -0.992188
            ]
          }
        }
      ],
      "output_tensor_metadata": [
        {
          "name": "probability",
          "description": "Probabilities of the 1000 labels respectively.",
          "content": {
            "content_properties_type": "FeatureProperties"
          },
          "stats": {
            "max": [
              1.0
            ],
            "min": [
              0.0
            ]
          },
          "associated_files": [
            {
              "name": "labels_without_background.txt",
              "description": "Labels for objects that the model can recognize.",
              "type": "TENSOR_AXIS_LABELS"
            }
          ]
        }
      ]
    }
  ],
  "author": "MediaPipe",
  "license": "Apache License. Version 2.0 http://www.apache.org/licenses/LICENSE-2.0."
}
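The mismatch above is easy to check directly. Assuming the standard TFLite input normalization, `(pixel - mean) / std`, the two metadata blocks map raw 8-bit pixels to different input ranges:

```python
def normalize(pixel, mean, std):
    # Standard TFLite input normalization: (pixel - mean) / std
    return (pixel - mean) / std

# Model Maker output (mean=0.0, std=255.0): inputs land in [0.0, 1.0]
print(normalize(0, 0.0, 255.0), normalize(255, 0.0, 255.0))      # 0.0 1.0

# Stock EfficientNet-Lite0 (mean=127.0, std=128.0): inputs land in [-0.9921875, 1.0]
print(normalize(0, 127.0, 128.0), normalize(255, 127.0, 128.0))  # -0.9921875 1.0
```

Both results match the `stats` blocks in the respective metadata dumps (min 0.0 vs. min -0.992188), so each model's metadata is at least internally consistent; the open question is whether fine-tuning on [0, 1] inputs a backbone that was pretrained on roughly [-1, 1] inputs is intended behavior.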

BoHellgren avatar Jan 26 '24 15:01 BoHellgren