
SSD_MobileNetv1_COCO label_filename incorrect classification

jishminor opened this issue 4 years ago · 9 comments

Description: When running ssd_mobilenetv1_coco in Triton and specifying the class labels in the model config, it seems that the labels are not assigned correctly.

Triton Information: What version of Triton are you using? NVIDIA Release 20.03 (build 11042949)

Are you using the Triton container or did you build it yourself? Using the Triton container.

To Reproduce: My model config for ssd_mobilenet_coco is as follows:

name: "ssd_mobilenet_coco"
platform: "tensorflow_graphdef"
max_batch_size: 1
input [
  {
    name: "image_tensor"
    data_type: TYPE_UINT8
    format: FORMAT_NHWC
    dims: [ 300, 300, 3 ]
  }
]
output [
  {
    name: "detection_boxes"
    data_type: TYPE_FP32
    dims: [ 100, 4 ]
  },
  {
    name: "detection_scores"
    data_type: TYPE_FP32
    dims: [ 100 ]
  },
  {
    name: "num_detections"
    data_type: TYPE_FP32
    dims: [ 1 ]
    reshape: { shape: [ ] }
  },
  {
    name: "detection_classes"
    data_type: TYPE_FP32
    dims: [ 100 ]
    label_filename: "ssd_mobilenet_coco.classes"
  }
]

A snippet of my labels file ssd_mobilenet_coco.classes:

0 unlabeled
1 person
2 bicycle
3 car

With a properly formatted image I make the request:

result = ctx.run(
          { input_name : (preprocess(img, format, dtype, c, h, w, args.scaling),) },
          { output_names[3] : (InferContext.ResultFormat.CLASS, args.classes),
            output_names[2] : InferContext.ResultFormat.RAW,
            output_names[1] : InferContext.ResultFormat.RAW,
            output_names[0] : InferContext.ResultFormat.RAW })

Which yields the incorrect labels as follows:

{'detection_classes': [[(61, 67.0, '61 cake'), (18, 42.0, '18 dog'), (73, 42.0, '73 laptop'), (49, 42.0, '49 knife'), (36, 41.0, '36 snowboard'), (35, 41.0, '35 skis'), (94, 41.0, '94 branch'), (83, 41.0, '83 blender'), (87, 41.0, '87 scissors'), (24, 41.0, '24 zebra')]], 'num_detections': [array([100.], dtype=float32)]

If I instead make the request using the RAW format rather than CLASS:

result = ctx.run(
          { input_name : (preprocess(img, format, dtype, c, h, w, args.scaling),) },
          { output_names[3] : InferContext.ResultFormat.RAW,
            output_names[2] : InferContext.ResultFormat.RAW,
            output_names[1] : InferContext.ResultFormat.RAW,
            output_names[0] : InferContext.ResultFormat.RAW })

I get the following output:

'detection_classes': [array([ 1.,  1.,  3.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  3.,  1.,  1.,
        1.,  1.,  3.,  1.,  1., 42., 33.,  1., 10.,  1.,  1., 41.,  1.,
       31.,  1.,  1., 31., 10.,  1.,  1., 31.,  1., 41., 41.,  8., 10.,
        1.,  1.,  3.,  1., 10., 10.,  8., 10., 41., 31., 42.,  1.,  1.,
       31.,  1.,  1.,  1., 10.,  1.,  1.,  3.,  1., 67.,  1., 31.,  3.,
        1.,  1.,  1.,  1.,  1.,  1.,  3., 10., 42., 31., 31.,  1.,  1.,
        8.,  1.,  1., 31.,  1., 41., 31., 10., 31., 41., 33., 31., 31.,
        1.,  1.,  1., 41.,  1., 31.,  1.,  1.,  1.], dtype=float32)],

Which clearly should map to mostly people and cars.

Expected behavior: I would expect the indices in the detection_classes tensor above to be mapped to my classes file appropriately. I'm not sure how the list with cake, dog, etc. is being generated.

jishminor avatar May 20 '20 23:05 jishminor

It seems that your output tensor contains class index values, not probabilities. The classification output option assumes that the output tensor contains probabilities (specifically, one probability per class). Perhaps you can just use the "detection_scores" output instead.

deadeyegoodwin avatar May 21 '20 00:05 deadeyegoodwin
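This explains the "cake, dog" output above. The following is an editor's sketch (not Triton source) of what the CLASS result format effectively does: it treats the output tensor as per-class probabilities, ranks entries by value, and labels each result by its *position* in the tensor, not by the value stored there. The label excerpt and tensor values below are chosen to mirror the reported output.

```python
import numpy as np

# Assumed excerpt of a COCO label file, keyed by line index.
labels = {61: "cake", 18: "dog"}

# Toy detection_classes tensor: the values are class indices, but the
# CLASS format interprets them as probabilities.
detection_classes = np.zeros(100, dtype=np.float32)
detection_classes[61] = 67.0  # the value 67 happens to sit at position 61
detection_classes[18] = 42.0

# Top-k by value, labeled by position -- this reproduces "(61, 67.0, 'cake')".
top = np.argsort(detection_classes)[::-1][:2]
results = [(int(i), float(detection_classes[i]), labels[int(i)]) for i in top]
print(results)  # [(61, 67.0, 'cake'), (18, 42.0, 'dog')]
```

So position 61 holding the value 67.0 becomes "(61, 67.0, '61 cake')", exactly as observed.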

Ah, I see. I believe the structure of my model's output tensors doesn't lend itself well to this assumption. To provide more context on the structure of the output for this model, here is the raw output again, this time including the detection scores:

{'detection_classes': [array([ 1.,  1.,  3.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  3.,  1.,  1.,
        1.,  1.,  3.,  1.,  1., 42., 33.,  1., 10.,  1.,  1., 41.,  1.,
       31.,  1.,  1., 31., 10.,  1.,  1., 31.,  1., 41., 41.,  8., 10.,
        1.,  1.,  3.,  1., 10., 10.,  8., 10., 41., 31., 42.,  1.,  1.,
       31.,  1.,  1.,  1., 10.,  1.,  1.,  3.,  1., 67.,  1., 31.,  3.,
        1.,  1.,  1.,  1.,  1.,  1.,  3., 10., 42., 31., 31.,  1.,  1.,
        8.,  1.,  1., 31.,  1., 41., 31., 10., 31., 41., 33., 31., 31.,
        1.,  1.,  1., 41.,  1., 31.,  1.,  1.,  1.], dtype=float32)], 'num_detections': [array([100.], dtype=float32)], 'detection_scores': [array([0.9095352 , 0.89552975, 0.75260067, 0.52773774, 0.30497238,
       0.29011738, 0.24955702, 0.2438381 , 0.236559  , 0.23035944,
       0.18808621, 0.18012342, 0.17272481, 0.17168066, 0.15367392,
       0.15107837, 0.14348134, 0.14185196, 0.14069438, 0.13835889,
       0.13705641, 0.13419703, 0.13205764, 0.12723073, 0.11887994,
       0.11625072, 0.11605057, 0.11500838, 0.11207467, 0.11021665,
       0.10881671, 0.1082496 , 0.10469756, 0.10395947, 0.10349023,
       0.10213143, 0.10147697, 0.10099989, 0.10094565, 0.09988615,
       0.09956372, 0.09953037, 0.09721223, 0.09489211, 0.09449688,
       0.09437826, 0.0943217 , 0.09420252, 0.09341052, 0.09340391,
       0.09241036, 0.09239888, 0.09155831, 0.09113276, 0.09085006,
       0.09084514, 0.09074256, 0.09048536, 0.09009328, 0.0888491 ,
       0.08820841, 0.08755994, 0.0874126 , 0.08723843, 0.08680698,
       0.0850628 , 0.08475658, 0.08427727, 0.08266112, 0.08161193,
       0.08024412, 0.07962763, 0.0794287 , 0.07941556, 0.07910559,
       0.07892475, 0.07748681, 0.07730368, 0.07728311, 0.07714537,
       0.07695287, 0.07661402, 0.07650983, 0.0760279 , 0.07576445,
       0.07557225, 0.07451567, 0.07441509, 0.07392779, 0.07374603,
       0.07359245, 0.0735119 , 0.07329229, 0.07290748, 0.07279301,
       0.07271665, 0.07256082, 0.07255065, 0.07222772, 0.07210785],
      dtype=float32)]

The items in each tensor are ordered from most confident to least, i.e. detection_classes[0] = 1 (person) has detection_scores[0] = 0.91. Unless I am missing something in the documentation, it seems the label_filename feature can't be used in this scenario.

jishminor avatar May 21 '20 01:05 jishminor
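A client-side workaround consistent with the discussion above is to request the outputs as RAW and map the class-index values to names yourself. This editor's sketch assumes the "&lt;index&gt; &lt;name&gt;" labels-file format shown earlier; the tensor values are the first few entries of the raw output above.

```python
# Inline copy of the labels-file snippet from the issue.
LABELS_TEXT = """0 unlabeled
1 person
2 bicycle
3 car"""

labels = {}
for line in LABELS_TEXT.splitlines():
    idx, name = line.split(maxsplit=1)
    labels[int(idx)] = name

# First few entries of the RAW outputs reported above.
detection_classes = [1.0, 1.0, 3.0, 1.0]
detection_scores = [0.9095, 0.8955, 0.7526, 0.5277]

# Map each class-index value (not position) to its label.
detections = [(labels[int(c)], s) for c, s in zip(detection_classes, detection_scores)]
print(detections)  # [('person', 0.9095), ('person', 0.8955), ('car', 0.7526), ('person', 0.5277)]
```

This sidesteps the CLASS format entirely, at the cost of shipping the labels file to every client.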

If there is an earlier point in your model where the items have not yet been sorted by confidence, and instead there is a tensor where each entry is a probability/confidence for the corresponding class index, then you could expose that tensor as an output and use label_filename. Otherwise I agree that it doesn't really work, at least not with the options Triton has right now. We could add a classification option that treats the tensor as already-sorted indices, and then you could use that, so I'll mark this as an enhancement.

deadeyegoodwin avatar May 21 '20 19:05 deadeyegoodwin

Sounds good. I'm using the pre-trained ssd_mobilenetv1_coco model, which out of the box outputs tensors in this fashion. I think this would be a good enhancement, as I presume object detection will be a pretty common task for users of Triton.

jishminor avatar May 21 '20 20:05 jishminor

> Sounds good. I'm using the pre-trained ssd_mobilenetv1_coco model, which out of the box outputs tensors in this fashion. I think this would be a good enhancement, as I presume object detection will be a pretty common task for users of Triton.

I agree. I'm running yolov5 object detection and post-processing on nvcr.io/nvidia/tritonserver:21.12-py3 and my output is:

output [
  {
    name: "boxes__0"
    data_type: TYPE_FP32
    dims: [ -1, 4 ]
  },
  {
    name: "scores__1"
    data_type: TYPE_FP32
    dims: [ -1 ]
  },
  {
    name: "labels__2"
    data_type: TYPE_FP32
    dims: [ -1, 6 ]
    label_filename: "labels.txt"
  }
]

where -1 corresponds to the number of detections and max_batch_size is either 0 or 1.

However, when I request class_count > 0 from the client, it apparently ravels all values in labels__2 and treats them as one prediction. Hence, I get a NumPy array of shape (1, class_count), as if there were only one detection.

It would be nice to treat only the last dimension as class probabilities and return labels for each entry along the preceding dimensions.

Rusteam avatar Jan 10 '22 07:01 Rusteam
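The behavior Rusteam is asking for can be sketched as follows (an editor's illustration, not Triton behavior): treat only the last axis of a [num_detections, num_classes] tensor as class probabilities and label each detection independently, instead of raveling everything into a single prediction. The label names and values here are made up.

```python
import numpy as np

# Hypothetical class names for a 6-class model.
labels = ["person", "bicycle", "car", "dog", "cat", "bird"]

# Toy labels__2 tensor with shape [2 detections, 6 classes].
labels__2 = np.array([
    [0.05, 0.02, 0.80, 0.05, 0.05, 0.03],
    [0.70, 0.10, 0.10, 0.05, 0.03, 0.02],
])

# Argmax over the last axis only: one label per detection.
per_detection = [labels[i] for i in labels__2.argmax(axis=-1)]
print(per_detection)  # ['car', 'person']
```

A top-k per detection would use argsort along axis -1 in the same way.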

Hello @deadeyegoodwin and @jishminor!

Has the label_map problem been solved? Is there any option now that can handle object detection output like in Josh's case?

mhbassel avatar Aug 09 '22 11:08 mhbassel

I've created a Triton identity model with a label_filename just to handle that scenario.

Rusteam avatar Aug 09 '22 12:08 Rusteam

Thanks @Rusteam, how did you do it exactly, please?

In my case I have a TF object detection model, and Triton treats its output class tensor as probability values and maps their indices to the label_map, which is wrong: the output tensor shape (100 in my case) is independent of the number of classes, and the values are already indices into the label_map. Triton should take the values as they are and map them.

mhbassel avatar Aug 09 '22 13:08 mhbassel

> Thanks @Rusteam, how did you do it exactly, please?
>
> In my case I have a TF object detection model, and Triton treats its output class tensor as probability values and maps their indices to the label_map, which is wrong: the output tensor shape (100 in my case) is independent of the number of classes, and the values are already indices into the label_map. Triton should take the values as they are and map them.

I have an ensemble model that connects preprocessing, the yolov5 model, and postprocessing. Additionally, I send the class probabilities from postprocessing through an identity model that has a labels file.

Rusteam avatar Aug 10 '22 06:08 Rusteam
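Rusteam's workaround might look roughly like the config below (an editor's sketch; the model name, tensor names, class count, and platform are assumptions, not a tested config). The pass-through model's only job is to carry the label_filename, so that requesting its output with the CLASS format works correctly, because this tensor really does contain one probability per class.

```
name: "label_mapper"
platform: "tensorflow_graphdef"  # any backend with a pass-through (identity) graph
max_batch_size: 1
input [
  {
    name: "class_probs"
    data_type: TYPE_FP32
    dims: [ 80 ]
  }
]
output [
  {
    name: "class_probs_out"
    data_type: TYPE_FP32
    dims: [ 80 ]
    label_filename: "labels.txt"
  }
]
```

In the ensemble config, the postprocessing step's per-class probability output would be wired into class_probs.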