
ONNX output V5 vs V8

Open knoppmyth opened this issue 2 years ago • 1 comments

Search before asking

  • [X] I have searched the YOLOv8 issues and discussions and found no similar questions.

Question

Will a future release change the ONNX output to match the output of YOLOv5? My current code (using OpenCV's DNN) works for YOLOv5-v7. Obviously, having one method for reading ONNX output would be better than maintaining two. Given that v8 is in its early days, will a change be made so the ONNX output matches v5's? Or is this feature complete?
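
For reference, here's how the raw outputs differ (a quick sketch with hypothetical file names; shapes are for a 640x640, 80-class model):

    import cv2
    import numpy as np

    for path in ("yolov5s.onnx", "yolov8s.onnx"):  # hypothetical exported models
        net = cv2.dnn.readNet(path)
        net.setInput(np.zeros((1, 3, 640, 640), dtype=np.float32))
        print(path, net.forward().shape)
        # yolov5s.onnx -> (1, 25200, 85): rows of [cx, cy, w, h, objectness, 80 class scores]
        # yolov8s.onnx -> (1, 84, 8400): transposed layout with no objectness column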

Thanks for the work and commitment to open source.

Additional

No response

knoppmyth avatar Jan 17 '23 22:01 knoppmyth

Similar to #350

DavidBerschauer avatar Jan 19 '23 14:01 DavidBerschauer

Here is the code I've used to read both v5 and v8 ONNX output with OpenCV:

    import time

    import cv2
    import numpy as np


    def detect_yolo_onnx(confidence_threshold, image, input_width, input_height,
                         class_list, model, model_path, model_details):
        INPUT_WIDTH = input_width
        INPUT_HEIGHT = input_height
        net = cv2.dnn.readNet(model_path)

        # Pad the image onto a square canvas so resizing to the network input
        # size doesn't distort the aspect ratio
        row, col, _ = image.shape
        _max = max(col, row)
        result = np.zeros((_max, _max, 3), np.uint8)
        result[0:row, 0:col] = image

        detection_start = time.perf_counter()
        blob = cv2.dnn.blobFromImage(result, 1 / 255.0, (INPUT_WIDTH, INPUT_HEIGHT), swapRB=True, crop=False)
        net.setInput(blob)
        preds = net.forward()
        detection_finished = time.perf_counter()

        # v8 outputs (1, 4 + num_classes, num_anchors); transpose to the
        # v5-style (1, num_anchors, ...) layout so one loop handles both
        if model == "yolov8":
            preds = preds.transpose((0, 2, 1))

        class_ids = []
        confidences = []
        boxes = []
        rows = preds[0].shape[0]
        image_height, image_width, _ = result.shape
        x_factor = image_width / INPUT_WIDTH
        y_factor = image_height / INPUT_HEIGHT
        list_of_conf = []

        for r in range(rows):
            row = preds[0][r]
            if model == "yolov8":
                # v8 has no objectness score; columns 4+ are the class scores
                classes_scores = row[4:]
                confidence = float(np.max(classes_scores))
            else:
                # v5-v7: column 4 is objectness, columns 5+ are the class scores
                classes_scores = row[5:]
                confidence = row[4]
            list_of_conf.append(confidence)
            if confidence >= confidence_threshold:
                _, _, _, max_indx = cv2.minMaxLoc(classes_scores)
                class_id = max_indx[1]
                if classes_scores[class_id] > 0.25:
                    confidences.append(confidence)
                    class_ids.append(class_id)
                    # Boxes are [cx, cy, w, h] in network-input coordinates
                    x, y, w, h = row[0].item(), row[1].item(), row[2].item(), row[3].item()
                    left = int((x - 0.5 * w) * x_factor)
                    top = int((y - 0.5 * h) * y_factor)
                    width = int(w * x_factor)
                    height = int(h * y_factor)
                    boxes.append(np.array([left, top, width, height]))

        # Non-maximum suppression to drop overlapping duplicate detections
        indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.25, 0.45)
        result_class_ids = []
        result_confidences = []
        result_boxes = []
        for i in indexes:
            result_confidences.append(confidences[i])
            result_class_ids.append(class_ids[i])
            result_boxes.append(boxes[i])

        colors = [(255, 255, 0), (0, 255, 0), (0, 255, 255), (255, 0, 0)]
        for (classid, confidence, box) in zip(result_class_ids, result_confidences, result_boxes):
            color = colors[int(classid) % len(colors)]
            cv2.rectangle(image, box, color, 5)
            cv2.rectangle(image, (box[0], box[1] - 20), (box[0] + box[2], box[1]), color, -1)
            cv2.putText(image, class_list[classid], (box[0], box[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, .5, (0, 0, 0))

        yolo_onnx_returned = []
        yolo_onnx_returned.append(model_details)
        if len(result_boxes) == 0:
            yolo_onnx_returned.append(f"Detection attempt took {round(detection_finished - detection_start, 2)} seconds. "
                                      f"No tattoos detected. Try setting the confidence level to "
                                      f"{round(np.amax(list_of_conf) * 100) - 1} or lower.")
        else:
            yolo_onnx_returned.append(f"Model detected {len(result_boxes)} tattoos in "
                                      f"{round(detection_finished - detection_start, 2)} seconds with a confidence "
                                      f"level of {confidence_threshold * 100} percent.")
        yolo_onnx_returned.append(image)
        # NOTE: models_returned is defined outside this function, in the enclosing
        # Streamlit app (see the discussion below)
        models_returned.append(yolo_onnx_returned)
        return models_returned
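
For anyone trying it out, a hypothetical call might look like this (all names and values are placeholders; note that models_returned must exist in the enclosing scope, as discussed below):

    import cv2

    models_returned = []  # the function appends to this list from the enclosing scope
    image = cv2.imread("test.jpg")
    results = detect_yolo_onnx(
        confidence_threshold=0.5,
        image=image,
        input_width=640,
        input_height=640,
        class_list=["tattoo"],
        model="yolov8",
        model_path="tattoo_yolov8.onnx",
        model_details="yolov8s, opset 11",
    )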

Not sure why the code isn't being posted cleanly...

knoppmyth avatar Jan 19 '23 23:01 knoppmyth

models_returned

models_returned is not defined in the method

BrianP8701 avatar Feb 01 '23 04:02 BrianP8701

Also, I'm curious (I don't know if this is a dumb question): how did you figure out how to deal with the model's output? Can you point me to the documentation or whatever you used to help you figure out how to write this method?

BrianP8701 avatar Feb 01 '23 04:02 BrianP8701

@BrianP8701 You can remove models_returned; I included the entire method for completeness. I wrote it as part of a Streamlit app so I can compare models against one another. The part you really need to key in on is the if model == "yolov8": block. I was able to resolve the issue by reading various posts here about ONNX.

knoppmyth avatar Feb 01 '23 04:02 knoppmyth

@knoppmyth Hi, I'm also trying to work with OpenCV and YOLOv8. I see your code for the fix isn't being formatted correctly; do you have any other way to share it? I tried using net = cv2.dnn.readNet("myyolov8model") and I got this error:

OpenCV(4.7.0) /io/opencv/modules/dnn/src/onnx/onnx_importer.cpp:1073: error: (-2:Unspecified error) in function 'handleNode' Node [Split@ai.onnx]:(onnx_node!/model.22/Split) parse error: OpenCV(4.7.0) /io/opencv/modules/dnn/src/layers/slice_layer.cpp:274: error: (-215:Assertion failed) splits > 0 && inpShape[axis_rw] % splits == 0 in function 'getMemoryShapes'

aaSchcolnik avatar Feb 09 '23 23:02 aaSchcolnik

@aaSchcolnik Hello. Yeah, I used the 'code' button to paste the code and no matter what, it wouldn't format it right... What opset did you use when exporting the model to ONNX? I used '11', as the default of '17' (I believe 17 was/is the default) doesn't work with OpenCV. I believe with newer versions of v8 you have to use '12', but I've not upgraded in several releases.
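
For reference, the opset can be set at export time; a minimal sketch (the model file name is a placeholder):

    from ultralytics import YOLO

    model = YOLO("yolov8s.pt")
    # opset 11 (or 12 on newer releases) tends to work better with OpenCV's
    # DNN module than the default
    model.export(format="onnx", opset=11)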

knoppmyth avatar Feb 09 '23 23:02 knoppmyth

@knoppmyth After reading another one of your comments in another issue, I used opset=11 and it seems to have worked. This is the first time I'm working with OpenCV to try to use it for detection, so at least the first step (reading the ONNX model) seems to have worked. If you have any help for me regarding how to get this to work on a webcam setup, I'd appreciate it.

aaSchcolnik avatar Feb 09 '23 23:02 aaSchcolnik

@aaSchcolnik Cool. I've not attempted to use a webcam, as it currently isn't in my use case. However, if I were to, I'd make a function that does the detection, then call that function in the while True loop with the current frame as input, as in the sketch below. You should be able to find something on YouTube by searching for "yolo webcam".
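
A minimal sketch of that idea, assuming a detect(frame) wrapper around detection code like the function above that draws on and returns the frame:

    import cv2

    cap = cv2.VideoCapture(0)  # default webcam
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        annotated = detect(frame)  # hypothetical wrapper around the detection code above
        cv2.imshow("yolo webcam", annotated)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to quit
            break
    cap.release()
    cv2.destroyAllWindows()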

knoppmyth avatar Feb 09 '23 23:02 knoppmyth

Hi @aaSchcolnik, I had the same error as you, but I didn't solve it by changing the opset; maybe it was because of a resize. I finally got past reading the model in the first step, but then I hit another error. Do you know how to solve it?

  what(): OpenCV(4.7.0) /home/soloplayl/ENV_PACKS/installing/opencv-4.7.0/modules/core/src/out.cpp:87: error: (-215:Assertion failed) m.dims <= 2 in function 'FormattedImpl'

soloplayl avatar Mar 06 '23 16:03 soloplayl

@knoppmyth Could you please update your snippet? It should look like this (make sure there is a newline after python):

```python
def some_code():
    here
```

wereii avatar Mar 26 '23 01:03 wereii

@wereii Done.

knoppmyth avatar Mar 26 '23 03:03 knoppmyth

Hi, I'm using an ONNX model exported from yolov8s.pt with cv2's DNN module. When I use the CUDA backend and target, x, y, w, h are always 0. If I use the CPU, it works as expected.

GPU output:

  '08/01/23 16:26:05.0014' ZoMi:API[541830] DEBUG onnx:208 -> OpenCV:ONNX:yolov8s onnx: preds TYPE = <class 'tuple'> ---- preds = (array(
  [[[          0,           0,           0, ...,           0,           0,           0],
          [          0,           0,           0, ...,           0,           0,           0],
          [          0,           0,           0, ...,           0,           0,           0],
          ...,
          [ 9.3641e-07,  8.2604e-07,  2.7489e-07, ...,  2.4086e-06,  2.1599e-06,  2.5008e-06],
          [ 3.1226e-07,  2.0598e-07,  9.3705e-08, ...,  1.5714e-06,  1.5165e-06,  1.4736e-06],
          [  7.242e-07,   3.412e-07,  1.5654e-07, ...,  2.1359e-06,  2.2234e-06,  2.2077e-06]]], dtype=float32),)

  '08/01/23 16:26:05.0129' ZoMi:API[541830] DEBUG onnx:248 -> OpenCV:ONNX:yolov8s onnx: label = 'diningtable' - 0.0, 0.0, 0.0. 0.0

CPU

  '08/01/23 16:27:13.0884' ZoMi:API[542063] DEBUG onnx:208 -> OpenCV:ONNX:yolov8s onnx: preds TYPE = <class 'tuple'> ---- preds = (array(
  [[[     6.3594,      9.4768,      21.329, ...,      556.42,      583.57,      588.77],
          [     9.4905,      14.451,      15.569, ...,      622.46,      599.02,      597.79],
          [     12.641,      18.602,      43.387, ...,      171.68,      126.05,      123.96],
          ...,
          [ 9.3641e-07,  8.2604e-07,  2.7489e-07, ...,  2.4086e-06,  2.1599e-06,  2.5009e-06],
          [ 3.1226e-07,  2.0598e-07,  9.3705e-08, ...,  1.5714e-06,  1.5165e-06,  1.4736e-06],
          [  7.242e-07,   3.412e-07,  1.5654e-07, ...,  2.1359e-06,  2.2234e-06,  2.2077e-06]]], dtype=float32),)

  '08/01/23 16:27:13.0995' ZoMi:API[542063] DEBUG onnx:248 -> OpenCV:ONNX:yolov8s onnx: label = 'diningtable' - 461.19976806640625, 451.926025390625, 70.67080688476562. 112.97637939453125

I don't understand what could be causing this.

baudneo avatar Aug 01 '23 22:08 baudneo

@baudneo thank you for using YOLOv8 and reaching out to us.

This kind of discrepancy between CPU and GPU output typically originates from differences in the operations supported by OpenCV's DNN module between its CPU and CUDA backends and targets.

In the case of YOLOv8, the underlying issue most likely lies in the CUDA implementation of OpenCV's DNN module, which may not be handling some operations used in our model correctly, or as efficiently as its CPU counterpart.

Unfortunately, we cannot control how OpenCV's DNN processes our model, but we can suggest some steps that you might find useful. Firstly, ensure that you're using the latest versions of OpenCV and CUDA, and the optimal settings specifically for your system and use case. Another approach would be to look into potential model optimizations within your application, such as pruning or quantization, which may help align the output of the CPU and GPU.
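
For instance, a quick way to check whether the two backends disagree on identical input is to run the same blob through both (a sketch; the model path is a placeholder):

    import cv2
    import numpy as np

    blob = np.random.rand(1, 3, 640, 640).astype(np.float32)
    outputs = {}
    for name, backend, target in [
        ("cpu", cv2.dnn.DNN_BACKEND_OPENCV, cv2.dnn.DNN_TARGET_CPU),
        ("cuda", cv2.dnn.DNN_BACKEND_CUDA, cv2.dnn.DNN_TARGET_CUDA),
    ]:
        net = cv2.dnn.readNet("yolov8s.onnx")
        net.setPreferableBackend(backend)
        net.setPreferableTarget(target)
        net.setInput(blob)
        outputs[name] = net.forward()

    # A large difference here confirms the CUDA backend is mishandling the model
    print(np.abs(outputs["cpu"] - outputs["cuda"]).max())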

Also, it would be beneficial to check if others in the OpenCV community have encountered similar issues and if they've found any workarounds or solutions.

Remember that while YOLOv8 is designed to be versatile and portable, some discrepancies might arise in specific use cases or environments like the one you're working with.

We appreciate your patience as we continue making advancements to the YOLOv8 model, and we hope you keep following our updates.

glenn-jocher avatar Aug 02 '23 10:08 glenn-jocher

That's fair. I have since moved yolov8 and yolo-nas to onnxruntime. The output of <InferenceSession> session.run() is fed into this function:

    def process_output(self, output):
        if output is not None:
            num_outputs = len(output)
            # Non-NAS (YOLOv8): a single output of shape (1, 4 + num_classes, num_anchors)
            if num_outputs == 1:
                # Transpose to (num_anchors, 4 + num_classes)
                predictions = np.squeeze(output[0]).T
                scores = np.max(predictions[:, 4:], axis=1)

                if len(scores) == 0:
                    return np.empty((0, 4)), np.empty((0,)), np.empty((0,))
                boxes = self.extract_boxes(predictions)
                class_ids = np.argmax(predictions[:, 4:], axis=1)

            # NAS (YOLO-NAS): separate box and score outputs
            elif num_outputs == 2:
                boxes: np.ndarray
                raw_scores: np.ndarray
                boxes, raw_scores = output  # get boxes and scores from outputs
                scores = raw_scores.max(axis=2).flatten()
                if len(scores) == 0:
                    return np.empty((0, 4)), np.empty((0,)), np.empty((0,))

                boxes = np.squeeze(boxes, 0)
                boxes = self.rescale_boxes(boxes)
                class_ids = np.argmax(raw_scores, axis=2).flatten()
            else:
                # Unexpected number of outputs
                return np.empty((0, 4)), np.empty((0,)), np.empty((0,))
        else:
            return np.empty((0, 4)), np.empty((0,)), np.empty((0,))

        # Confidence thresholding + non-maximum suppression
        indices = cv2.dnn.NMSBoxes(
            boxes, scores, self.options.confidence, self.options.nms
        )
        return (
            boxes[indices].astype(np.int32).tolist(),
            scores[indices].astype(np.float32).tolist(),
            class_ids[indices].astype(np.int32).tolist(),
        )
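
(The extract_boxes and rescale_boxes helpers aren't shown above; a plausible sketch of what extract_boxes might do for this output layout, as an assumption rather than the author's actual code:)

    import numpy as np

    def extract_boxes(self, predictions):
        # Would live on the same class as process_output.
        # First four columns are assumed to be [cx, cy, w, h] in model-input coordinates.
        boxes = predictions[:, :4].copy()
        # Scale back to original image size (img_/input_ attributes are hypothetical)
        boxes[:, [0, 2]] *= self.img_width / self.input_width
        boxes[:, [1, 3]] *= self.img_height / self.input_height
        # cv2.dnn.NMSBoxes expects [x, y, w, h] with (x, y) at the top-left corner
        boxes[:, 0] -= boxes[:, 2] / 2
        boxes[:, 1] -= boxes[:, 3] / 2
        return boxes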

baudneo avatar Aug 06 '23 15:08 baudneo

@baudneo it looks like you've already implemented a method to process the output of the YOLOv8 model using onnxruntime.

Your current function seems to account for two different output scenarios: one for non-NAS models and another for NAS models. It extracts the bounding box predictions, scores, and class IDs directly from the model output.

As part of the post-processing, it uses the scores to filter out predictions using confidence thresholding and non-maximum suppression (NMS) via the cv2.dnn.NMSBoxes function, which is a common practice for object detection models like YOLOv8. Also, the function seems to handle the case where there are no object detections by returning empty lists.

Keep in mind that testing and fine-tuning your model on your specific use case can yield better results. Given the complexity and diversity of real-world inputs, adding other post-processing steps, like extra filters or heuristics, might be useful.

glenn-jocher avatar Aug 07 '23 15:08 glenn-jocher