Looking for the correct way to convert to ONNX

Open beautyrank opened this issue 3 years ago • 12 comments

The ONNX model produced by export.py is not the final result. What is the correct way to convert it? I'm a newbie.

beautyrank avatar Nov 13 '21 04:11 beautyrank

What would the "correct" output you have in mind look like?

derronqi avatar Nov 16 '21 03:11 derronqi

What he probably means is that the ONNX model only outputs the raw stride tensors instead of the detection-layer output on top of the strides. Sorry for answering in English; I don't know Chinese and used Google Translate. I'm also working out how to decode the strides into detections so I can use ONNX Runtime or even OpenVINO as the inference engine; if I figure it out, I'll post it here.
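
To make it concrete, inspecting the exported model (a sketch; the model path and exact output names are assumptions) shows one raw tensor per stride rather than final detections:

import onnxruntime as ort

# hypothetical path; use your own exported model
sess = ort.InferenceSession("yolov5m-face.onnx")
for out in sess.get_outputs():
    print(out.name, out.shape)
# For an 800x800 input you would expect something like:
#   stride_8   [1, 3, 100, 100, 16]
#   stride_16  [1, 3, 50, 50, 16]
#   stride_32  [1, 3, 25, 25, 16]
# where 16 = 4 (box) + 1 (objectness) + 10 (five landmark x,y pairs) + 1 (class score)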

luisfmnunes avatar Dec 15 '21 13:12 luisfmnunes

You are right. I'm not an AI expert; could any of you post the method and conversion script here? Thanks.

beautyrank avatar Dec 15 '21 13:12 beautyrank

For follow-up procedure, see https://github.com/deepcam-cn/yolov5-face/blob/0f695a0fad36a2d2299aa3afa4f05eceb344228b/torch2tensorrt/main.py#L94

bobo0810 avatar Dec 16 '21 01:12 bobo0810

Well, I also implemented the post-processing (the Detection-layer output) manually with NumPy. You can check the function below. Its arguments are:

  • pred: The inference result from the model
  • stride: the stride array of your model, e.g. [8, 16, 32] for yolov5m
  • image_size: a tuple (h,w) containing the input image shape
def process_anchors(pred, stride, image_size=(800,800)):
    layers = 3
    # anchors = len(ANCHORS[0])//2
    grid = [np.zeros(1)]*layers
    a = np.array(ANCHORS).astype(np.float32).reshape(layers,-1,2) # shape (layers, anchors, 2); np.float was removed in recent NumPy
    anchor_grid = a.copy().reshape(layers,1,-1,1,1,2) # shape (layers, 1, anchors, 1, 1, 2)
    z = []

    for i in range(layers):
        ny, nx = (dim // stride[i] for dim in image_size)
        bs = pred[i].shape[0] # batch size

        if grid[i].shape[2:4] != pred[i].shape[2:4]:
            grid[i] = make_grid(nx, ny)

        y = np.full_like(pred[i], 0)
        
        class_range = list(range(5)) + list(range(15,15+1))
        y[..., class_range] = sigmoid(pred[i][..., class_range])
        y[..., 5:15] = pred[i][..., 5:15]

        y[...,0:2] = (y[..., 0:2] * 2. - 0.5 + grid[i]) * stride[i] # xy (center)
        y[...,2:4] = (y[..., 2:4] * 2) **2 * anchor_grid[i] # wh

        y[..., 5:7] = y[..., 5:7] * anchor_grid[i] + grid[i] * stride[i] # landmark 1 xy (left eye)
        y[..., 7:9] = y[..., 7:9] * anchor_grid[i] + grid[i] * stride[i] # landmark 2 xy (right eye)
        y[..., 9:11] = y[..., 9:11] * anchor_grid[i] + grid[i] * stride[i] # landmark 3 xy (nose tip)
        y[..., 11:13] = y[..., 11:13] * anchor_grid[i] + grid[i] * stride[i] # landmark 4 xy (left mouth corner)
        y[..., 13:15] = y[..., 13:15] * anchor_grid[i] + grid[i] * stride[i] # landmark 5 xy (right mouth corner)


        z.append(y.reshape(bs, -1, 16))

    return z
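
For reference, calling it on the ONNX Runtime outputs looks roughly like this (a sketch; the model path and the [8, 16, 32] strides for yolov5m are assumptions):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("yolov5m-face.onnx")         # hypothetical path
strides = [8, 16, 32]                                    # yolov5m stride array
img = np.zeros((1, 3, 800, 800), dtype=np.float32)       # a letterboxed, /255-normalized image goes here
pred = sess.run(None, {sess.get_inputs()[0].name: img})  # one raw output per stride
decoded = process_anchors(pred, strides, img.shape[2:])
all_preds = np.concatenate(decoded, axis=1)              # (bs, total_anchors, 16), ready for NMS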

Also worth mentioning: I updated the export.py module and generated an ONNX model with a dynamic input, so it accepts images with dimensions other than 800x800, since letterbox padding won't always produce exactly that shape.
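
The relevant change boils down to passing dynamic_axes to torch.onnx.export, roughly like this (shown in context in the full export.py later in this thread):

dynamic_axes = {name: {0: 'batch', 2: 'height', 3: 'width'}
                for name in input_names + output_names}
torch.onnx.export(model, img, f, opset_version=12, input_names=input_names,
                  output_names=output_names, dynamic_axes=dynamic_axes)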

luisfmnunes avatar Dec 16 '21 15:12 luisfmnunes

@beautyrank This is now supported, see https://github.com/deepcam-cn/yolov5-face/pull/108#issue-1083085258

bobo0810 avatar Dec 17 '21 10:12 bobo0810

Hi @luisfmnunes, I tried your conversion method and it seems to hit a broadcasting issue. I'm guessing it's because I'm not setting ANCHORS properly. Could you share how you define those? Best, /M

mlourencoeb avatar Mar 10 '22 15:03 mlourencoeb

@mlourencoeb you can find my full ONNX detection Python script below:

import os
import cv2
import sys
import time
import argparse

import numpy as np
import logging as log
import onnxruntime as ort

sys.path.append(os.path.dirname(os.path.abspath(__file__)))

# absolute imports so the script also runs directly (sys.path already includes this directory)
from utils.datasets import letterbox
from utils.general import check_img_size, xywh2xyxy, xyxy2xywh


ANCHORS = [[4,5,  8,10,  13,16],  # P3/8
         [23,29,  43,55,  73,105],  # P4/16
         [146,217,  231,300,  335,433]]  # P5/32

def iou(boxes, scores, iou_thres):
    
    areas = (boxes[:,2] - boxes[:,0] + 1) * (boxes[:,3]-boxes[:,1] + 1) # (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i,0], boxes[order[1:],0])
        yy1 = np.maximum(boxes[i,1], boxes[order[1:],1])
        xx2 = np.minimum(boxes[i,2], boxes[order[1:],2])
        yy2 = np.minimum(boxes[i,3], boxes[order[1:],3])
        
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)

        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        inds = np.where(ovr <= iou_thres)[0]
        order = order[inds + 1]

    return keep


def sigmoid(x):
    return 1 / ( 1 + np.exp(-x) )

def make_grid(nx=20, ny=20):
    xv, yv = np.meshgrid(np.arange(nx),np.arange(ny))
    return np.stack((xv,yv),2).reshape(1,1,ny,nx,2).astype(np.float32)

def process_anchors(pred, stride, image_size=(800,800)):
    layers = 3
    # anchors = len(ANCHORS[0])//2
    grid = [np.zeros(1)]*layers
    a = np.array(ANCHORS).astype(np.float32).reshape(layers,-1,2) # shape (layers, anchors, 2)
    anchor_grid = a.copy().reshape(layers,1,-1,1,1,2) # shape (layers, 1, anchors, 1, 1, 2)
    z = []

    for i in range(layers):
        ny, nx = (dim // stride[i] for dim in image_size)
        bs = pred[i].shape[0] # batch size

        if grid[i].shape[2:4] != pred[i].shape[2:4]:
            grid[i] = make_grid(nx, ny)

        y = np.full_like(pred[i], 0)
        
        class_range = list(range(5)) + list(range(15,15+1))
        y[..., class_range] = sigmoid(pred[i][..., class_range])
        y[..., 5:15] = pred[i][..., 5:15]

        y[...,0:2] = (y[..., 0:2] * 2. - 0.5 + grid[i]) * stride[i] # xy (center)
        y[...,2:4] = (y[..., 2:4] * 2) **2 * anchor_grid[i] # wh

        y[..., 5:7] = y[..., 5:7] * anchor_grid[i] + grid[i] * stride[i] # All landmarks xy
        y[..., 7:9] = y[..., 7:9] * anchor_grid[i] + grid[i] * stride[i] # All landmarks xy
        y[..., 9:11] = y[..., 9:11] * anchor_grid[i] + grid[i] * stride[i] # All landmarks xy
        y[..., 11:13] = y[..., 11:13] * anchor_grid[i] + grid[i] * stride[i] # All landmarks xy
        y[..., 13:15] = y[..., 13:15] * anchor_grid[i] + grid[i] * stride[i] # All landmarks xy


        z.append(y.reshape(bs, -1, 16))

    return z

def nms_face(pred, conf_th = 0.3, iou_th = 0.5):

    outputs = [None] * len(pred)
    for xi,x in enumerate(pred):
        x = x[np.where(x[:,4] > conf_th)]
        # print(x)

        min_wh, max_wh = 2, 4096
        x[:, 15:] *= x[:, 4:5]

        box = xywh2xyxy(x[:,:4])

        conf = x[:, 15:].max(1, keepdims=True)
        j = x[:, 15:].argmax(1)
        # print(conf, j)
        x = np.concatenate((box, conf, x[:, 5:15], j.astype(np.float32).reshape(-1,1)),1)[np.where(conf.flatten() > conf_th)]

        if not x.shape[0]:
            continue

        c = x[:, 15:16] * max_wh
        boxes, scores = x[:, :4] + c, x[:, 4]
        i = iou(boxes, scores, iou_th)
        # print(i)

        outputs[xi] = x[i]

    return [output for output in outputs if output is not None]

def clip_coords(boxes, img_shape):
    # clip on a fancy-indexed slice returns a copy, so the result must be assigned back
    boxes[:, [0,2]] = boxes[:, [0,2]].clip(0, img_shape[1])
    boxes[:, [1,3]] = boxes[:, [1,3]].clip(0, img_shape[0])
    

def scale_coords(img1_shape, coords, img0_shape, ratio_pad=None):
    if ratio_pad is None:
        gain = min(img1_shape[0]/img0_shape[0], img1_shape[1]/img0_shape[1])
        pad = (img1_shape[1] - img0_shape[1] * gain)/2, (img1_shape[0] - img0_shape[0] * gain) / 2
    else:
        gain = ratio_pad[0][0]
        pad = ratio_pad[1]

    coords[:, [0, 2]] -= pad[0] # x padding
    coords[:, [1, 3]] -= pad[1] # y padding
    coords[:, :4] /= gain # anchor coordinate to pixel coordinate
    clip_coords(coords, img0_shape)
    return coords

def scale_coords_landmarks(img1_shape, coords, img0_shape, ratio_pad=None):
    if ratio_pad is None:
        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])
        pad = (img1_shape[1] - img0_shape[1] * gain)/2, (img1_shape[0] - img0_shape[0] * gain) / 2
    else:
        gain = ratio_pad[0][0]
        pad = ratio_pad[1]

    coords[:, [0, 2, 4, 6, 8]] -= pad[0]
    coords[:, [1, 3, 5, 7, 9]] -= pad[1]
    coords[:, :10] /=gain

    coords[:,[0, 2, 4, 6, 8]] = coords[:,[0, 2, 4, 6, 8]].clip(0, img0_shape[1]) # clip landmark x to image width
    coords[:,[1, 3, 5, 7, 9]] = coords[:,[1, 3, 5, 7, 9]].clip(0, img0_shape[0]) # clip landmark y to image height
    return coords

def post_process(det, orgimg, img):
    # print(det)
    gn = np.array(orgimg.shape)[[1,0,1,0]]
    gn_lks = np.array(orgimg.shape)[[1,0]*5]
    
    results = []
    for i, d in enumerate(det):
        results += [[]]
        if len(d):
            d[:, :4] = scale_coords(img.shape[2:], d[:, :4], orgimg.shape).round()
            d[:, 5:15] = scale_coords_landmarks(img.shape[2:], d[:, 5:15], orgimg.shape).round()

            for j in range(d.shape[0]):
                xywh = (xyxy2xywh(d[j, :4].reshape(1,4)) / gn).reshape(-1).tolist()
                conf = d[j, 4]
                landmarks = (d[j, 5:15]/gn_lks).reshape(-1).tolist()
                class_num = d[j, 15]
                orgimg = show_results(orgimg, xywh, conf, landmarks, class_num)
                results[i] += [[d[j, :4].reshape(-1).astype(int).tolist(), conf, d[j, 5:15].reshape(-1).astype(int).tolist()]]
                

    return orgimg, results

def normcenter2point(img_shape, xywh):
    h,w,c = img_shape
    x1 = int(xywh[0] * w - 0.5 * xywh[2] * w)
    y1 = int(xywh[1] * h - 0.5 * xywh[3] * h)
    x2 = int(xywh[0] * w + 0.5 * xywh[2] * w)
    y2 = int(xywh[1] * h + 0.5 * xywh[3] * h)

    return x1,y1,x2,y2

def show_results(img, xywh, conf, landmarks, class_num):
    h,w,c = img.shape
    tl = 1 or round(0.002 * (h + w) / 2) + 1  # line/font thickness
    x1, y1, x2, y2 = normcenter2point(img.shape, xywh)
    cv2.rectangle(img, (x1,y1), (x2, y2), (0,255,0), thickness=tl, lineType=cv2.LINE_AA)

    clors = [(255,0,0),(0,255,0),(0,0,255),(255,255,0),(0,255,255)]

    for i in range(5):
        point_x = int(landmarks[2 * i] * w)
        point_y = int(landmarks[2 * i + 1] * h)
        cv2.circle(img, (point_x, point_y), tl+1, clors[i], -1)

    tf = max(tl - 1, 1)  # font thickness
    label = str(conf)[:5]
    cv2.putText(img, label, (x1, y1 - 2), 0, tl / 3, [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA)
    return img

def detection(im_path, img_size = 800, conf_th = 0.3, iou_th = 0.5, model_path = '/home/luisnunes/Documents/yolov5m-face.onnx', threads = 1):

    if isinstance(img_size,tuple):
        img_size = img_size[0]

    start = time.time()

    sess_opt = ort.SessionOptions()
    sess_opt.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    sess_opt.inter_op_num_threads = threads
    sess_opt.intra_op_num_threads = threads

    model = ort.InferenceSession(model_path, sess_options=sess_opt)

    inputs = model.get_inputs()
    for i, name in enumerate([input.name for input in inputs]):
        input_name = name

    outputs = model.get_outputs()
    onames = [o.name for o in outputs]

    stride = []
    for i, name in enumerate(onames):
        stride += [int(name.split('_')[-1])]

    max_stride = max(*stride)

    origimg = cv2.imread(im_path)
    if origimg is None:
        return None, None, None, None # FIXME: image failed to load; handle this error properly

    img0 = origimg.copy()
    h0, w0 = img0.shape[:2]
    r = img_size / max(h0,w0)
    if r != 1:
        interpolation = cv2.INTER_AREA if r < 1 else cv2.INTER_LINEAR
        img0 = cv2.resize(img0, (int(w0 * r), int(h0 * r)), interpolation=interpolation)

    sz = check_img_size(img_size, s=max_stride)

    img = letterbox(img0, new_shape=sz)[0]
    img = img[:, :, ::-1].transpose(2,0,1).copy() #BGR to RGB and HWC to CHW

    img = img.astype(np.float32)
    img /= 255
    img = img[np.newaxis, ...]

    inf_start = time.time()
    pred = model.run(onames, {input_name: img})
    pred = process_anchors(pred, stride, img.shape[2:])
    inf_end = time.time() - inf_start

    # for i,p in enumerate(pred): # Transforms predictions into (bs, GridSize, 16)
    #     p = p.reshape(p.shape[0],-1,16)
    #     pred[i] = p

    concat = np.concatenate(pred, axis=1) # Concatenates all 3 stride layers results for NMS
    det = nms_face(concat, conf_th, iou_th)
    out_img, results  = post_process(det, origimg, img)

    end = time.time() - start

    return out_img, results, inf_end, end

def detect(im_file, img_size = 800, conf_th = 0.3, iou_th = 0.5, model_path = '/home/luisnunes/Documents/yolov5m-face.onnx', threads = 1, **kwargs):

    if isinstance(img_size,tuple):
        img_size = img_size[0]

    start = time.time()

    sess_opt = ort.SessionOptions()
    sess_opt.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    sess_opt.inter_op_num_threads = threads
    sess_opt.intra_op_num_threads = threads

    model = ort.InferenceSession(model_path, sess_options=sess_opt)

    inputs = model.get_inputs()
    for i, name in enumerate([input.name for input in inputs]):
        input_name = name

    outputs = model.get_outputs()
    onames = [o.name for o in outputs]

    stride = []
    for i, name in enumerate(onames):
        stride += [int(name.split('_')[-1])]

    max_stride = max(*stride)

    origimg = cv2.imread(im_file)
    if origimg is None:
        return None, None

    img0 = origimg.copy()
    h0, w0 = img0.shape[:2]
    r = img_size / max(h0,w0)
    if r != 1:
        interpolation = cv2.INTER_AREA if r < 1 else cv2.INTER_LINEAR
        img0 = cv2.resize(img0, (int(w0 * r), int(h0 * r)), interpolation=interpolation)

    sz = check_img_size(img_size, s=max_stride)

    img = letterbox(img0, new_shape=sz)[0]
    img = img[:, :, ::-1].transpose(2,0,1).copy() #BGR to RGB and HWC to CHW

    img = img.astype(np.float32)
    img /= 255
    img = img[np.newaxis, ...]

    inf_start = time.time()
    pred = model.run(onames, {input_name: img})
    pred = process_anchors(pred, stride, img.shape[2:])
    inf_end = time.time() - inf_start

    # for i,p in enumerate(pred): # Transforms predictions into (bs, GridSize, 16)
    #     p = p.reshape(p.shape[0],-1,16)
    #     pred[i] = p

    concat = np.concatenate(pred, axis=1) # Concatenates all 3 stride layers results for NMS
    det = nms_face(concat, conf_th, iou_th)
    out_img, results  = post_process(det, origimg, img)
    results = results[0]

    end = time.time() - start

    det = []
    landmarks = []
    for result in results:
        det += [result[0]+[result[1]]] # [x1 y1 x2 y2 confidence]
        landmarks += [result[2]]

    return det, landmarks

def main(args):
    stride = []
    log.basicConfig(format="[ %(levelname)s ] %(message)s", level= log.INFO, stream = sys.stdout) 
    start = time.time()

    log.info("Creating ONNX session from model {}".format(args.onnx_model))
    sess_opt = ort.SessionOptions()
    sess_opt.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    sess_opt.inter_op_num_threads = 1
    sess_opt.intra_op_num_threads = 1

    model = ort.InferenceSession(args.onnx_model, sess_options=sess_opt)

    inputs = model.get_inputs()
    log.info("Model Inputs:")
    for i, name in enumerate([input.name for input in inputs]):
        input_name = name
        print("\t[{}] - {}".format(i,name))

    outputs = model.get_outputs()
    onames = [o.name for o in outputs]
    log.info("Model Outputs:")
    for i, name in enumerate(onames):
        print("\t[{}] - {}".format(i, name))
        stride += [int(name.split('_')[-1])]

    max_stride = max(*stride)
    log.info("Maximum Stride = {}".format(max_stride))

    origimg = cv2.imread(args.image)
    if origimg is None:
        log.error("Cannot load an image from file {}".format(args.image))
        exit(-1)
    
    img0 = origimg.copy()
    h0, w0 = img0.shape[:2]
    r = args.img_size / max(h0,w0)
    if r != 1:
        interpolation = cv2.INTER_AREA if r < 1 else cv2.INTER_LINEAR
        img0 = cv2.resize(img0, (int(w0 * r), int(h0 * r)), interpolation=interpolation)

    sz = check_img_size(args.img_size, s=max_stride)

    img = letterbox(img0, new_shape=sz)[0]
    # cv2.imshow('padded img', img)
    # cv2.waitKey(10000)

    img = img[:, :, ::-1].transpose(2,0,1).copy() #BGR to RGB and HWC to CHW

    img = img.astype(np.float32)
    img /= 255
    img = img[np.newaxis, ...]

    pred = model.run(onames, {input_name: img})
    pred = process_anchors(pred, stride, img.shape[2:])

    for i,p in enumerate(pred):
        p = p.reshape(p.shape[0],-1,16)
        print('[{}]: {}'.format(i, p.shape))
        pred[i] = p
        
    concat = np.concatenate(pred, axis=1) 
    print("Concat Shape:", concat.shape)

    det = nms_face(concat, args.conf_th, args.iou_th)

    out_img = post_process(det, origimg, img)[0]
    log.info("Total elapsed time {:.4f}s".format(time.time()-start))
    cv2.imshow("result",out_img)
    cv2.waitKey(10000)


def parseArguments(argv):
    parser = argparse.ArgumentParser()

    parser.add_argument("onnx_model", type=str, help="Path to the onnx_model")
    parser.add_argument("--image", type=str, help="Path to load an image for post inference")
    parser.add_argument("--img_size", type=int, help="The size of image resizing", default=800)
    parser.add_argument("--conf_th", type=float, help = "Threshold for confidence filtering", default=0.3)
    parser.add_argument("--iou_th", type=float, help="Threshold for NMS intersection over union", default = 0.5)

    return parser.parse_args(argv)


if __name__ == "__main__":
    main(parseArguments(sys.argv[1:]))
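
For completeness, using the script above as a module looks roughly like this (image and model paths are placeholders):

# hypothetical usage; point model_path at your exported model
det, landmarks = detect('face.jpg', img_size=800, conf_th=0.3, iou_th=0.5,
                        model_path='yolov5m-face.onnx')
print(det)        # [[x1, y1, x2, y2, confidence], ...]
print(landmarks)  # [[x1, y1, ..., x5, y5], ...] in original-image pixel coordinates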

luisfmnunes avatar Mar 10 '22 16:03 luisfmnunes

Thanks for sharing @luisfmnunes. Still facing some issues: my model doesn't seem to have the "stride_*" names on its output layers. I just exported with export.py after changing the model name. Any idea?

mlourencoeb avatar Mar 10 '22 17:03 mlourencoeb

That's on me: my version is a bit behind the current commit of this repo, and I didn't organize it as a fork (I mixed it with another repo of mine and removed the git elements to avoid submodule dependencies). I also changed the input and output names to match the conventions of my other application's operational flow.
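
As a quick sanity check (a sketch; the model path is a placeholder), you can list the output names your exported model actually carries, since my post-processing parses the stride value out of each output name:

import onnx

m = onnx.load('yolov5m-face.onnx')       # hypothetical path to your exported model
print([o.name for o in m.graph.output])  # the export.py below names them ['stride_8', 'stride_16', 'stride_32']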

I made some tweaks to the export script myself, so basically this is my export.py:


"""Exports a YOLOv5 *.pt model to ONNX and TorchScript formats

Usage:
    $ export PYTHONPATH="$PWD" && python models/export.py --weights ./weights/yolov5s.pt --img 640 --batch 1
"""

import argparse
import sys
import time

sys.path.append('./')  # to run '$ python *.py' files in subdirectories

import torch
import torch.nn as nn

import models
from models.experimental import attempt_load
from utils.activations import Hardswish, SiLU
from utils.general import set_logging, check_img_size
from onnxsim import simplify
import onnx

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default='./yolov5s.pt', help='weights path')  # from yolov5/models/
    parser.add_argument('--img_size', nargs='+', type=int, default=[640, 640], help='image size')  # height, width
    parser.add_argument('--batch_size', type=int, default=1, help='batch size')
    parser.add_argument('--onnx2pb', action='store_true', default=False, help='export onnx to pb')
    opt = parser.parse_args()
    opt.img_size *= 2 if len(opt.img_size) == 1 else 1  # expand
    print(opt)
    set_logging()
    t = time.time()

    # Load PyTorch model
    model = attempt_load(opt.weights, map_location=torch.device('cpu'))  # load FP32 model
    model.eval()
    labels = model.names

    # Checks
    gs = int(max(model.stride))  # grid size (max stride)
    opt.img_size = [check_img_size(x, gs) for x in opt.img_size]  # verify img_size are gs-multiples

    # Input
    img = torch.zeros(opt.batch_size, 3, *opt.img_size)  # image size(1,3,320,192) iDetection

    # Update model
    for k, m in model.named_modules():
        m._non_persistent_buffers_set = set()  # pytorch 1.6.0 compatibility
        if isinstance(m, models.common.Conv):  # assign export-friendly activations
            if isinstance(m.act, nn.Hardswish):
                m.act = Hardswish()
            elif isinstance(m.act, nn.SiLU):
                m.act = SiLU()
        # elif isinstance(m, models.yolo.Detect):
        #     m.forward = m.forward_export  # assign forward (optional)
        if isinstance(m, models.common.ShuffleV2Block):#shufflenet block nn.SiLU
            for i in range(len(m.branch1)):
                if isinstance(m.branch1[i], nn.SiLU):
                    m.branch1[i] = SiLU()
            for i in range(len(m.branch2)):
                if isinstance(m.branch2[i], nn.SiLU):
                    m.branch2[i] = SiLU()
    model.model[-1].export = True  # set Detect() layer export=True
    model.model[-1].export_cat = False
    y = model(img)  # dry run

    # ONNX export
    print('\nStarting ONNX export with onnx %s...' % onnx.__version__)
    f = opt.weights.replace('.pt', '.onnx')  # filename
    model.fuse()  # only for ONNX
    input_names=['data']
    output_names=['stride_' + str(int(x)) for x in model.stride]
    # output_names = ['outputs_{}'.format(int(model.stride.max()))]

    dynamic_axes = {out: {0: '?', 2: '?', 3: '?'} for out in output_names}
    dynamic_axes[input_names[0]] = {0: '?', 2: '?', 3: '?'}
    torch.onnx.export(model, img, f, verbose=False, opset_version=12, input_names=input_names,
                      output_names=output_names, dynamic_axes=dynamic_axes)

    #ONNX Simplifier
    # Checks
    onnx_model = onnx.load(f)  # load onnx model
    onnx.checker.check_model(onnx_model)  # check onnx model
    # print(onnx.helper.printable_graph(onnx_model.graph))  # print a human readable model
    print('ONNX export success, saved as %s' % f)
    # Finish
    print('\nExport complete (%.2fs). Visualize with https://github.com/lutzroeder/netron.' % (time.time() - t))
    
    print('Simplifying ONNX model')
    model_simp, check = simplify(onnx_model, dynamic_input_shape=True, input_shapes={"data":(1,3,800,800)})
    assert check, "Simplified ONNX model could not be validated"
    onnx.save(model_simp, f)
    print("\nExport simplified model complete.")

    # PB export
    if opt.onnx2pb:
        print('download the newest onnx_tf by https://github.com/onnx/onnx-tensorflow/tree/master/onnx_tf')
        from onnx_tf.backend import prepare
        import tensorflow as tf

        outpb = f.replace('.onnx', '.pb')  # filename
        # strict=True maybe leads to KeyError: 'pyfunc_0', check: https://github.com/onnx/onnx-tensorflow/issues/167
        tf_rep = prepare(onnx_model, strict=False)  # prepare tf representation
        tf_rep.export_graph(outpb)  # export the model

        out_onnx = tf_rep.run(img) # onnx output

        # check pb
        with tf.Graph().as_default():
            graph_def = tf.GraphDef()
            with open(outpb, "rb") as f:
                graph_def.ParseFromString(f.read())
                tf.import_graph_def(graph_def, name="")
            with tf.Session() as sess:
                init = tf.global_variables_initializer()
                input_x = sess.graph.get_tensor_by_name(input_names[0]+':0')  # input
                outputs = []
                for i in output_names:
                    outputs.append(sess.graph.get_tensor_by_name(i+':0'))
                out_pb = sess.run(outputs, feed_dict={input_x: img})

        print(f'out_pytorch {y}')
        print(f'out_onnx {out_onnx}')
        print(f'out_pb {out_pb}')


luisfmnunes avatar Mar 10 '22 17:03 luisfmnunes

Hi @luisfmnunes

I converted to ONNX. I used the yolov5face ONNX models from https://github.com/DefTruth/lite.ai.toolkit/blob/main/docs/hub/lite.ai.toolkit.hub.onnx.md, but they always return a face score of 0.98...

How can I use the ONNX model converted by your Python code above?

The input and output look different: data and stride_32? I am trying to run inference in C++.

The crucial part is the landmark detection: it feeds the recognition model, so better landmarks mean better results.

By the way, do you know a quick way to judge face quality? I tried https://github.com/deepcam-cn/FaceQuality and it looks very promising, but I can't convert it to ONNX, especially the backbone model output.

Since I mainly work in C++, I need ONNX.

Best

MyraBaba avatar Nov 27 '23 16:11 MyraBaba

It's been a while since I used YOLOv5 for this purpose, but the script above simply takes command-line args to load a pretrained PyTorch model (hence the weights option), builds a YOLOv5 nn.Module, loads the state dict onto it, and then exports an ONNX model from it. I don't remember whether the model has built-in preprocessing and post-processing, but I believe it's unlikely, so there are three options: implement the equivalent preprocessing and post-processing in C++; implement the pre- and post-processing as nn.Modules using PyTorch ops and export them as part of the ONNX model; or use some other ONNX compiler to build your own graph containing the pre- and post-processing operations.
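
As an illustration of the second option, here is a minimal sketch (my own, not the repo's code) that re-implements the process_anchors decoding from earlier in this thread with torch ops, so it can be traced into the ONNX graph:

import torch
import torch.nn as nn

class DecodeHead(nn.Module):
    """Torch re-implementation of the NumPy process_anchors, so decoding ships inside the ONNX model."""
    def __init__(self, anchors, strides):
        super().__init__()
        a = torch.tensor(anchors, dtype=torch.float32).view(len(strides), 1, -1, 1, 1, 2)
        self.register_buffer('anchor_grid', a)
        self.strides = strides

    def forward(self, preds):
        z = []
        for i, p in enumerate(preds):  # p: (bs, anchors, ny, nx, 16)
            bs, na, ny, nx, no = p.shape
            yv, xv = torch.meshgrid(torch.arange(ny), torch.arange(nx), indexing='ij')
            grid = torch.stack((xv, yv), 2).view(1, 1, ny, nx, 2).to(p.dtype)
            y = p.clone()
            y[..., [0, 1, 2, 3, 4, 15]] = y[..., [0, 1, 2, 3, 4, 15]].sigmoid()  # box, objectness, class
            y[..., 0:2] = (y[..., 0:2] * 2.0 - 0.5 + grid) * self.strides[i]     # box center xy
            y[..., 2:4] = (y[..., 2:4] * 2.0) ** 2 * self.anchor_grid[i]         # box wh
            for k in range(5):                                                   # five landmark (x, y) pairs
                s = 5 + 2 * k
                y[..., s:s+2] = y[..., s:s+2] * self.anchor_grid[i] + grid * self.strides[i]
            z.append(y.view(bs, -1, no))
        return torch.cat(z, 1)  # (bs, total_anchors, 16), same layout as the NumPy version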

luisfmnunes avatar Nov 29 '23 13:11 luisfmnunes