mmpose [Bug] RTMPose cannot work properly in ncnn model format

[Bug] RTMPose cannot work properly in ncnn model format

Open RFYoung opened this issue 2 years ago • 0 comments

Prerequisite

[x] I have searched Issues and Discussions but cannot get the expected help.
[X] The bug has not been fixed in the latest version(https://github.com/open-mmlab/mmpose).

Environment

Actually MMPose is not necessary here.

Python 3.11.5 (main, Sep  2 2023, 14:16:33) [GCC 13.2.1 20230801] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import onnx, ncnn, numpy, cv2, loguru, onnxoptimizer, onnxsim
>>> print(onnx.__version__, ncnn.__version__, numpy.__version__, cv2.__version__, loguru.__version__, onnxoptimizer.__version__, onnxsim.__version__)
1.14.1 1.0.20231027 1.26.1 4.8.1 0.7.2 0.3.13 0.4.35

Also, I am using this version of ncnn to convert onnx to ncnn (for model conversion only): https://github.com/Tencent/ncnn/commit/31e315981a6a81a5065dbc4f8996b3963bbf1dd8 To build the ncnn tools:

cd ncnn
mkdir build && cd build
cmake -DNCNN_PYTHON=ON -DCMAKE_BUILD_TYPE=Debug -DNCNN_VULKAN=OFF -DNCNN_BUILD_EXAMPLES=ON  ..
make -j8

Reproduces the problem - code sample

I have extracted somr functions from this onnx example:

# onnx_utils.py
from typing import List, Tuple

import cv2
import numpy as np


def preprocess(
    img: np.ndarray, input_size: Tuple[int, int] = (192, 256)
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Do preprocessing for RTMPose model inference.

    Args:
        img (np.ndarray): Input image in shape.
        input_size (tuple): Input image size in shape (w, h).

    Returns:
        tuple:
        - resized_img (np.ndarray): Preprocessed image.
        - center (np.ndarray): Center of image.
        - scale (np.ndarray): Scale of image.
    """
    # get shape of image
    img_shape = img.shape[:2]
    bbox = np.array([0, 0, img_shape[1], img_shape[0]])

    # get center and scale
    center, scale = bbox_xyxy2cs(bbox, padding=1.25)

    # do affine transformation
    resized_img, scale = top_down_affine(input_size, scale, center, img)

    # normalize image
    mean = np.array([123.675, 116.28, 103.53])
    std = np.array([58.395, 57.12, 57.375])
    resized_img = (resized_img - mean) / std

    return resized_img, center, scale


def postprocess(outputs: List[np.ndarray],
                model_input_size: Tuple[int, int],
                center: Tuple[int, int],
                scale: Tuple[int, int],
                simcc_split_ratio: float = 2.0
                ) -> Tuple[np.ndarray, np.ndarray]:
    """Postprocess for RTMPose model output.

    Args:
        outputs (np.ndarray): Output of RTMPose model.
        model_input_size (tuple): RTMPose model Input image size.
        center (tuple): Center of bbox in shape (x, y).
        scale (tuple): Scale of bbox in shape (w, h).
        simcc_split_ratio (float): Split ratio of simcc.

    Returns:
        tuple:
        - keypoints (np.ndarray): Rescaled keypoints.
        - scores (np.ndarray): Model predict scores.
    """
    # use simcc to decode
    simcc_x, simcc_y = outputs
    keypoints, scores = decode(simcc_x, simcc_y, simcc_split_ratio)

    # rescale keypoints
    keypoints = keypoints / model_input_size * scale + center - scale / 2

    return keypoints, scores


def visualize(img: np.ndarray,
              keypoints: np.ndarray,
              scores: np.ndarray,
              filename: str = 'output.jpg',
              thr=0.3) -> np.ndarray:
    """Visualize the keypoints and skeleton on image.

    Args:
        img (np.ndarray): Input image in shape.
        keypoints (np.ndarray): Keypoints in image.
        scores (np.ndarray): Model predict scores.
        thr (float): Threshold for visualize.

    Returns:
        img (np.ndarray): Visualized image.
    """
    # default color
    skeleton = [(15, 13), (13, 11), (16, 14), (14, 12), (11, 12), (5, 11),
                (6, 12), (5, 6), (5, 7), (6, 8), (7, 9), (8, 10), (1, 2),
                (0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6), (15, 17),
                (15, 18), (15, 19), (16, 20), (16, 21), (16, 22), (91, 92),
                (92, 93), (93, 94), (94, 95), (91, 96), (96, 97), (97, 98),
                (98, 99), (91, 100), (100, 101), (101, 102), (102, 103),
                (91, 104), (104, 105), (105, 106), (106, 107), (91, 108),
                (108, 109), (109, 110), (110, 111), (112, 113), (113, 114),
                (114, 115), (115, 116), (112, 117), (117, 118), (118, 119),
                (119, 120), (112, 121), (121, 122), (122, 123), (123, 124),
                (112, 125), (125, 126), (126, 127), (127, 128), (112, 129),
                (129, 130), (130, 131), (131, 132)]
    palette = [[51, 153, 255], [0, 255, 0], [255, 128, 0], [255, 255, 255],
               [255, 153, 255], [102, 178, 255], [255, 51, 51]]
    link_color = [
        1, 1, 2, 2, 0, 0, 0, 0, 1, 2, 1, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2,
        2, 2, 2, 2, 2, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 1, 1, 1, 1, 2, 2, 2,
        2, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 1, 1, 1, 1
    ]
    point_color = [
        0, 0, 0, 0, 0, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 3,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2,
        4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 1, 1, 1, 1, 3, 2, 2, 2, 2, 4, 4, 4,
        4, 5, 5, 5, 5, 6, 6, 6, 6, 1, 1, 1, 1
    ]

    # draw keypoints and skeleton
    for kpts, score in zip(keypoints, scores):
        keypoints_num = len(score)
        for kpt, color in zip(kpts, point_color):
            cv2.circle(img, tuple(kpt.astype(np.int32)), 1, palette[color], 1,
                       cv2.LINE_AA)
        for (u, v), color in zip(skeleton, link_color):
            if u < keypoints_num and v < keypoints_num \
                        and score[u] > thr and score[v] > thr:
                cv2.line(img, tuple(kpts[u].astype(np.int32)),
                         tuple(kpts[v].astype(np.int32)), palette[color], 2,
                         cv2.LINE_AA)

    # save to local
    cv2.imwrite(filename, img)

    return img


def bbox_xyxy2cs(bbox: np.ndarray,
                 padding: float = 1.) -> Tuple[np.ndarray, np.ndarray]:
    """Transform the bbox format from (x,y,w,h) into (center, scale)

    Args:
        bbox (ndarray): Bounding box(es) in shape (4,) or (n, 4), formatted
            as (left, top, right, bottom)
        padding (float): BBox padding factor that will be multilied to scale.
            Default: 1.0

    Returns:
        tuple: A tuple containing center and scale.
        - np.ndarray[float32]: Center (x, y) of the bbox in shape (2,) or
            (n, 2)
        - np.ndarray[float32]: Scale (w, h) of the bbox in shape (2,) or
            (n, 2)
    """
    # convert single bbox from (4, ) to (1, 4)
    dim = bbox.ndim
    if dim == 1:
        bbox = bbox[None, :]

    # get bbox center and scale
    x1, y1, x2, y2 = np.hsplit(bbox, [1, 2, 3])
    center = np.hstack([x1 + x2, y1 + y2]) * 0.5
    scale = np.hstack([x2 - x1, y2 - y1]) * padding

    if dim == 1:
        center = center[0]
        scale = scale[0]

    return center, scale


def _fix_aspect_ratio(bbox_scale: np.ndarray,
                      aspect_ratio: float) -> np.ndarray:
    """Extend the scale to match the given aspect ratio.

    Args:
        scale (np.ndarray): The image scale (w, h) in shape (2, )
        aspect_ratio (float): The ratio of ``w/h``

    Returns:
        np.ndarray: The reshaped image scale in (2, )
    """
    w, h = np.hsplit(bbox_scale, [1])
    bbox_scale = np.where(w > h * aspect_ratio,
                          np.hstack([w, w / aspect_ratio]),
                          np.hstack([h * aspect_ratio, h]))
    return bbox_scale


def _rotate_point(pt: np.ndarray, angle_rad: float) -> np.ndarray:
    """Rotate a point by an angle.

    Args:
        pt (np.ndarray): 2D point coordinates (x, y) in shape (2, )
        angle_rad (float): rotation angle in radian

    Returns:
        np.ndarray: Rotated point in shape (2, )
    """
    sn, cs = np.sin(angle_rad), np.cos(angle_rad)
    rot_mat = np.array([[cs, -sn], [sn, cs]])
    return rot_mat @ pt


def _get_3rd_point(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """To calculate the affine matrix, three pairs of points are required. This
    function is used to get the 3rd point, given 2D points a & b.

    The 3rd point is defined by rotating vector `a - b` by 90 degrees
    anticlockwise, using b as the rotation center.

    Args:
        a (np.ndarray): The 1st point (x,y) in shape (2, )
        b (np.ndarray): The 2nd point (x,y) in shape (2, )

    Returns:
        np.ndarray: The 3rd point.
    """
    direction = a - b
    c = b + np.r_[-direction[1], direction[0]]
    return c


def get_warp_matrix(center: np.ndarray,
                    scale: np.ndarray,
                    rot: float,
                    output_size: Tuple[int, int],
                    shift: Tuple[float, float] = (0., 0.),
                    inv: bool = False) -> np.ndarray:
    """Calculate the affine transformation matrix that can warp the bbox area
    in the input image to the output size.

    Args:
        center (np.ndarray[2, ]): Center of the bounding box (x, y).
        scale (np.ndarray[2, ]): Scale of the bounding box
            wrt [width, height].
        rot (float): Rotation angle (degree).
        output_size (np.ndarray[2, ] | list(2,)): Size of the
            destination heatmaps.
        shift (0-100%): Shift translation ratio wrt the width/height.
            Default (0., 0.).
        inv (bool): Option to inverse the affine transform direction.
            (inv=False: src->dst or inv=True: dst->src)

    Returns:
        np.ndarray: A 2x3 transformation matrix
    """
    shift = np.array(shift)
    src_w = scale[0]
    dst_w = output_size[0]
    dst_h = output_size[1]

    # compute transformation matrix
    rot_rad = np.deg2rad(rot)
    src_dir = _rotate_point(np.array([0., src_w * -0.5]), rot_rad)
    dst_dir = np.array([0., dst_w * -0.5])

    # get four corners of the src rectangle in the original image
    src = np.zeros((3, 2), dtype=np.float32)
    src[0, :] = center + scale * shift
    src[1, :] = center + src_dir + scale * shift
    src[2, :] = _get_3rd_point(src[0, :], src[1, :])

    # get four corners of the dst rectangle in the input image
    dst = np.zeros((3, 2), dtype=np.float32)
    dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
    dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir
    dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :])

    if inv:
        warp_mat = cv2.getAffineTransform(np.float32(dst), np.float32(src))
    else:
        warp_mat = cv2.getAffineTransform(np.float32(src), np.float32(dst))

    return warp_mat


def top_down_affine(input_size: dict, bbox_scale: dict, bbox_center: dict,
                    img: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Get the bbox image as the model input by affine transform.

    Args:
        input_size (dict): The input size of the model.
        bbox_scale (dict): The bbox scale of the img.
        bbox_center (dict): The bbox center of the img.
        img (np.ndarray): The original image.

    Returns:
        tuple: A tuple containing center and scale.
        - np.ndarray[float32]: img after affine transform.
        - np.ndarray[float32]: bbox scale after affine transform.
    """
    w, h = input_size
    warp_size = (int(w), int(h))

    # reshape bbox to fixed aspect ratio
    bbox_scale = _fix_aspect_ratio(bbox_scale, aspect_ratio=w / h)

    # get the affine matrix
    center = bbox_center
    scale = bbox_scale
    rot = 0
    warp_mat = get_warp_matrix(center, scale, rot, output_size=(w, h))

    # do affine transform
    img = cv2.warpAffine(img, warp_mat, warp_size, flags=cv2.INTER_LINEAR)

    return img, bbox_scale


def get_simcc_maximum(simcc_x: np.ndarray,
                      simcc_y: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Get maximum response location and value from simcc representations.

    Note:
        instance number: N
        num_keypoints: K
        heatmap height: H
        heatmap width: W

    Args:
        simcc_x (np.ndarray): x-axis SimCC in shape (K, Wx) or (N, K, Wx)
        simcc_y (np.ndarray): y-axis SimCC in shape (K, Wy) or (N, K, Wy)

    Returns:
        tuple:
        - locs (np.ndarray): locations of maximum heatmap responses in shape
            (K, 2) or (N, K, 2)
        - vals (np.ndarray): values of maximum heatmap responses in shape
            (K,) or (N, K)
    """
    N, K, Wx = simcc_x.shape
    simcc_x = simcc_x.reshape(N * K, -1)
    simcc_y = simcc_y.reshape(N * K, -1)

    # get maximum value locations
    x_locs = np.argmax(simcc_x, axis=1)
    y_locs = np.argmax(simcc_y, axis=1)
    locs = np.stack((x_locs, y_locs), axis=-1).astype(np.float32)
    max_val_x = np.amax(simcc_x, axis=1)
    max_val_y = np.amax(simcc_y, axis=1)

    # get maximum value across x and y axis
    mask = max_val_x > max_val_y
    max_val_x[mask] = max_val_y[mask]
    vals = max_val_x
    locs[vals <= 0.] = -1

    # reshape
    locs = locs.reshape(N, K, 2)
    vals = vals.reshape(N, K)

    return locs, vals


def decode(simcc_x: np.ndarray, simcc_y: np.ndarray,
           simcc_split_ratio) -> Tuple[np.ndarray, np.ndarray]:
    """Modulate simcc distribution with Gaussian.

    Args:
        simcc_x (np.ndarray[K, Wx]): model predicted simcc in x.
        simcc_y (np.ndarray[K, Wy]): model predicted simcc in y.
        simcc_split_ratio (int): The split ratio of simcc.

    Returns:
        tuple: A tuple containing center and scale.
        - np.ndarray[float32]: keypoints in shape (K, 2) or (n, K, 2)
        - np.ndarray[float32]: scores in shape (K,) or (n, K)
    """
    keypoints, scores = get_simcc_maximum(simcc_x, simcc_y)
    keypoints /= simcc_split_ratio

    return keypoints, scores

Here is the main function:

# ncnn_main.py
import argparse
import time
from typing import List

import cv2
import loguru
import numpy as np
import ncnn
from onnx_utils import preprocess, postprocess, visualize

logger = loguru.logger

def parse_args():
    parser = argparse.ArgumentParser(
        description='RTMPose ONNX inference demo.')
    parser.add_argument('ncnn_file', help='ONNX file path')
    parser.add_argument('image_file', help='Input image file path')
    parser.add_argument(
        '--device', help='device type for inference', default='cpu')
    parser.add_argument(
        '--save-path',
        help='path to save the output image',
        default='output.jpg')
    args = parser.parse_args()
    return args


def inference(ex: ncnn.Extractor , img: np.ndarray) -> List:
    img = np.expand_dims(img.transpose(2, 0, 1), 0).astype(np.float32)
    mat_in = ncnn.Mat(img)
    ex.input("input", mat_in)

    x_ret, simcc_x = ex.extract("simcc_x")
    assert x_ret == 0
    simcc_x = np.expand_dims(np.array(simcc_x), 0)

    y_ret, simcc_y = ex.extract("simcc_y")
    assert y_ret == 0
    simcc_y = np.expand_dims(np.array(simcc_y), 0)

    return [simcc_x, simcc_y]


def main():
    args = parse_args()
    logger.info('Start running model on RTMPose...')

    # read image from file
    logger.info('1. Read image from {}...'.format(args.image_file))
    img = cv2.imread(args.image_file)

    # build onnx model
    logger.info('2. Build ncnn model from {}...'.format(args.ncnn_file))
    
    net = ncnn.Net()

    net.opt.use_fp16_packed = False
    net.opt.use_fp16_storage = False
    net.opt.use_fp16_arithmetic = False
    net.clear()
    net.load_param(args.ncnn_file + ".param")
    net.load_model(args.ncnn_file + ".bin")

    h, w = (256, 192)
    model_input_size = (w, h)

    # preprocessing
    logger.info('3. Preprocess image...')
    resized_img, center, scale = preprocess(img, model_input_size)

    # inference
    logger.info('4. Inference...')
    start_time = time.time()
    with net.create_extractor() as ex:
        outputs = inference(ex, resized_img)
    end_time = time.time()
    logger.info('4. Inference done, time cost: {:.4f}s'.format(end_time -
                                                               start_time))

    # postprocessing
    logger.info('5. Postprocess...')
    keypoints, scores = postprocess(outputs, model_input_size, center, scale)

    # visualize inference result
    logger.info('6. Visualize inference result...')
    visualize(img, keypoints, scores, args.save_path)

    logger.info('Done...')


if __name__ == '__main__':
    main()

Here is also an onnx optimize script using onnxoptimizer and onnx-simplifier:

# onnx_optimize_script.py
import onnx
from onnxoptimizer import optimize
from onnxsim import simplify

def convert_model(input_model_path, output_model_path):
    # Load the ONNX model
    model = onnx.load(input_model_path)
    
    # Make the model static by setting the batch size to 1
    model.graph.input[0].type.tensor_type.shape.dim[0].dim_value = 1
    
    # Optimize the model
    optimized_model = optimize(model)
    
    # Simplify the model
    simplified_model, check = simplify(model)
    
    # Check if the simplified model is valid
    if not check:
        print("The simplified model is invalid!")
        return
    
    # Save the modified model
    onnx.save(simplified_model, output_model_path)
    print(f"Model saved to: {output_model_path}")

# Usage
input_model_path = "rtmpose-t.onnx"
output_model_path = "rtmpose-t-sim.onnx"
convert_model(input_model_path, output_model_path)

Reproduces the problem - command or script

Use existing ncnn model from deploee

The ncnn model (rtmpose-t_8xb256-420e_aic-coco-256x192) is from openmmlab deploee:

wget https://mmdeploy-oss.openmmlab.com/model/mmpose/rtmpose-t-ncnn-155ab7.zip
mkdir ncnn_model 
unzip rtmpose-t-ncnn-155ab7.zip -d ncnn_model

Then use ncnn model to inference:

wget https://raw.githubusercontent.com/open-mmlab/mmpose/dev-1.x/projects/rtmpose/examples/onnxruntime/human-pose.jpeg
python3 ncnn_main.py ncnn_model/end2end human-pose.jpeg

Convert ncnn model from onnx

To convert the model from onnx to ncnn, please run:

wget https://download.openmmlab.com/mmpose/v1/projects/rtmposev1/onnx_sdk/rtmpose-t_simcc-body7_pt-body7_420e-256x192-026a1439_20230504.zip -o rtmpose-t-onnx.zip
unzip rtmpose-t-onnx.zip
cp ./20230831/rtmpose_onnx/rtmpose-t_simcc-body7_pt-body7_420e-256x192-026a1439_20230504/end2end.onnx ./rtmpose-t.onnx
./ncnn/build/tools/onnx/onnx2ncnn rtmpose-t.onnx rtmpose-t.param rtmpose-t.bin

And inference with this model:

python3 ncnn_main.py rtmpose-t human-pose.jpeg

Optionally, onnx model could be optimized before converted to ncnn;

python3 onnx_optimize_script.py
./ncnn/build/tools/onnx/onnx2ncnn rtmpose-t-sim.onnx rtmpose-t-sim.param rtmpose-t-sim.bin

Reproduces the problem - error message

This is the result with ncnn from deploee: output The expected result should be something like this: output

Also, I cannot get result from manually converted model since the python program will crash with segment fault error (no traceback), no matter with or without the optimization above.

Additional information

No response

Nov 04 '23 13:11 RFYoung

mmpose mmpose copied to clipboard

[Bug] RTMPose cannot work properly in ncnn model format

Prerequisite

Environment

Reproduces the problem - code sample

Reproduces the problem - command or script

Use existing ncnn model from deploee

Convert ncnn model from onnx

Reproduces the problem - error message

Additional information

mmpose
mmpose copied to clipboard