
[Bug]: Benchmark cannot measure real performance because benchmark input is not randomized correctly

RiverLight4 opened this issue 5 months ago · 1 comment

OpenVINO Version

2024.3.0

Operating System

Windows System

Device used for inference

GPU

Framework

None

Model used

Mask R-CNN

Issue description

Background

I'd like to run inference with Mask R-CNN for instance segmentation on an Intel iGPU. I modified sync_benchmark.py to run inference on the iGPU, checked the reported latency, and then tested with my actual code.

While the benchmark profiler reports a model inference latency of about 1,000 ms, inference actually takes about 30,000 ms on the real images I want to process. When I run inference twice in the same script, the first inference takes about 30,000 ms, but the second takes only about 1,000 ms. I suspect that some result or intermediate data is cached by OpenVINO (or by OpenCL underneath OpenVINO) and is not actually recomputed.

This issue does not occur with CPU execution.

Main Issue

I checked sync_benchmark.py and throughput_benchmark.py and found that the model input is always filled with the same values: fill_tensor_random(tensor) produces the same output on every call because it always creates its random generator with the same seed (0).

def fill_tensor_random(tensor):
    dtype = get_dtype(tensor.element_type)
    rand_min, rand_max = (0, 1) if dtype == bool else (np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)
    # np.random.uniform excludes high: add 1 to have it generated
    if np.dtype(dtype).kind in ['i', 'u', 'b']:
        rand_max += 1
    rs = np.random.RandomState(np.random.MT19937(np.random.SeedSequence(0))) ### HERE: Initialized with static value
    if 0 == tensor.get_size():
        raise RuntimeError("Models with dynamic shapes aren't supported. Input tensors must have specific shapes before inference")
    tensor.data[:] = rs.uniform(rand_min, rand_max, list(tensor.shape)).astype(dtype)
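
For comparison, here is a minimal sketch of the same function with the seed made configurable. The optional seed parameter and the entropy-based default are my own additions for illustration; get_dtype is the helper already defined in the sample, and numpy is imported as np there.

def fill_tensor_random(tensor, seed=None):
    dtype = get_dtype(tensor.element_type)
    rand_min, rand_max = (0, 1) if dtype == bool else (np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)
    # np.random.uniform excludes high: add 1 to have it generated
    if np.dtype(dtype).kind in ['i', 'u', 'b']:
        rand_max += 1
    # SeedSequence(None) draws fresh entropy from the OS, so each call produces
    # different data; pass an explicit seed when a reproducible run is needed.
    rs = np.random.RandomState(np.random.MT19937(np.random.SeedSequence(seed)))
    if 0 == tensor.get_size():
        raise RuntimeError("Models with dynamic shapes aren't supported. Input tensors must have specific shapes before inference")
    tensor.data[:] = rs.uniform(rand_min, rand_max, list(tensor.shape)).astype(dtype)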

It seems this affects not only the Python samples but also the C++ benchmark apps. Here are the places that may have the same problem (not tested):

• samples/python/benchmark/sync_benchmark/sync_benchmark.py
• samples/python/benchmark/throughput_benchmark/throughput_benchmark.py
  - def fill_tensor_random(tensor)
    - rs = np.random.RandomState(np.random.MT19937(np.random.SeedSequence(0)))
• samples/cpp/common/utils/include/samples/common.hpp (called from samples/cpp/benchmark/sync_benchmark/main.cpp and samples/cpp/benchmark/throughput_benchmark/main.cpp)
  - static inline void fill_random(ov::Tensor& tensor, T rand_min = std::numeric_limits<uint8_t>::min(), T rand_max = std::numeric_limits<uint8_t>::max())
    - std::mt19937 gen(0);
• samples/cpp/benchmark_app/inputs_filling.cpp
  - ov::Tensor create_tensor_random(const benchmark_app::InputInfo& inputInfo, T rand_min = std::numeric_limits<uint8_t>::min(), T rand_max = std::numeric_limits<uint8_t>::max())
    - std::mt19937 gen(0)

Is this the expected behavior of the OpenVINO benchmark samples?

Step-by-step reproduction

  1. Make a Mask R-CNN ONNX model from Torchvision
  • See also https://pytorch.org/vision/stable/models/mask_rcnn.html
  • Download maskrcnn_resnet50_fpn_v2_coco-73cbd019.pth from here (to avoid corporate proxy problems)
    • from https://github.com/pytorch/vision/blob/main/torchvision/models/detection/mask_rcnn.py
  • Run this code (almost the same as maskrcnn_resnet50_fpn_v2())
import torch
import torchvision
from torchinfo import summary

# library at torchvision 0.14.1 : torchvision.models.detection.mask_rcnn

from collections import OrderedDict
from typing import Any, Callable, Optional

from torch import nn
from torchvision.ops import MultiScaleRoIAlign
from torchvision.ops import misc as misc_nn_ops
from torchvision.transforms._presets import ObjectDetection
from torchvision.models._api import register_model, Weights, WeightsEnum
from torchvision.models._meta import _COCO_CATEGORIES
from torchvision.models._utils import _ovewrite_value_param, handle_legacy_interface
from torchvision.models.resnet import resnet50, ResNet50_Weights
from torchvision.models.detection._utils import overwrite_eps
from torchvision.models.detection.backbone_utils import _resnet_fpn_extractor, _validate_trainable_layers
from torchvision.models.detection.faster_rcnn import _default_anchorgen, FasterRCNN, FastRCNNConvFCHead, RPNHead
from torchvision.models.detection.mask_rcnn import MaskRCNN, MaskRCNN_ResNet50_FPN_V2_Weights, MaskRCNNHeads

weights = None
weights_backbone = None
num_classes = 91
trainable_backbone_layers = _validate_trainable_layers(False, None, 5, 3)

backbone = resnet50(weights=weights_backbone, progress=True)
backbone = _resnet_fpn_extractor(backbone, trainable_backbone_layers, norm_layer=nn.BatchNorm2d)
rpn_anchor_generator = _default_anchorgen()
rpn_head = RPNHead(backbone.out_channels, rpn_anchor_generator.num_anchors_per_location()[0], conv_depth=2)
box_head = FastRCNNConvFCHead(
    (backbone.out_channels, 7, 7), [256, 256, 256, 256], [1024], norm_layer=nn.BatchNorm2d
)
mask_head = MaskRCNNHeads(backbone.out_channels, [256, 256, 256, 256], 1, norm_layer=nn.BatchNorm2d)

kwargs = {} # mapping

model = MaskRCNN(backbone, num_classes=num_classes, rpn_anchor_generator=rpn_anchor_generator, rpn_head=rpn_head, box_head=box_head, mask_head=mask_head, **kwargs,)
model.load_state_dict(torch.load('maskrcnn_resnet50_fpn_v2_coco-73cbd019.pth'))
model.eval()

x = torch.rand(1, 3, 800, 1344)

predictions = model(x)
torch.onnx.export(model, x, "end2end.onnx", opset_version = 11, input_names=['input'], output_names = ['boxes', 'labels', 'scores', 'masks'], dynamic_axes = {'input': {2: 'height', 3: 'width'}, 'boxes': { 1:'num'}, 'labels':{ 1: 'num'}, 'scores': {1: 'num'},'masks' : {1:'num'}})

  2. Convert the ONNX model to OpenVINO IR
  • I did this in an OpenVINO 2023.3 environment, so I used the legacy Model Optimizer (mo) rather than the newer conversion API
mo --input_model=".\end2end.onnx" --output_dir="./" --output="boxes,labels,scores,masks" --input="input" --input_shape="[1, 3, 800, 1344]"
  3. Copy sync_benchmark.py to sync_benchmark_GPU.py and modify it like this (a fully assembled sketch of these changes is shown after this list):
def fill_tensor_random(tensor):
    seed = 42 ## ADD
#    seed = int(datetime.datetime.timestamp(datetime.datetime.now())) ## ADD
    print(f"Seed: {seed}") ## ADD
    
    rs = np.random.RandomState(np.random.MT19937(np.random.SeedSequence(seed))) ## CHANGE

    tensor.data[:] = rs.uniform(rand_min, rand_max, list(tensor.shape)).astype(dtype)
    print(tensor.data[0,0,0], dtype) ## ADD


def main():
#    device_name = 'CPU'
    device_name = 'GPU' ## CHANGE

    while time_point < time_point_to_finish or len(latencies) < niter:
        for model_input in compiled_model.inputs: ## ADD
            fill_tensor_random(ireq.get_tensor(model_input)) ## ADD
        out = ireq.infer()
        iter_end = perf_counter()
        latencies.append((iter_end - time_point) * 1e3)
        print(f'{len(latencies): 2d} : {(iter_end - time_point)*1000: .2f} ms, {out["scores"].data[0]}') ## ADD
        time_point = iter_end

  4. Run python sync_benchmark_GPU.py ./end2end.xml
  5. Copy sync_benchmark_GPU.py into sync_benchmark_GPU_fix.py and fix like this:
def fill_tensor_random(tensor):
#    seed = 42 ## CHANGE
    seed = int(datetime.datetime.timestamp(datetime.datetime.now())) ## CHANGE
    print(f"Seed: {seed}") ## ADD

  6. Run python sync_benchmark_GPU_fix.py ./end2end.xml
  7. Compare the logs
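
For reference, the snippet below is a minimal, fully assembled sketch of the change described in steps 3 and 5: every input tensor is re-filled with fresh random data before each inference, using a time-based seed. The run_sync_benchmark wrapper is my own simplification for illustration and does not mirror the structure of the actual sample.

import datetime
from statistics import median
from time import perf_counter

import numpy as np
import openvino as ov


def fill_tensor_random(tensor, seed=None):
    # Seconds-resolution timestamp as the default seed, matching the change above;
    # good enough here because a single inference takes far longer than a second.
    if seed is None:
        seed = int(datetime.datetime.now().timestamp())
    rs = np.random.RandomState(np.random.MT19937(np.random.SeedSequence(seed)))
    # Simplification: fill the uint8 value range and cast to the tensor's dtype.
    tensor.data[:] = rs.uniform(0, 255, list(tensor.shape)).astype(tensor.data.dtype)


def run_sync_benchmark(model_path, device_name='GPU', niter=10):
    core = ov.Core()
    compiled_model = core.compile_model(model_path, device_name)
    ireq = compiled_model.create_infer_request()

    latencies = []
    for i in range(niter):
        # Re-randomize every input before each inference so that results for
        # identical inputs cannot be served from any cache.
        for model_input in compiled_model.inputs:
            fill_tensor_random(ireq.get_tensor(model_input))
        start = perf_counter()
        ireq.infer()
        latencies.append((perf_counter() - start) * 1e3)
        print(f'{i + 1:2d}: {latencies[-1]:.2f} ms')
    print(f'Median latency: {median(latencies):.2f} ms')


if __name__ == '__main__':
    run_sync_benchmark('./end2end.xml')

Note that this sketch times only ireq.infer() itself, so re-filling the tensors does not inflate the reported latency.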

Relevant log output

Original code (static random seed):

(.venv) PS C:\work\OpenVINO> python .\sync_benchmark_GPU.py .\end2end.xml
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.3.0-16041-1e3b88e4e3f-releases/2024/3
Seed: 42
[138.20845  158.01514   14.629294 ... 211.95033  122.911835 242.03699 ] float32
Seed: 42
[138.20845  158.01514   14.629294 ... 211.95033  122.911835 242.03699 ] float32
 1 :  1414.72 ms, 1.0
Seed: 42
[138.20845  158.01514   14.629294 ... 211.95033  122.911835 242.03699 ] float32
 2 :  1010.47 ms, 1.0
Seed: 42
[138.20845  158.01514   14.629294 ... 211.95033  122.911835 242.03699 ] float32
 3 :  1024.23 ms, 1.0
Seed: 42
[138.20845  158.01514   14.629294 ... 211.95033  122.911835 242.03699 ] float32
 4 :  1001.27 ms, 1.0
Seed: 42
[138.20845  158.01514   14.629294 ... 211.95033  122.911835 242.03699 ] float32
 5 :  1000.56 ms, 1.0
Seed: 42
[138.20845  158.01514   14.629294 ... 211.95033  122.911835 242.03699 ] float32
 6 :  1010.31 ms, 1.0
Seed: 42
[138.20845  158.01514   14.629294 ... 211.95033  122.911835 242.03699 ] float32
 7 :  1010.87 ms, 1.0
Seed: 42
[138.20845  158.01514   14.629294 ... 211.95033  122.911835 242.03699 ] float32
 8 :  1020.17 ms, 1.0
Seed: 42
[138.20845  158.01514   14.629294 ... 211.95033  122.911835 242.03699 ] float32
 9 :  997.74 ms, 1.0
Seed: 42
[138.20845  158.01514   14.629294 ... 211.95033  122.911835 242.03699 ] float32
 10 :  1001.57 ms, 1.0
[ INFO ] Count:          10 iterations
[ INFO ] Duration:       10491.92 ms
[ INFO ] Latency:
[ INFO ]     Median:     1010.39 ms
[ INFO ]     Average:    1049.19 ms
[ INFO ]     Min:        997.74 ms
[ INFO ]     Max:        1414.72 ms
[ INFO ] Throughput: 0.95 FPS

Fixed code (randomized seed):

(.venv) PS C:\work\OpenVINO> python .\sync_benchmark_GPU_fix.py .\end2end.xml
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.3.0-16041-1e3b88e4e3f-releases/2024/3
Seed: 1724891439
[216.66624 215.04857 186.01024 ... 243.51692  65.004   226.98805] float32
Seed: 1724891469
[116.17603  196.96167  117.602425 ... 183.78407   46.60835   29.839851] float32
 1 :  29024.31 ms, 1.0
Seed: 1724891498
[ 45.37799 240.99669 239.74086 ... 170.49042 169.36159  43.48929] float32
 2 :  26602.12 ms, 1.0
Seed: 1724891525
[ 81.0459  130.74333  88.43377 ... 103.64926 107.10424  83.10236] float32
 3 :  26385.79 ms, 1.0
Seed: 1724891551
[112.18315  207.00949  193.35474  ... 139.64346  131.95663   24.062374] float32
 4 :  24066.72 ms, 1.0
Seed: 1724891575
[177.98949 150.92938 197.91206 ... 140.75986 215.25012 246.18552] float32
 5 :  26006.79 ms, 1.0
Seed: 1724891601
[ 64.49129     1.9813926  38.871136  ... 197.17485    56.3287
  42.59408  ] float32
 6 :  26609.57 ms, 1.0
Seed: 1724891628
[182.92015  135.3787    92.847496 ... 232.47296  204.9917   246.40154 ] float32
 7 :  23220.02 ms, 1.0
Seed: 1724891651
[ 73.514114  36.317707  50.1486   ...  73.474625 178.2396    69.60647 ] float32
 8 :  23810.37 ms, 1.0
Seed: 1724891675
[121.73839  159.2803   247.14323  ...  31.701612  60.62744  148.41168 ] float32
 9 :  23489.89 ms, 1.0
Seed: 1724891698
[126.19236   74.2312   233.52321  ... 214.48938  221.90755   39.946312] float32
 10 :  21840.82 ms, 1.0
[ INFO ] Count:          10 iterations
[ INFO ] Duration:       251056.41 ms
[ INFO ] Latency:
[ INFO ]     Median:     25036.76 ms
[ INFO ]     Average:    25105.64 ms
[ INFO ]     Min:        21840.82 ms
[ INFO ]     Max:        29024.31 ms
[ INFO ] Throughput: 0.04 FPS

Issue submission checklist

  • [X] I'm reporting an issue. It's not a question.
  • [x] I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • [x] There is reproducer code and related data files such as images, videos, models, etc.

RiverLight4 · Aug 29, 2024 03:08