openvino
[Bug]: Benchmark cannot measure real performance because benchmark input is not randomized correctly
OpenVINO Version
2024.3.0
Operating System
Windows System
Device used for inference
GPU
Framework
None
Model used
Mask R-CNN
Issue description
Background
I'd like to do inference with Mask R-CNN for instance segmentation on Intel iGPU.
I modified sync_benchmark.py to run inference on the iGPU, checked the profile, and then tested with my actual code.
The benchmark reports a model inference latency of about 1,000 ms, but inference actually takes about 30,000 ms with the real image I want to process. When I run inference twice in the same script, the first inference takes about 30,000 ms and the second takes about 1,000 ms. I suspect that some result or intermediate data is cached by OpenVINO (or by OpenCL inside OpenVINO) and is not actually recomputed.
This issue does not appear with CPU execution.
Main Issue
I checked sync_benchmark.py and throughput_benchmark.py and found that the model input is always filled with the same values.
fill_tensor_random(tensor) produces the same output on every call because it always creates the random generator with the same seed (0).
def fill_tensor_random(tensor):
    dtype = get_dtype(tensor.element_type)
    rand_min, rand_max = (0, 1) if dtype == bool else (np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)
    # np.random.uniform excludes high: add 1 to have it generated
    if np.dtype(dtype).kind in ['i', 'u', 'b']:
        rand_max += 1
    rs = np.random.RandomState(np.random.MT19937(np.random.SeedSequence(0)))  ### HERE: Initialized with static value
    if 0 == tensor.get_size():
        raise RuntimeError("Models with dynamic shapes aren't supported. Input tensors must have specific shapes before inference")
    tensor.data[:] = rs.uniform(rand_min, rand_max, list(tensor.shape)).astype(dtype)
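The effect of the static seed can be reproduced without OpenVINO. The following is a minimal sketch, not the sample's code: `fill_fixed_seed` mirrors the seeding line above, while `fill_shared_stream` and `_shared_rs` are hypothetical names for one possible alternative in which a single generator is created once and reused, so each call continues the random stream.

```python
import numpy as np


def fill_fixed_seed(shape):
    # Mirrors the sample: a brand-new generator seeded with 0 on every call.
    rs = np.random.RandomState(np.random.MT19937(np.random.SeedSequence(0)))
    return rs.uniform(0, 255, shape).astype(np.float32)


# Hypothetical alternative: create the generator once at module level.
_shared_rs = np.random.RandomState(np.random.MT19937(np.random.SeedSequence(0)))


def fill_shared_stream(shape):
    # Each call draws the next values from the same stream.
    return _shared_rs.uniform(0, 255, shape).astype(np.float32)


a, b = fill_fixed_seed((2, 3)), fill_fixed_seed((2, 3))
c, d = fill_shared_stream((2, 3)), fill_shared_stream((2, 3))

same_fixed = bool(np.array_equal(a, b))    # identical data on every call
same_shared = bool(np.array_equal(c, d))   # different data on every call
print(same_fixed, same_shared)
```

With a shared generator the run as a whole stays reproducible (the seed is still 0), but consecutive fills no longer hand the device the same tensor.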
This does not seem to be limited to Python; the C++ benchmark apps appear to do the same. Here is the code that may have the same problem (not tested):
samples/python/benchmark/sync_benchmark/sync_benchmark.py
samples/python/benchmark/throughput_benchmark/throughput_benchmark.py
- def fill_tensor_random(tensor)
- rs = np.random.RandomState(np.random.MT19937(np.random.SeedSequence(0)))
samples/cpp/common/utils/include/samples/common.hpp
(called from samples/cpp/benchmark/sync_benchmark/main.cpp, samples/cpp/benchmark/throughput_benchmark/main.cpp)
- static inline void fill_random(ov::Tensor& tensor, T rand_min = std::numeric_limits<uint8_t>::min(), T rand_max = std::numeric_limits<uint8_t>::max())
- std::mt19937 gen(0);
samples/cpp/benchmark_app/inputs_filling.cpp
- ov::Tensor create_tensor_random(const benchmark_app::InputInfo& inputInfo, T rand_min = std::numeric_limits<uint8_t>::min(), T rand_max = std::numeric_limits<uint8_t>::max())
- std::mt19937 gen(0)
Is this expected behavior of OpenVINO benchmark profiler?
Step-by-step reproduction
- Create a Mask R-CNN ONNX model from Torchvision
- see also https://pytorch.org/vision/stable/models/mask_rcnn.html
- Download maskrcnn_resnet50_fpn_v2_coco-73cbd019.pth from here (to avoid corporate proxy problem)
- from https://github.com/pytorch/vision/blob/main/torchvision/models/detection/mask_rcnn.py
- Run this code (almost same as maskrcnn_resnet50_fpn_v2())
import torch
import torchvision
from torchinfo import summary
# library at torchvision 0.14.1 : torchvision.models.detection.mask_rcnn
from collections import OrderedDict
from typing import Any, Callable, Optional
from torch import nn
from torchvision.ops import MultiScaleRoIAlign
from torchvision.ops import misc as misc_nn_ops
from torchvision.transforms._presets import ObjectDetection
from torchvision.models._api import register_model, Weights, WeightsEnum
from torchvision.models._meta import _COCO_CATEGORIES
from torchvision.models._utils import _ovewrite_value_param, handle_legacy_interface
from torchvision.models.resnet import resnet50, ResNet50_Weights
from torchvision.models.detection._utils import overwrite_eps
from torchvision.models.detection.backbone_utils import _resnet_fpn_extractor, _validate_trainable_layers
from torchvision.models.detection.faster_rcnn import _default_anchorgen, FasterRCNN, FastRCNNConvFCHead, RPNHead
from torchvision.models.detection.mask_rcnn import MaskRCNN, MaskRCNN_ResNet50_FPN_V2_Weights, MaskRCNNHeads
weights = None
weights_backbone = None
num_classes = 91
trainable_backbone_layers = _validate_trainable_layers(False, None, 5, 3)
backbone = resnet50(weights=weights_backbone, progress=True)
backbone = _resnet_fpn_extractor(backbone, trainable_backbone_layers, norm_layer=nn.BatchNorm2d)
rpn_anchor_generator = _default_anchorgen()
rpn_head = RPNHead(backbone.out_channels, rpn_anchor_generator.num_anchors_per_location()[0], conv_depth=2)
box_head = FastRCNNConvFCHead(
    (backbone.out_channels, 7, 7), [256, 256, 256, 256], [1024], norm_layer=nn.BatchNorm2d
)
mask_head = MaskRCNNHeads(backbone.out_channels, [256, 256, 256, 256], 1, norm_layer=nn.BatchNorm2d)
kwargs = {} # mapping
model = MaskRCNN(backbone, num_classes=num_classes, rpn_anchor_generator=rpn_anchor_generator, rpn_head=rpn_head, box_head=box_head, mask_head=mask_head, **kwargs,)
model.load_state_dict(torch.load('maskrcnn_resnet50_fpn_v2_coco-73cbd019.pth'))
model.eval()
x = torch.rand(1, 3, 800, 1344)
predictions = model(x)
torch.onnx.export(model, x, "end2end.onnx", opset_version = 11, input_names=['input'], output_names = ['boxes', 'labels', 'scores', 'masks'], dynamic_axes = {'input': {2: 'height', 3: 'width'}, 'boxes': { 1:'num'}, 'labels':{ 1: 'num'}, 'scores': {1: 'num'},'masks' : {1:'num'}})
- Convert ONNX to OpenVINO IR
- I did this in an OpenVINO 2023.3 environment, so I used the old Model Optimizer, not NNCF
mo --input_model=".\end2end.onnx" --output_dir="./" --output="boxes,labels,scores,masks" --input="input" --input_shape="[1, 3, 800, 1344]"
- Copy sync_benchmark.py to sync_benchmark_GPU.py and modify it like this:
def fill_tensor_random(tensor):
    seed = 42  ## ADD
    # seed = int(datetime.datetime.timestamp(datetime.datetime.now()))  ## ADD
    print(f"Seed: {seed}")  ## ADD
    rs = np.random.RandomState(np.random.MT19937(np.random.SeedSequence(seed)))  ## CHANGE
    tensor.data[:] = rs.uniform(rand_min, rand_max, list(tensor.shape)).astype(dtype)
    print(tensor.data[0,0,0], dtype)  ## ADD

def main():
    # device_name = 'CPU'
    device_name = 'GPU'  ## CHANGE
    while time_point < time_point_to_finish or len(latencies) < niter:
        for model_input in compiled_model.inputs:  ## ADD
            fill_tensor_random(ireq.get_tensor(model_input))  ## ADD
        out = ireq.infer()
        iter_end = perf_counter()
        latencies.append((iter_end - time_point) * 1e3)
        print(f'{len(latencies): 2d} : {(iter_end - time_point)*1000: .2f} ms, {out["scores"].data[0]}')  ## ADD
        time_point = iter_end
- Run
python sync_benchmark_GPU.py ./end2end.xml
- Copy sync_benchmark_GPU.py to sync_benchmark_GPU_fix.py and modify it like this:
def fill_tensor_random(tensor):
    # seed = 42  ## CHANGE
    seed = int(datetime.datetime.timestamp(datetime.datetime.now()))  ## CHANGE
    print(f"Seed: {seed}")  ## ADD
- Run
python sync_benchmark_GPU_fix.py ./end2end.xml
- Compare the logs
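Two details of the modified loop in the steps above are worth keeping in mind when reading the numbers. First, fill_tensor_random runs inside the timed interval in both scripts, so each reported latency includes input generation as well as inference (this applies equally to both runs, so it does not by itself account for the difference between them). Second, a timestamp seed only has one-second resolution. The sketch below shows one possible way to handle both, under stated assumptions: `fake_infer` is a hypothetical stand-in for `ireq.infer()`, `make_generator` and `run` are illustrative names, and the generator is seeded from OS entropy via `np.random.SeedSequence()` with the entropy value logged so a run can be repeated.

```python
import time

import numpy as np


def make_generator():
    # SeedSequence() without arguments draws fresh entropy from the OS.
    # Logging seed_seq.entropy makes it possible to recreate the same stream.
    seed_seq = np.random.SeedSequence()
    return np.random.RandomState(np.random.MT19937(seed_seq)), seed_seq.entropy


def fake_infer(data):
    # Hypothetical stand-in for ireq.infer(); it only reads the input.
    return float(data.sum())


def run(niter, shape):
    rs, entropy = make_generator()          # created once, outside the loop
    buffer = np.empty(shape, dtype=np.float32)
    fill_ms, infer_ms = [], []
    for _ in range(niter):
        t0 = time.perf_counter()
        buffer[:] = rs.uniform(0, 256, shape)   # new values on every iteration
        t1 = time.perf_counter()
        fake_infer(buffer)
        t2 = time.perf_counter()
        fill_ms.append((t1 - t0) * 1e3)     # input generation only
        infer_ms.append((t2 - t1) * 1e3)    # inference call only
    return entropy, fill_ms, infer_ms


entropy, fill_ms, infer_ms = run(niter=3, shape=(1, 3, 8, 8))
print(f"entropy: {entropy}")
print(f"fill: {fill_ms}")
print(f"infer: {infer_ms}")
```

Reporting the two intervals separately makes it easier to see how much of a per-iteration figure is spent in the inference call itself.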
Relevant log output
Original code (static random seed):
(.venv) PS C:\work\OpenVINO> python .\sync_benchmark_GPU.py .\end2end.xml
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.3.0-16041-1e3b88e4e3f-releases/2024/3
Seed: 42
[138.20845 158.01514 14.629294 ... 211.95033 122.911835 242.03699 ] float32
Seed: 42
[138.20845 158.01514 14.629294 ... 211.95033 122.911835 242.03699 ] float32
1 : 1414.72 ms, 1.0
Seed: 42
[138.20845 158.01514 14.629294 ... 211.95033 122.911835 242.03699 ] float32
2 : 1010.47 ms, 1.0
Seed: 42
[138.20845 158.01514 14.629294 ... 211.95033 122.911835 242.03699 ] float32
3 : 1024.23 ms, 1.0
Seed: 42
[138.20845 158.01514 14.629294 ... 211.95033 122.911835 242.03699 ] float32
4 : 1001.27 ms, 1.0
Seed: 42
[138.20845 158.01514 14.629294 ... 211.95033 122.911835 242.03699 ] float32
5 : 1000.56 ms, 1.0
Seed: 42
[138.20845 158.01514 14.629294 ... 211.95033 122.911835 242.03699 ] float32
6 : 1010.31 ms, 1.0
Seed: 42
[138.20845 158.01514 14.629294 ... 211.95033 122.911835 242.03699 ] float32
7 : 1010.87 ms, 1.0
Seed: 42
[138.20845 158.01514 14.629294 ... 211.95033 122.911835 242.03699 ] float32
8 : 1020.17 ms, 1.0
Seed: 42
[138.20845 158.01514 14.629294 ... 211.95033 122.911835 242.03699 ] float32
9 : 997.74 ms, 1.0
Seed: 42
[138.20845 158.01514 14.629294 ... 211.95033 122.911835 242.03699 ] float32
10 : 1001.57 ms, 1.0
[ INFO ] Count: 10 iterations
[ INFO ] Duration: 10491.92 ms
[ INFO ] Latency:
[ INFO ] Median: 1010.39 ms
[ INFO ] Average: 1049.19 ms
[ INFO ] Min: 997.74 ms
[ INFO ] Max: 1414.72 ms
[ INFO ] Throughput: 0.95 FPS
Fixed code (randomized seed):
(.venv) PS C:\work\OpenVINO> python .\sync_benchmark_GPU_fix.py .\end2end.xml
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.3.0-16041-1e3b88e4e3f-releases/2024/3
Seed: 1724891439
[216.66624 215.04857 186.01024 ... 243.51692 65.004 226.98805] float32
Seed: 1724891469
[116.17603 196.96167 117.602425 ... 183.78407 46.60835 29.839851] float32
1 : 29024.31 ms, 1.0
Seed: 1724891498
[ 45.37799 240.99669 239.74086 ... 170.49042 169.36159 43.48929] float32
2 : 26602.12 ms, 1.0
Seed: 1724891525
[ 81.0459 130.74333 88.43377 ... 103.64926 107.10424 83.10236] float32
3 : 26385.79 ms, 1.0
Seed: 1724891551
[112.18315 207.00949 193.35474 ... 139.64346 131.95663 24.062374] float32
4 : 24066.72 ms, 1.0
Seed: 1724891575
[177.98949 150.92938 197.91206 ... 140.75986 215.25012 246.18552] float32
5 : 26006.79 ms, 1.0
Seed: 1724891601
[ 64.49129 1.9813926 38.871136 ... 197.17485 56.3287
42.59408 ] float32
6 : 26609.57 ms, 1.0
Seed: 1724891628
[182.92015 135.3787 92.847496 ... 232.47296 204.9917 246.40154 ] float32
7 : 23220.02 ms, 1.0
Seed: 1724891651
[ 73.514114 36.317707 50.1486 ... 73.474625 178.2396 69.60647 ] float32
8 : 23810.37 ms, 1.0
Seed: 1724891675
[121.73839 159.2803 247.14323 ... 31.701612 60.62744 148.41168 ] float32
9 : 23489.89 ms, 1.0
Seed: 1724891698
[126.19236 74.2312 233.52321 ... 214.48938 221.90755 39.946312] float32
10 : 21840.82 ms, 1.0
[ INFO ] Count: 10 iterations
[ INFO ] Duration: 251056.41 ms
[ INFO ] Latency:
[ INFO ] Median: 25036.76 ms
[ INFO ] Average: 25105.64 ms
[ INFO ] Min: 21840.82 ms
[ INFO ] Max: 29024.31 ms
[ INFO ] Throughput: 0.04 FPS
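For a direct comparison, the per-iteration latencies from the two logs above can be summarized side by side. This is only a convenience sketch using the logged values; `summarize` is an illustrative helper, not part of the benchmark samples.

```python
from statistics import mean, median

# Per-iteration latencies in ms, copied from the two logs above.
static_seed_ms = [1414.72, 1010.47, 1024.23, 1001.27, 1000.56,
                  1010.31, 1010.87, 1020.17, 997.74, 1001.57]
random_seed_ms = [29024.31, 26602.12, 26385.79, 24066.72, 26006.79,
                  26609.57, 23220.02, 23810.37, 23489.89, 21840.82]


def summarize(latencies_ms):
    avg = mean(latencies_ms)
    return {
        "median": median(latencies_ms),
        "average": avg,
        "min": min(latencies_ms),
        "max": max(latencies_ms),
        "fps": 1000.0 / avg,  # one inference per iteration
    }


static_stats = summarize(static_seed_ms)
random_stats = summarize(random_seed_ms)
slowdown = random_stats["median"] / static_stats["median"]

print(f"static seed: {static_stats}")
print(f"random seed: {random_stats}")
print(f"median slowdown: {slowdown:.1f}x")
```

The medians reproduce the values reported in the logs, and the run with changing inputs is roughly 25 times slower per iteration than the run with a fixed input.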
Issue submission checklist
- [X] I'm reporting an issue. It's not a question.
- [x] I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
- [x] There is reproducer code and related data files such as images, videos, models, etc.