# DNN Bench

DNN Bench is a library that lets you benchmark your deep learning models against various frameworks and backends with a single command.
With DNN Bench you can answer questions like:

- Which hardware should I deploy my model to?
- Which backend should I use?
- Should I apply an optimization technique, e.g. quantization, before deploying?

The goal is to make it easy for developers to choose the optimal deployment configuration (optimization on/off, backend, hardware) for their particular use case.

Side note: models are benchmarked inside Docker containers.
## Example

Performance of BERT-Squad and ResNet on c5a.4xlarge, an AWS EC2 CPU compute instance. The plot shows the number of processed samples per second, where higher is better.

See further analysis for more models benchmarked on different hardware.
## Supported devices and backends

|  | PyTorch | TensorFlow | ONNX-Runtime | OpenVINO* | Nuphar* | CUDA* | TensorRT* |
|---|---|---|---|---|---|---|---|
| CPU | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |  |  |
| GPU | :white_check_mark: | :white_check_mark: |  |  |  | :white_check_mark: | :white_check_mark: |
| ARM |  |  | :white_check_mark: |  |  |  |  |

*Backends marked with an asterisk are executed within the ONNX-Runtime framework.
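Since the starred backends run through ONNX-Runtime, the corresponding Docker images ship onnxruntime builds with the matching execution providers. If you experiment outside the containers, you can check which providers your local onnxruntime build exposes; a minimal sketch, not part of DNN Bench itself:

```python
import onnxruntime as ort

# Prints the execution providers compiled into this onnxruntime build,
# e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider'].
print(ort.get_available_providers())
```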
## Installation

### Dependencies

#### Ubuntu

```sh
./install_dependencies.sh cpu
```

Replace the `cpu` argument with `gpu` for nvidia-docker.
#### Other

- Install docker.
- Install nvidia-docker.
- Add yourself to the docker group with `sudo usermod -aG docker $USER` to run docker commands without sudo.
### Deep learning backends

You can use the pre-compiled images from Docker Hub; they are downloaded automatically when running `./bench_model.sh`.

Optionally, prepare the Docker images for the various deep learning backends locally:

```sh
./prepare_images.sh cpu
```

Replace the `cpu` argument with `gpu` for GPU backends or `arm` for ARM backends.
## Usage

Benchmark an ONNX model against different backends:

```sh
./bench_model.sh path_to_model --repeat=100 --number=1 --warmup=10 --device=cpu \
    --tf --onnxruntime --openvino --pytorch --nuphar
```
Possible backends:

- `--tf` (with `--device=cpu` or `--device=gpu`)
- `--onnxruntime` (with `--device=cpu` or `--device=arm`)
- `--openvino` (with `--device=cpu`)
- `--pytorch` (with `--device=cpu` or `--device=gpu`)
- `--nuphar` (with `--device=cpu`)
- `--ort-cuda` (with `--device=gpu`)
- `--ort-tensorrt` (with `--device=gpu`)
Additional parameters:

- `--output OUTPUT`: directory for the benchmarking results. Default: `./results`
- `--repeat REPEAT`: number of benchmark repeats (experiments). Default: 1000
- `--number NUMBER`: number of inferences per repeat. Default: 1
- `--warmup WARMUP`: number of warmup repeats that are discarded. Default: 100
- `--device DEVICE`: device backend: `cpu`, `gpu`, or `arm`. Default: `cpu`
- `--quantize`: apply dynamic quantization in the corresponding backend.
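As a rough illustration of what dynamic quantization means for the ONNX-Runtime backend, onnxruntime ships a quantization API. A minimal sketch with placeholder paths; DNN Bench's own code path may differ:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Quantize weights to int8 ahead of time; activations are quantized
# dynamically at inference time. Paths are placeholders.
quantize_dynamic(
    model_input="model.onnx",
    model_output="model-quant.onnx",
    weight_type=QuantType.QInt8,
)
```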
## Results

Results are stored by default in the `./results` directory. Each benchmarking result is stored in JSON format:
```json
{
  "model_path": "/models/efficientnet-lite4.onnx",
  "output_path": "/results/efficientnet-lite4-onnxruntime-openvino.json",
  "backend": "onnxruntime",
  "backend_meta": "openvino",
  "device": "cpu",
  "number": 1,
  "repeat": 100,
  "warmup": 10,
  "size": 51946641,
  "input_size": [[1, 224, 224, 3]],
  "min": 0.038544699986232445,
  "max": 0.05930669998633675,
  "mean": 0.04293907555596282,
  "std": 0.0039751552053260125,
  "data": [0.04748649999964982,
           0.05760759999975562, ...]
}
```
- `model_path`: path to the input model.
- `output_path`: path to the results file.
- `backend`: deep learning backend used to produce the results.
- `backend_meta`: special parameters used with the backend. Example: onnxruntime used with openvino.
- `device`: where the model was benchmarked: cpu, gpu, arm, etc.
- `number`: number of inferences in a single experiment.
- `repeat`: number of repeated experiments.
- `warmup`: number of discarded warmup experiments; inference might not reach its optimal performance in the first few runs.
- `size`: size of the model in bytes.
- `min`: minimum time of an experiment run, in seconds.
- `max`: maximum time of an experiment run, in seconds.
- `mean`: mean time of an experiment run, in seconds.
- `std`: standard deviation of the experiment run times.
- `data`: all measurements of the experiment runs.
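Given these fields, comparing runs programmatically is straightforward. A minimal sketch that loads every result file and derives throughput as `number / mean`; the formula is an assumption based on the field definitions above, taking times to be in seconds:

```python
import json
from pathlib import Path

# "results" is the default output directory of bench_model.sh.
for path in sorted(Path("results").glob("*.json")):
    r = json.loads(path.read_text())
    backend = r["backend"] + (f'-{r["backend_meta"]}' if r.get("backend_meta") else "")
    throughput = r["number"] / r["mean"]  # samples per second
    print(f'{backend}: {throughput:.1f} samples/s (mean {r["mean"] * 1e3:.1f} ms)')
```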
## Plotting

A simple plotting utility for generating quick plots is available in `plot_results.py`.

- Dependencies: `pip install seaborn matplotlib pandas`
- Usage: `python vis/plot_results.py results_dir plots_dir`
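If the built-in plots are not enough, a rough equivalent can be built on top of the same result files. A minimal sketch assuming the JSON fields shown above; the actual `vis/plot_results.py` may differ:

```python
import json
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Gather per-backend throughput from the result files.
rows = []
for path in Path("results").glob("*.json"):
    r = json.loads(path.read_text())
    backend = r["backend"] + (f'-{r["backend_meta"]}' if r.get("backend_meta") else "")
    rows.append({"backend": backend, "samples_per_sec": r["number"] / r["mean"]})

df = pd.DataFrame(rows)
ax = sns.barplot(data=df, x="backend", y="samples_per_sec")
ax.set_ylabel("samples / second (higher is better)")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()

Path("plots").mkdir(exist_ok=True)
plt.savefig("plots/throughput.png")
```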
## Limitations and known issues

- The `--quantize` flag is not supported for `--ort-cuda`, `--ort-tensorrt`, and `--tf`.
- The current version supports ONNX models only. To convert models from other frameworks, follow these examples (see the sketch after this list).
- The following Docker images for CPU execution utilize only half of the CPUs on Linux EC2 instances:
  - onnxruntime with openvino
  - pytorch
- onnxruntime with nuphar utilizes the total CPU count minus one on Linux EC2 instances.
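As an example of such a conversion, a PyTorch model can be exported to ONNX with `torch.onnx.export`. A minimal sketch with a placeholder torchvision model, not taken from DNN Bench's own examples:

```python
import torch
import torchvision

# Placeholder model and input shape; substitute your own model.
model = torchvision.models.resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Trace the model and write an ONNX file that bench_model.sh can consume.
torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
)
```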
## Troubleshoot

- If running the TensorFlow image fails due to onnx-tf conversion, re-build the image locally:

  ```sh
  docker build -f dockerfiles/Dockerfile.tf -t toriml/tensorflow:latest .
  ```

- If you get permission errors when running docker commands, add yourself to the docker group with `sudo usermod -aG docker $USER` and re-login with `su - $USER`.