# DNN Bench

DNN Bench is a library that lets you benchmark your deep learning models against various frameworks and backends with a single command.
With DNN Bench you can answer questions like:

- Which hardware should I deploy my model to?
- Which backend should I use?
- Should I apply an optimization technique, e.g. quantization, before deploying?

The goal is to make it easy for developers to choose the optimal deployment configuration (optimization on/off, backend, hardware) for their particular use case.

Side note: models are benchmarked inside Docker containers.
## Example

Performance of BERT-Squad and ResNet on c5a.4xlarge, an AWS EC2 CPU compute instance. The plot shows the number of processed samples per second, where higher is better.

See further analysis for more models benchmarked on different hardware.
## Supported devices and backends

|  | PyTorch | TensorFlow | ONNX-Runtime | OpenVINO* | Nuphar* | CUDA* | TensorRT* |
|---|---|---|---|---|---|---|---|
| CPU | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |  |  |
| GPU | :white_check_mark: | :white_check_mark: |  |  |  | :white_check_mark: | :white_check_mark: |
| ARM |  |  | :white_check_mark: |  |  |  |  |

*Backends marked with an asterisk are executed within the ONNX-Runtime framework.
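Since the starred backends run through ONNX-Runtime, the corresponding Docker images ship onnxruntime builds with the matching execution providers. If you experiment outside the containers, you can check which providers your local onnxruntime build exposes; a minimal sketch, not part of DNN Bench itself:

```python
import onnxruntime as ort

# Prints the execution providers compiled into this onnxruntime build,
# e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider'].
print(ort.get_available_providers())
```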
## Installation

### Dependencies

#### Ubuntu

```sh
./install_dependencies.sh cpu
```

Replace the `cpu` argument with `gpu` for nvidia-docker.
#### Other

- Install docker.
- Install nvidia-docker.
- Add yourself to the docker group with `sudo usermod -aG docker $USER` to run docker commands without sudo.
### Deep learning backends

You can use the pre-compiled images from Docker Hub; they are downloaded automatically when running `./bench_model.sh`.

Optionally, prepare the Docker images for the various deep learning backends locally:

```sh
./prepare_images.sh cpu
```

Replace the `cpu` argument with `gpu` for GPU backends or `arm` for ARM backends.
## Usage

Benchmark an ONNX model against different backends:

```sh
./bench_model.sh path_to_model --repeat=100 --number=1 --warmup=10 --device=cpu \
    --tf --onnxruntime --openvino --pytorch --nuphar
```
Possible backends:

- `--tf` (with `--device=cpu` or `--device=gpu`)
- `--onnxruntime` (with `--device=cpu` or `--device=arm`)
- `--openvino` (with `--device=cpu`)
- `--pytorch` (with `--device=cpu` or `--device=gpu`)
- `--nuphar` (with `--device=cpu`)
- `--ort-cuda` (with `--device=gpu`)
- `--ort-tensorrt` (with `--device=gpu`)
Additional parameters:

- `--output OUTPUT`: directory for the benchmarking results. Default: `./results`
- `--repeat REPEAT`: number of benchmark repeats (experiments). Default: 1000
- `--number NUMBER`: number of inferences per repeat. Default: 1
- `--warmup WARMUP`: number of warmup repeats that are discarded. Default: 100
- `--device DEVICE`: device backend: `cpu`, `gpu`, or `arm`. Default: `cpu`
- `--quantize`: apply dynamic quantization in the corresponding backend.
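As a rough illustration of what dynamic quantization means for the ONNX-Runtime backend, onnxruntime ships a quantization API. A minimal sketch with placeholder paths; DNN Bench's own code path may differ:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Quantize weights to int8 ahead of time; activations are quantized
# dynamically at inference time. Paths are placeholders.
quantize_dynamic(
    model_input="model.onnx",
    model_output="model-quant.onnx",
    weight_type=QuantType.QInt8,
)
```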
## Results

Results are stored by default in the `./results` directory. Each benchmarking result is stored in JSON format:
```json
{
  "model_path": "/models/efficientnet-lite4.onnx",
  "output_path": "/results/efficientnet-lite4-onnxruntime-openvino.json",
  "backend": "onnxruntime",
  "backend_meta": "openvino",
  "device": "cpu",
  "number": 1,
  "repeat": 100,
  "warmup": 10,
  "size": 51946641,
  "input_size": [[1, 224, 224, 3]],
  "min": 0.038544699986232445,
  "max": 0.05930669998633675,
  "mean": 0.04293907555596282,
  "std": 0.0039751552053260125,
  "data": [0.04748649999964982,
           0.05760759999975562, ...]
}
```
- `model_path`: path to the input model.
- `output_path`: path to the results file.
- `backend`: deep learning backend used to produce the results.
- `backend_meta`: special parameters used with the backend. Example: onnxruntime used with openvino.
- `device`: where the model was benchmarked: cpu, gpu, arm, etc.
- `number`: number of inferences in a single experiment.
- `repeat`: number of repeated experiments.
- `warmup`: number of discarded warmup experiments; inference might not reach its optimal performance in the first few runs.
- `size`: size of the model in bytes.
- `min`: minimum time of an experiment run, in seconds.
- `max`: maximum time of an experiment run, in seconds.
- `mean`: mean time of an experiment run, in seconds.
- `std`: standard deviation of the experiment run times.
- `data`: all measurements of the experiment runs.
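Given these fields, comparing runs programmatically is straightforward. A minimal sketch that loads every result file and derives throughput as `number / mean`; the formula is an assumption based on the field definitions above, taking times to be in seconds:

```python
import json
from pathlib import Path

# "results" is the default output directory of bench_model.sh.
for path in sorted(Path("results").glob("*.json")):
    r = json.loads(path.read_text())
    backend = r["backend"] + (f'-{r["backend_meta"]}' if r.get("backend_meta") else "")
    throughput = r["number"] / r["mean"]  # samples per second
    print(f'{backend}: {throughput:.1f} samples/s (mean {r["mean"] * 1e3:.1f} ms)')
```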
## Plotting

A simple plotting utility for generating quick plots is available in `plot_results.py`.

- Dependencies: `pip install seaborn matplotlib pandas`
- Usage: `python vis/plot_results.py results_dir plots_dir`
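If the built-in plots are not enough, a rough equivalent can be built on top of the same result files. A minimal sketch assuming the JSON fields shown above; the actual `vis/plot_results.py` may differ:

```python
import json
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Gather per-backend throughput from the result files.
rows = []
for path in Path("results").glob("*.json"):
    r = json.loads(path.read_text())
    backend = r["backend"] + (f'-{r["backend_meta"]}' if r.get("backend_meta") else "")
    rows.append({"backend": backend, "samples_per_sec": r["number"] / r["mean"]})

df = pd.DataFrame(rows)
ax = sns.barplot(data=df, x="backend", y="samples_per_sec")
ax.set_ylabel("samples / second (higher is better)")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()

Path("plots").mkdir(exist_ok=True)
plt.savefig("plots/throughput.png")
```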
## Limitations and known issues

- The `--quantize` flag is not supported for `--ort-cuda`, `--ort-tensorrt`, and `--tf`.
- The current version supports ONNX models only. To convert models from other frameworks, follow these examples (see the sketch after this list).
- The following Docker images for CPU execution utilize only half of the CPUs on Linux EC2 instances:
  - onnxruntime with openvino
  - pytorch
- onnxruntime with nuphar utilizes the total CPU count minus one on Linux EC2 instances.
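As an example of such a conversion, a PyTorch model can be exported to ONNX with `torch.onnx.export`. A minimal sketch with a placeholder torchvision model, not taken from DNN Bench's own examples:

```python
import torch
import torchvision

# Placeholder model and input shape; substitute your own model.
model = torchvision.models.resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Trace the model and write an ONNX file that bench_model.sh can consume.
torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
)
```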
## Troubleshoot

- If running the TensorFlow image fails due to onnx-tf conversion, re-build the image locally:

  ```sh
  docker build -f dockerfiles/Dockerfile.tf -t toriml/tensorflow:latest .
  ```

- If you get permission errors when running docker commands, add yourself to the docker group with `sudo usermod -aG docker $USER` and re-login with `su - $USER`.