flops-counter.pytorch

Add benchmark mode support?

Open · 0400H opened this issue 5 years ago • 3 comments

This tool is very useful, but it still lacks benchmarking functionality:

  1. Layer or topology mode. Layer mode: loop over each layer to estimate its TFLOPS. Topology mode: loop over the whole model to estimate its TFLOPS.
  2. Batch support. The same model achieves different TFLOPS at different batch sizes.
  3. FPS support. I cannot find per-op or whole-model inference times, so I have no way to estimate FPS.

So, could you add benchmark mode support? For example:

Code:

flops, params = get_model_complexity_info(net, (batch, 3, 224, 224), mode='layer', loop=1000, as_strings=True, print_per_layer_stat=True)

Output:

(conv3d): Conv3d(x GMac, x% average MACs, x average TFLOPS, x average fps, x, x, kernel_size=[x, x, x], stride=(x, x, x), bias=True)

or

(conv3d): Conv3d(x GMac, x% average MACs, x average TFLOPS, x average ms, x, x, kernel_size=[x, x, x], stride=(x, x, x), bias=True)

0400H · Aug 02 '19 10:08

  1. Let's assume we have a static model with a fixed input resolution (for instance, a backbone trained on ImageNet). At this resolution, the total FLOPs of the model and the per-layer FLOPs are constant, so averaging FLOPs over loops, whether per layer or overall, gives the same value as a single pass.
  2. Most models have a static structure and always take images at the same resolution. For such models, FLOPs at batch size X = X * FLOPs at batch size 1 (see the one-line sketch after this list), so there is no need to run an expensive large-batch inference to compute this value.
  3. There are no plans for hardware benchmarking. It's out of scope for now.
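
A one-line illustration of point 2 (a sketch; macs_b1 is assumed to be a numeric value obtained from ptflops with as_strings=False):

    # Theoretical MACs of a static model scale linearly with batch size,
    # so the large-batch value is a single multiplication, not a large-batch run:
    macs_batch_x = macs_b1 * batch_size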

If somebody wants to evaluate a model like Mask R-CNN, they should average the output of ptflops over the evaluation dataset, because such models perform an input-dependent amount of computation. With the current interface this is not quite convenient, but it is possible via the input_constructor parameter, which can feed real images into the evaluated model.
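
A minimal sketch of that workaround, under some assumptions: model is an nn.Module whose forward takes a single tensor argument named x, dataset is a hypothetical iterable of (image, target) pairs with a known length, and as_strings=False is passed so the returned numbers can be averaged:

    from ptflops import get_model_complexity_info

    total_macs = 0.0
    for image, _ in dataset:
        # input_constructor lets us ignore the dummy resolution and feed a
        # real image; it must return a dict of kwargs for model.forward().
        constructor = lambda res, img=image: {'x': img.unsqueeze(0)}
        macs, params = get_model_complexity_info(
            model, (3, 224, 224),
            input_constructor=constructor,
            as_strings=False,
            print_per_layer_stat=False)
        total_macs += macs

    print('Average GMac over dataset:', total_macs / len(dataset) / 1e9)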

sovrasov · Aug 02 '19 14:08

Thank you for your explanation, but the second point is precisely the one I can't agree with. Even different batch sizes on the same hardware yield different TFLOPS, which is the original motivation for my proposal.

For example, suppose a model performs 8 TFLOPs of work per sample:

Batch size 8:

  • total time: 8 s
  • average time per sample: 1 s
  • achieved TFLOPS: 8

Batch size 32:

  • total time: 30 s
  • average time per sample: 0.9375 s
  • achieved TFLOPS: 8.53

Obviously the achieved TFLOPS are different.
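
The same arithmetic as a quick script (the 8 TFLOPs cost and the timings are the hypothetical numbers from above):

    # achieved throughput = work per sample / average time per sample
    tflops_per_sample = 8.0  # hypothetical model cost in TFLOPs

    for batch, total_time in [(8, 8.0), (32, 30.0)]:
        time_per_sample = total_time / batch
        print(f'batch {batch}: {time_per_sample:.4f} s/sample, '
              f'{tflops_per_sample / time_per_sample:.2f} TFLOPS')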

0400H · Aug 03 '19 07:08

Ok, I see your point. You propose measuring real performance based on running time. As stated in the readme, ptflops aims to measure only the theoretical amount of FLOPs. I currently have no plans to implement hardware benchmarking, though of course it's an interesting topic. I'll take a look at the existing profiling features of PyTorch.
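
A minimal sketch of that direction, using torch.autograd.profiler (which PyTorch already ships) and assuming model is any nn.Module:

    import torch
    from torch.autograd import profiler

    model.eval()
    inp = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        with profiler.profile() as prof:  # pass use_cuda=True to time CUDA kernels
            model(inp)

    # Per-op timing table, sorted by total CPU time
    print(prof.key_averages().table(sort_by='cpu_time_total'))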

sovrasov · Aug 05 '19 07:08