flops-counter.pytorch

Add benchmark mode support?

Open · 0400H opened this issue 5 years ago • 3 comments

This tool is very useful, but it still lacks benchmarking functionality:

  1. Layer or topology mode. Layer mode: loop over each layer to estimate its TFLOPS. Topology mode: loop over the whole model to estimate its TFLOPS.
  2. Batch support. The same model achieves different TFLOPS at different batch sizes.
  3. FPS support. I cannot find per-op or whole-model inference times, so I have no way to estimate FPS.

So, could you add benchmark mode support? For example:

Code:

flops, params = get_model_complexity_info(net, (batch, 3, 224, 224), mode='layer', loop=1000, as_strings=True, print_per_layer_stat=True)

Output:

(conv3d): Conv3d(x GMac, x% average MACs, x average TFLOPS, x average fps, x, x, kernel_size=[x, x, x], stride=(x, x, x), bias=True)

or

(conv3d): Conv3d(x GMac, x% average MACs, x average TFLOPS, x average ms, x, x, kernel_size=[x, x, x], stride=(x, x, x), bias=True)

0400H · Aug 02 '19 10:08

  1. Let's assume we have a static model with a fixed input resolution (for instance, a backbone trained on ImageNet). At this resolution, the total FLOPs of the model and the per-layer FLOPs are constant, so averaging FLOPs over loops, whether per layer or overall, gives the same value as a single pass.
  2. Most models have a static structure and always take images at the same resolution. For such models, FLOPs at batch size X = X * FLOPs at batch size 1 (see the one-line sketch after this list), so there is no need to run an expensive large-batch inference to compute this value.
  3. There are no plans for hardware benchmarking. It's out of scope for now.
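
A one-line illustration of point 2 (a sketch; macs_b1 is assumed to be a numeric value obtained from ptflops with as_strings=False):

    # Theoretical MACs of a static model scale linearly with batch size,
    # so the large-batch value is a single multiplication, not a large-batch run:
    macs_batch_x = macs_b1 * batch_size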

If somebody wants to evaluate a model like Mask R-CNN, they should average the output of ptflops over the evaluation dataset, because such models perform an input-dependent amount of computation. With the current interface this is not quite convenient, but it is possible via the input_constructor parameter, which can feed real images into the evaluated model.
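
A minimal sketch of that workaround, under some assumptions: model is an nn.Module whose forward takes a single tensor argument named x, dataset is a hypothetical iterable of (image, target) pairs with a known length, and as_strings=False is passed so the returned numbers can be averaged:

    from ptflops import get_model_complexity_info

    total_macs = 0.0
    for image, _ in dataset:
        # input_constructor lets us ignore the dummy resolution and feed a
        # real image; it must return a dict of kwargs for model.forward().
        constructor = lambda res, img=image: {'x': img.unsqueeze(0)}
        macs, params = get_model_complexity_info(
            model, (3, 224, 224),
            input_constructor=constructor,
            as_strings=False,
            print_per_layer_stat=False)
        total_macs += macs

    print('Average GMac over dataset:', total_macs / len(dataset) / 1e9)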

sovrasov · Aug 02 '19 14:08

Thank you for your explanation, but the second point is precisely the one I can't agree with. Even different batch sizes on the same hardware yield different TFLOPS, which is the original motivation for my proposal.

For example, suppose a model performs 8 TFLOPs of work per sample:

Batch size 8:

  • total time: 8 s
  • average time per sample: 1 s
  • achieved TFLOPS: 8

Batch size 32:

  • total time: 30 s
  • average time per sample: 0.9375 s
  • achieved TFLOPS: 8.53

Obviously the achieved TFLOPS are different.
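
The same arithmetic as a quick script (the 8 TFLOPs cost and the timings are the hypothetical numbers from above):

    # achieved throughput = work per sample / average time per sample
    tflops_per_sample = 8.0  # hypothetical model cost in TFLOPs

    for batch, total_time in [(8, 8.0), (32, 30.0)]:
        time_per_sample = total_time / batch
        print(f'batch {batch}: {time_per_sample:.4f} s/sample, '
              f'{tflops_per_sample / time_per_sample:.2f} TFLOPS')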

0400H · Aug 03 '19 07:08

Ok, I see your point. You propose measuring real performance based on running time. As stated in the readme, ptflops aims to measure only the theoretical amount of FLOPs. I currently have no plans to implement hardware benchmarking, though of course it's an interesting topic. I'll take a look at the existing profiling features of PyTorch.
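
A minimal sketch of that direction, using torch.autograd.profiler (which PyTorch already ships) and assuming model is any nn.Module:

    import torch
    from torch.autograd import profiler

    model.eval()
    inp = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        with profiler.profile() as prof:  # pass use_cuda=True to time CUDA kernels
            model(inp)

    # Per-op timing table, sorted by total CPU time
    print(prof.key_averages().table(sort_by='cpu_time_total'))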

sovrasov · Aug 05 '19 07:08