
Comparisons for Recognizers

Open irvingzhang0512 opened this issue 5 years ago • 5 comments

Description

MMAction2 provides a large number of recognizers. To choose the right model for our applications, we may need to compare all models in one table, but I'm not sure how best to do this.

I'm open to all suggestions.

Inference time statistics

  • Inference time is my priority, so here is a table for it.
  • The related code can be found here.
| model name | Tesla V100-PCIE (32f/16f/8f) | GTX 1080 Ti (32f/16f/8f) | Jetson AGX Xavier (32f/16f/8f) |
| --- | --- | --- | --- |
| TSN_r50 | 31/17/10 | 52/26/14 | 258/134/80 |
| TSM_r50 | 34/19/13 | 59/30/16 | 278/145/86 |
| TSM_MobileNetV2 | 10/10/10 | 23/12/7 | 81/41/23 |
| TIN_r50 | 72/30/30 | 141/52/24 | 561/218/104 |
| TANet | 37/21/19 | 64/33/19 | 429/165/100 |
| I3D_r50 | 27/23/21 | 21/14/11 | 128/68/37 |
| 2Plus1d_r34 | 61/41/32 | 77/41/27 | 539/278/146 |
| CSN_r152 | 172/169/169 | 115/92/88 | 584/303/163 |
| SlowFast_r50 | 34/28/28 | 28/18/14 | 150/81/45 |
| SlowOnly_r50 | 58/38/28 | 79/41/23 | 576/301/160 |
| X3D | 95/93/92 | 84/52/49 | 415/212/112 |

Notes:

  • The unit of inference time is milliseconds (ms).
  • 32f, 16f, 8f denote the number of frames in the model input.
    • Default input shape for 2D recognizers is (1, num_frames, 3, 224, 224).
    • Default input shape for 3D recognizers is (1, 1, 3, num_frames, 224, 224).
  • TPN and C3D models are not included yet.
    • TPN models do not support 32-frame inputs.
    • C3D models only support the input shape (1, 1, 3, 16, 112, 112).
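For reference, the measurement loop behind numbers like these can be sketched as follows. This is a minimal, framework-agnostic sketch, not the linked benchmark script: `measure_latency`, `warmup`, and `runs` are illustrative names, and on a GPU you would also synchronize the device (e.g. `torch.cuda.synchronize()`) before reading the clock.

```python
import time

def measure_latency(model, inputs, warmup=10, runs=100):
    """Return the average per-forward latency in milliseconds.

    `model` is any callable. Warm-up iterations exclude one-off costs
    (memory allocation, kernel autotuning, caches) from the timing.
    """
    for _ in range(warmup):
        model(inputs)
    start = time.perf_counter()
    for _ in range(runs):
        model(inputs)
    elapsed = time.perf_counter() - start
    return elapsed / runs * 1000.0  # ms per forward pass
```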

TODO

  • [x] Inference time for PyTorch models with the default config.
  • [ ] Inference time for PyTorch/ONNX/TensorRT models with various configs.
    • PyTorch models support fp16, fuse_conv_bn, cudnn, etc.
    • TensorRT models support fp16/int8.
  • [ ] Detailed information for each model, such as FLOPs, GPU memory, training/test results, etc.
  • [ ] Inference time for input preprocessing.
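On the `fuse_conv_bn` item above: at inference time a BatchNorm layer can be folded into the preceding convolution's weights and bias, saving one pass over the activations. A per-channel sketch of the arithmetic (pure Python with scalar weights for clarity; real fusion code rescales the full weight tensor the same way):

```python
import math

def fuse_conv_bn_params(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm (gamma, beta, running mean/var) into conv (w, b).

    bn(conv(x)) = gamma * (w*x + b - mean) / sqrt(var + eps) + beta
                = (w * scale) * x + ((b - mean) * scale + beta)
    so the fused conv alone reproduces the conv+bn output.
    """
    scale = gamma / math.sqrt(var + eps)
    w_fused = w * scale
    b_fused = (b - mean) * scale + beta
    return w_fused, b_fused
```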

irvingzhang0512 avatar Mar 31 '21 03:03 irvingzhang0512

Thanks, this is great!

  • For TPN and C3D, is there a way to set up a comparison that is as fair as possible? For example, if a model supports only 8 or 16 frames, you can forward 32 frames in one batch with batch size 4 or 2, respectively.
  • Generally there is a speed/accuracy trade-off. Reporting accuracies on a common test set (e.g. 4000 videos picked from the Kinetics-400 test set) would help evaluate any performance degradation at different precisions.
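To make the batching suggestion above concrete: for a model with a fixed clip length, the same 32-frame budget can be pushed through as a batch of clips, keeping total compute roughly comparable across models. A hypothetical helper (names are illustrative, not MMAction2 API):

```python
def clips_for_budget(total_frames, clip_len):
    """Split a frame budget into (batch_size, clip_len).

    e.g. a model that only accepts 8-frame clips covers a 32-frame
    budget with a batch of 4 clips.
    """
    if total_frames % clip_len != 0:
        raise ValueError(
            f"{total_frames} frames not divisible by clip_len={clip_len}")
    return total_frames // clip_len, clip_len
```

So a 32-frame budget maps to batch size 4 for an 8-frame model and batch size 2 for a 16-frame model.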

innerlee avatar Mar 31 '21 06:03 innerlee

I went through the C3D code. It turns out that we cannot modify the C3D config to support other input shapes. I haven't studied the TPN code yet; I may take a look in April.

Maybe a table like this

| model type | model name | sampling strategy | v100/1080ti/agx latency (ms) | Kinetics-400 accuracy | SthV2 accuracy | comments |
| --- | --- | --- | --- | --- | --- | --- |
| PyTorch | TSM-R50 | 1x1x8 | 13/16/86 | 70.24 / 89.56 | 57.86 / 61.12 | / |

irvingzhang0512 avatar Mar 31 '21 10:03 irvingzhang0512

Note: we will support auto_fp16 using torch.cuda.amp in the future: https://github.com/open-mmlab/mmcv/pull/791

dreamerlin avatar Mar 31 '21 11:03 dreamerlin

Not really. Most of the info in the above table is already present in the model zoo, or could be added to its tables (e.g. a v100/1080ti/agx latency column).

I think the most valuable part is the speed/accuracy benchmark for different precisions.

innerlee avatar Mar 31 '21 13:03 innerlee

Something like that in GluonCV? (attached image: bokeh_plot)

irvingzhang0512 avatar Mar 31 '21 14:03 irvingzhang0512