Comparisons for Recognizers
Description
MMAction2 provides a large number of recognizers. To choose the right model for an application, it would help to compare all the models in one table, but I am not sure how best to do this.
I'm open to all suggestions.
Inference time statistics
- Inference time is my priority, so here is a table for this.
- The related code can be found here.
| model_name | Tesla V100-PCIE (32f / 16f / 8f) | GTX 1080ti (32f / 16f / 8f) | Jetson AGX Xavier (32f / 16f / 8f) |
|---|---|---|---|
| TSN_r50 | 31/17/10 | 52/26/14 | 258/134/80 |
| TSM_r50 | 34/19/13 | 59/30/16 | 278/145/86 |
| TSM_MobileNetV2 | 10/10/10 | 23/12/7 | 81/41/23 |
| TIN_r50 | 72/30/30 | 141/52/24 | 561/218/104 |
| TANet | 37/21/19 | 64/33/19 | 429/165/100 |
| I3D_r50 | 27/23/21 | 21/14/11 | 128/68/37 |
| 2Plus1d_r34 | 61/41/32 | 77/41/27 | 539/278/146 |
| CSN_r152 | 172/169/169 | 115/92/88 | 584/303/163 |
| SlowFast_r50 | 34/28/28 | 28/18/14 | 150/81/45 |
| SlowOnly_r50 | 58/38/28 | 79/41/23 | 576/301/160 |
| X3D | 95/93/92 | 84/52/49 | 415/212/112 |
Notes:
- The unit of inference time is milliseconds (ms).
- `32f, 16f, 8f` means the number of frames for model inputs.
- Default input shape for 2D recognizers is `(1, num_frames, 3, 224, 224)`.
- Default input shape for 3D recognizers is `(1, 1, 3, num_frames, 224, 224)`.
- TPN models and C3D models are not included yet.
- TPN models are not valid for 32 frames.
- C3D models only support input shape `(1, 1, 3, 16, 112, 112)`.
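Numbers like the ones above are typically produced with a warmup-then-average timing loop. A minimal sketch of such a helper (the function name is my own, not from the MMAction2 benchmark script):

```python
import time

def measure_latency_ms(forward, warmup=10, runs=100):
    """Average wall-clock latency of `forward()` in milliseconds.

    Warmup iterations are discarded so one-time costs (memory allocation,
    cudnn autotuning, etc.) do not skew the average. For GPU models,
    torch.cuda.synchronize() should additionally be called before each
    timestamp so queued kernels are actually finished.
    """
    for _ in range(warmup):
        forward()
    start = time.perf_counter()
    for _ in range(runs):
        forward()
    return (time.perf_counter() - start) * 1000.0 / runs
```

Here `forward` would be a closure that runs one model forward pass on a fixed dummy input of the default shape listed above.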
TODO
- [x] Inference time for PyTorch models with default config.
- [ ] Inference time for PyTorch/ONNX/TensorRT models with various configs
  - PyTorch models support fp16, fuse_conv_bn, cudnn, etc.
  - TensorRT models support fp16/int8.
- [ ] Detailed information for each model, such as FLOPs, gpu memory, training/test results, etc.
- [ ] Inference time for input preprocessing.
Thanks, this is great!
- For TPN and C3D, is there a way to put up a setting that is as fair as possible? For example, if it supports 8 or 16 frames only, then you can forward 32 frames in one batch with batch size 4 and 2, respectively.
- Generally there is a speed/accuracy trade-off. Reporting their accuracies on a common test set (e.g. pick 4000 vids from the test set of K400) would be helpful to evaluate any performance degradations for different precisions.
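The batching idea above can be sketched as simple shape arithmetic (a hypothetical helper, not part of MMAction2):

```python
def clips_as_batch(total_frames, frames_per_clip):
    """Map a fixed total frame budget onto a model with a fixed clip length.

    A model that only accepts `frames_per_clip` frames can still consume
    `total_frames` frames in one forward pass by stacking clips along the
    batch dimension, e.g. 32 frames -> a batch of 4 eight-frame clips.
    Returns an assumed 2D-recognizer-style input shape.
    """
    if total_frames % frames_per_clip != 0:
        raise ValueError("total_frames must be divisible by frames_per_clip")
    batch_size = total_frames // frames_per_clip
    return (batch_size, frames_per_clip, 3, 224, 224)
```

For a 16-frame model the same 32-frame budget gives batch size 2, so the latency columns stay roughly comparable across models.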
I went through the code of C3D. It turns out that we cannot modify the C3D config to support other input shapes. I haven't studied the code of TPN yet; I may take a look in April.
Maybe a table like this
| model type | model name | sampling strategy | v100/1080ti/agx latency (ms) | kinetics400 accuracy | sthv2 accuracy | comments |
|---|---|---|---|---|---|---|
| PyTorch | TSM-R50 | 1x1x8 | 13/16/86 | 70.24 / 89.56 | 57.86 / 61.12 | / |
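If the table grows to many models, rows like the one above could be generated programmatically. A sketch, where the column order and "top1 / top5" formatting are assumptions based on the example row:

```python
def table_row(model_type, name, sampling, latencies, k400_acc, sthv2_acc,
              comment="/"):
    """Format one markdown row of the proposed comparison table.

    `latencies` is (v100, 1080ti, agx) in ms; each accuracy argument is a
    (top1, top5) pair in percent.
    """
    lat = "/".join(str(x) for x in latencies)
    k400 = " / ".join(f"{a:.2f}" for a in k400_acc)
    sthv2 = " / ".join(f"{a:.2f}" for a in sthv2_acc)
    return f"| {model_type} | {name} | {sampling} | {lat} | {k400} | {sthv2} | {comment} |"
```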
Notes: We will support auto_fp16 using torch.cuda.amp in the future: https://github.com/open-mmlab/mmcv/pull/791
Not really. Most of the info in the above table is already present in the model zoo, or can be added to its tables (e.g. the v100/1080ti/agx latency column).
I think the most valuable part is the speed/accuracy benchmark for different precisions.
Something like that in GluonCV?
