ECCV2022-RIFE

Training with mixed-precision

Open VvvvvGH opened this issue 3 years ago • 4 comments

Would you consider training with mixed precision? It can speed up inference and lower VRAM usage.

VvvvvGH • Mar 10 '21 03:03

I am currently running many other experiments to improve visual results, so improving efficiency is not a high priority right now. Is there any data showing how much RIFE speeds up with mixed precision?

hzwer • Mar 10 '21 08:03

GPU: RTX 3090, CPU: 9900K

I ran Vimeo90K.py on the Vimeo test set with model version 2.4, simply modifying the code to use half precision:

        self.flownet = self.flownet.half()
        self.contextnet = self.contextnet.half()
        self.fusionnet = self.fusionnet.half()
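In PyTorch, `.half()` casts a module's floating-point parameters and buffers to 16-bit floats. As a rough illustration of what that rounding does to individual values, here is a minimal sketch using only the Python standard library. It is not RIFE code, and the helper name `to_half` and the sample values are chosen purely for illustration.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (hypothetical helper)."""
    # The "e" struct format is the 16-bit binary16 float.
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Half precision keeps 11 significant bits, so for values in the normal range
# the relative rounding error is at most 2**-11 (about 4.9e-4).
for value in (0.5, 0.1, 1.0 / 255.0, 1000.1):
    rounded = to_half(value)
    rel_err = abs(rounded - value) / value
    print(f"{value} -> {rounded} (relative error {rel_err:.2e})")
```

Rounding of this size in weights and activations is one plausible contributor to small quality differences between full-precision and half-precision runs, though the sketch says nothing about how errors accumulate through a real network.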

Original:

Avg PSNR: 34.08642482478815 SSIM: 0.971693217754364 Inference time: 0.015075199002629547
Total inference time: 57.014402627944946

Using half precision:

Avg PSNR: 34.01964114699109 SSIM: 0.9714009165763855 Inference time: 0.013368013735992577
Total inference time: 50.557827949523926
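To put the two Vimeo runs side by side, the short sketch below computes the speedup and the quality difference directly from the figures reported above. The dictionary keys and variable names are illustrative only; the numbers are copied verbatim from the logs.

```python
# Figures copied from the two Vimeo90K runs reported above.
full = {"psnr": 34.08642482478815, "ssim": 0.971693217754364,
        "per_frame_time": 0.015075199002629547, "total_time": 57.014402627944946}
half = {"psnr": 34.01964114699109, "ssim": 0.9714009165763855,
        "per_frame_time": 0.013368013735992577, "total_time": 50.557827949523926}

# Speedup is a ratio of times, so it does not depend on the time unit.
total_speedup = full["total_time"] / half["total_time"]
per_frame_speedup = full["per_frame_time"] / half["per_frame_time"]
time_saved_pct = 100.0 * (1.0 - half["total_time"] / full["total_time"])

# Quality cost of switching to half precision.
psnr_drop = full["psnr"] - half["psnr"]
ssim_drop = full["ssim"] - half["ssim"]

print(f"total speedup:     {total_speedup:.3f}x")
print(f"per-frame speedup: {per_frame_speedup:.3f}x")
print(f"time saved:        {time_saved_pct:.1f}%")
print(f"PSNR drop:         {psnr_drop:.4f}")
print(f"SSIM drop:         {ssim_drop:.6f}")
```

On these figures, half precision is roughly 1.13x faster, at a cost of about 0.07 in PSNR.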

On 1080p video, VRAM usage drops from 6347 MB to 3450 MB.

Original:

900.0 frames in total, 15.0FPS to 60.0FPS
The audio will be merged after interpolation process
100%|█████████████████████████████████████████████████████████████████████████████▉| 899/900.0 [01:54<00:00,  7.84it/s]

Using half precision:

900.0 frames in total, 15.0FPS to 60.0FPS
The audio will be merged after interpolation process
100%|█████████████████████████████████████████████████████████████████████████████▉| 899/900.0 [01:41<00:00,  8.83it/s]
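The 1080p figures can be summarized the same way. The sketch below uses only the VRAM numbers and the progress-bar readings quoted above; the variable names are illustrative, and the elapsed times are the 01:54 and 01:41 readings converted to seconds.

```python
# VRAM usage on 1080p video, in MB, as reported above.
vram_full_mb = 6347
vram_half_mb = 3450

vram_saved_mb = vram_full_mb - vram_half_mb
vram_saved_pct = 100.0 * vram_saved_mb / vram_full_mb

# Throughput (it/s) read from the two progress bars.
rate_full = 7.84
rate_half = 8.83
throughput_gain = rate_half / rate_full

# Elapsed wall-clock time from the progress bars: 01:54 and 01:41.
elapsed_full_s = 1 * 60 + 54
elapsed_half_s = 1 * 60 + 41
elapsed_speedup = elapsed_full_s / elapsed_half_s

print(f"VRAM saved:      {vram_saved_mb} MB ({vram_saved_pct:.1f}%)")
print(f"throughput gain: {throughput_gain:.3f}x")
print(f"elapsed speedup: {elapsed_speedup:.3f}x")
```

So on this run the memory saving (about 46%) is much larger than the speed gain (about 1.13x).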

VvvvvGH • Mar 10 '21 13:03

OK. At present, it seems that there is indeed a need to reduce the memory overhead, and I can design some new models.

hzwer • Mar 11 '21 05:03

Hi, thanks for testing the FP16 speed and memory consumption. I believe we may see a more significant speedup on GPUs with much better FP16 support, like the T4 and V100. For GPUs like the 1080 Ti and 3090, the theoretical half-precision performance does not improve much over full precision.
3090: https://www.techpowerup.com/gpu-specs/geforce-rtx-3090.c3622
T4: https://www.techpowerup.com/gpu-specs/tesla-t4.c3316

a1600012888 • Mar 11 '21 08:03