Real-ESRGAN
Slow render speeds
Hi, I am running this command to upscale 720p anime: `CUDA_VISIBLE_DEVICES=0 python inference_realesrgan_video.py -i inputs/video/onepiece_demo.mp4 -n realesr-animevideov3 -s 2 --suffix outx2 --num_process_per_gpu 2`. The model generates 3.5 frames per second on a 3080 Ti. Is this normal? The GitHub page says the anime video v3 model runs at 22.6 fps at 720p. I am running everything in Anaconda; does that reduce performance? How can I make it faster?
following
@winstonyue any luck? New user here, attempting to verify GPU usage over CPU. @MaxTran96
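For anyone verifying GPU usage, a quick generic PyTorch sanity check (not specific to this repo) can confirm CUDA is actually visible to the process before blaming the model:

```python
import torch

# Generic PyTorch check: confirm CUDA is available and which device is in use,
# in case inference is silently falling back to the CPU.
print(torch.cuda.is_available())        # should print True
print(torch.cuda.get_device_name(0))    # e.g. "NVIDIA GeForce RTX 3080 Ti"
```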
So I noticed that the frames-per-second counter isn't showing the total; I think it's displaying the frames generated per process on the GPU. When I set num_process_per_gpu to 4 or higher, the per-process fps dropped slightly, but my overall render time was reduced significantly. The overall frames per second were much higher, maybe around 20. My GPU usage was nearing 100 percent as well.
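As a rough illustration of how the per-process counter relates to total throughput (the per-process figure below is an assumed example, not a measured number):

```python
# Back-of-the-envelope: the progress bar reports per-process fps, so the
# aggregate GPU throughput is roughly that figure times num_process_per_gpu.
per_process_fps = 5.0                          # assumed reading from one worker's bar
num_process_per_gpu = 4
print(per_process_fps * num_process_per_gpu)   # ~20 fps total across the GPU
```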
As for upscaling 1920x1080 videos 4x (then downscaling 2x to produce 4K videos), a lot of time is spent converting the fp16 output to fp32 and quantizing to uint8 on the CPU. By moving this to the GPU in utils.py:

output_img = (output_img.data.squeeze() * 255.0).clamp_(0, 255).byte().cpu().numpy()

I can double to triple the processing frame rate, but it is still slow (from <1 fps to 2.2 fps on a 3080). `num_process_per_gpu` also does not utilize the GPU efficiently. I think the correct way to go is to batch the inference, but I haven't implemented it yet.
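For context, a minimal before/after sketch of that change (the "before" only approximates the stock post-processing; it is not the exact utils.py code):

```python
import numpy as np
import torch

# Approximate stock behaviour: transfer the float tensor to the CPU first,
# then clamp and quantize there (sketch, not the exact upstream code).
def postprocess_cpu(output: torch.Tensor) -> np.ndarray:
    img = output.data.squeeze().float().cpu().clamp_(0, 1).numpy()
    return (img * 255.0).round().astype(np.uint8)

# The tweak discussed above: clamp and quantize on the GPU, then transfer
# only the uint8 result over PCIe.
def postprocess_gpu(output: torch.Tensor) -> np.ndarray:
    return (output.data.squeeze() * 255.0).clamp_(0, 255).byte().cpu().numpy()
```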
An experimental implementation of this is available at https://github.com/eliphatfs/Real-ESRGAN/blob/master/inference_realesrgan_video_fast.py. It reduces IO overhead and implements batch inference. On my 3080 LP it boosts 1080p-to-4K upscaling from 0.8 fps to 4 fps with the default batch size of 4 (a 5x speed-up!). The correctness is not yet verified systematically, but it looks good.
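For illustration, a minimal sketch of the batching idea (a simplification, not the actual _fast.py code; `model` and `frames` are assumed to be the loaded network on CUDA and an iterator of HxWx3 uint8 RGB arrays):

```python
import torch

# Sketch of batched inference: stack several frames into one tensor so the GPU
# runs a single forward pass per batch instead of one per frame.
def upscale_batched(model, frames, batch_size=4, device="cuda", half=True):
    batch = []
    for frame in frames:
        # HxWx3 uint8 -> 3xHxW float in [0, 1]
        t = torch.from_numpy(frame).permute(2, 0, 1).float().div_(255.0)
        batch.append(t)
        if len(batch) == batch_size:
            yield from _run_batch(model, batch, device, half)
            batch = []
    if batch:                                    # flush the last partial batch
        yield from _run_batch(model, batch, device, half)

def _run_batch(model, batch, device, half):
    x = torch.stack(batch).to(device)
    if half:
        x = x.half()
    with torch.no_grad():                        # inference only, no autograd graph
        y = model(x)                             # one forward pass for the whole batch
    y = (y * 255.0).clamp_(0, 255).byte().cpu()  # quantize on GPU, copy uint8 back
    for img in y:
        yield img.permute(1, 2, 0).numpy()
```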
@eliphatfs, in order to use your `_fast.py` script, may I simply drag and drop it into the vanilla Real-ESRGAN repo? I've attempted that and am running into this error:
edit: maybe I should just download and run from your repo? Thanks.
I did a few more tweaks and the frame rate is now 4.1 to 4.2 fps on my 3080 LP. For my test video, about 10% of the time is still spent doing IO; that could be made asynchronous with the main thread, but I think that is fair.
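For anyone chasing that remaining 10%, a generic producer/consumer sketch for moving writes off the main thread (an illustration only, not part of the script; `writer` is assumed to expose a cv2.VideoWriter-style write()):

```python
import queue
import threading

# Push finished frames onto a bounded queue and let a worker thread do the
# writing, so GPU inference is not blocked on disk/encoder IO.
def write_async(writer, upscaled_frames, depth=8):
    q = queue.Queue(maxsize=depth)

    def io_worker():
        while True:
            frame = q.get()
            if frame is None:            # sentinel: no more frames
                break
            writer.write(frame)          # assumed cv2.VideoWriter-style interface

    t = threading.Thread(target=io_worker, daemon=True)
    t.start()
    for frame in upscaled_frames:        # inference keeps running while writes happen
        q.put(frame)
    q.put(None)
    t.join()
```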
> @eliphatfs, in order to use your `_fast.py` script, may I simply drag and drop it into the vanilla Real-ESRGAN repo? I've attempted that and am running into this error:
> edit: maybe I should just download and run from your repo? Thanks.
Sorry, I had added some instrumentation for profiling the code to analyze where the time is spent; I have removed it now.
P.S. The script does not yet support alpha channels or grayscale; only RGB videos are supported (I think this covers 99% of modern videos). `extract_frame_first` may also be problematic, since PNG files may have alpha channels. Maybe I will work on RGBA/L support sometime later, or maybe I will just drop these parameters.
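Until then, a hypothetical workaround is to normalize frames to three channels before they reach the RGB-only path (a sketch, not part of the script):

```python
import cv2

# Hypothetical pre-processing: collapse alpha / grayscale inputs to 3-channel
# BGR so the RGB-only pipeline can handle them (alpha is simply discarded).
def to_three_channels(frame):
    if frame.ndim == 2:                  # grayscale
        return cv2.cvtColor(frame, cv2.COLOR_GRAY2BGR)
    if frame.shape[2] == 4:              # BGRA, e.g. a PNG with alpha
        return cv2.cvtColor(frame, cv2.COLOR_BGRA2BGR)
    return frame
```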
Face enhance is also not supported since it uses a different model.
> An experimental implementation of this is available at https://github.com/eliphatfs/Real-ESRGAN/blob/master/inference_realesrgan_video_fast.py. It reduces IO overhead and implements batch inference. On my 3080 LP it boosts 1080p-to-4K upscaling from 0.8 fps to 4 fps with the default batch size of 4 (a 5x speed-up!). The correctness is not yet verified systematically, but it looks good.
Awesome patch. I tested it with my 4070 Ti and the realesr-general-x4v3 model: from 2.12 fps to 3.9 fps. I have not yet tuned num_process_per_gpu or anything else.
> An experimental implementation of this is available at https://github.com/eliphatfs/Real-ESRGAN/blob/master/inference_realesrgan_video_fast.py. It reduces IO overhead and implements batch inference. On my 3080 LP it boosts 1080p-to-4K upscaling from 0.8 fps to 4 fps with the default batch size of 4 (a 5x speed-up!). The correctness is not yet verified systematically, but it looks good.
Could you provide the same for inference_realesrgan.py to accelerate image inference? I spent an hour asking GPT-4 to adapt your inference_realesrgan_video_fast.py, but it failed.
To fully extract the performance of the GPU, I have created a GitHub repo (https://github.com/Kiteretsu77/FAST_Anime_VSR). I implemented it with TensorRT and use a self-designed frame division algorithm to accelerate it, with a video redundancy jump mechanism (similar to inter-prediction in video compression) and a momentum mechanism. In addition, I use FFmpeg to decode at a lower FPS for faster processing, with an extremely negligible quality drop, and I use multiprocessing and multithreading to consume all available computation resources. Feel free to look at this slide deck (https://docs.google.com/presentation/d/1Gxux9MdWxwpnT4nDZln8Ip_MeqalrkBesX34FVupm2A/edit#slide=id.p) for the implementation and the algorithms I used. On my 3060 Ti (desktop version), my code can process 480p anime video input faster than real time.
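To illustrate the redundancy-jump idea in isolation (a simplification, not code from FAST_Anime_VSR):

```python
import numpy as np

# Simplified "redundancy jump": if a frame is almost identical to the previous
# one, reuse the previous upscaled result instead of re-running the network.
def upscale_with_skip(frames, upscale_fn, threshold=2.0):
    prev_in, prev_out = None, None
    for frame in frames:
        if prev_in is not None:
            diff = np.abs(frame.astype(np.int16) - prev_in.astype(np.int16)).mean()
            if diff < threshold:           # nearly static frame: skip the model
                yield prev_out
                continue
        prev_in, prev_out = frame, upscale_fn(frame)   # expensive SR call
        yield prev_out
```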