
Benchmarking

Open alxcnwy opened this issue 6 years ago • 4 comments

Can you please add some performance numbers to the main project docs indicating inference latency on some common hardware options, e.g. AWS p2, GCP GPU instances, CPU inference, Raspberry Pi, etc.?

alxcnwy avatar Mar 29 '19 07:03 alxcnwy

I'm curious whether it's fast enough to run on video. I understand there may be some cross-frame artefacts, but I'm first wondering whether latency might be a deal-breaker.

alxcnwy avatar Mar 29 '19 07:03 alxcnwy

Hi, currently we do not plan to provide benchmarks on any hardware. However:

  • latency is a known issue with image super-resolution in general. The computational cost is very high due to the large input size and the high number of feature maps. For this reason we are looking into compressing the architecture and making inference faster. Should we succeed, we will run and publish some benchmarks.
  • this project is meant to be an experimentation platform for image super-resolution, so a future version will include some components to facilitate benchmarking.

Unfortunately most papers, including the ones implemented here, provide little to no benchmarking, so the literature is not very helpful either. My personal opinion is that, as of now, applying image super-resolution to video is doable albeit extremely costly (inference on a large 1000x1000 RGB image can take up to 300 seconds on a modern CPU).

cfrancesco avatar Mar 29 '19 10:03 cfrancesco

Thanks for your reply @cfrancesco. I think benchmarks would be useful so please reconsider...

alxcnwy avatar Mar 30 '19 16:03 alxcnwy

@alxcnwy I've been doing some experimentation with video inference. I wrote a quick script that takes an input video, pipes the frames one at a time through ISR, and saves the output using ffmpeg. On my RTX 2060, I get around 0.5-2 FPS depending on the input video resolution and the model used. Cross-frame artefacts don't seem to be a major problem, although quality is worse than with models specifically designed for video (such as TecoGAN).
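For anyone curious, the core of that script is roughly the sketch below. The ffmpeg decode/encode side is omitted, and `upscale_frame` is a hypothetical placeholder standing in for the ISR model's `predict` call (here just a trivial nearest-neighbour upscale so the snippet is self-contained):

```python
import numpy as np

def upscale_frame(frame: np.ndarray, scale: int = 2) -> np.ndarray:
    """Placeholder for the ISR model's predict(frame) call.
    Performs a trivial nearest-neighbour upscale by repeating pixels."""
    return frame.repeat(scale, axis=0).repeat(scale, axis=1)

def upscale_video(frames, scale: int = 2):
    """Pipe frames one at a time through the upscaler, yielding
    output frames so the whole video never sits in memory."""
    for frame in frames:
        yield upscale_frame(frame, scale)

# Example with three dummy 4x4 RGB frames
frames = [np.zeros((4, 4, 3), dtype=np.uint8) for _ in range(3)]
out = list(upscale_video(frames))
print(out[0].shape)  # (8, 8, 3)
```

In the real script, the generator is fed by an ffmpeg frame reader and each output frame is written straight back to an ffmpeg encoder pipe, which keeps memory use constant regardless of video length.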

ejaszewski avatar Sep 06 '19 22:09 ejaszewski