How can we make it GPU optimized?
If someone is interested, we can work together, assuming a GPU implementation isn't already out there.
Thank you
I don't have the programming experience necessary, but I'd love to see that exist.
Yeah, those nested for loops just scream "parallelize me!". It looks like there are Python hooks for NVIDIA CUDA. What are you looking to thread? The image processing should be a breeze: either we write our own (since we'll have to bust open the Pillow API anyway) or we find existing threaded variants for a lot of these convolutions and linear-algebra routines.
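For what it's worth, here's a minimal sketch of that idea, assuming Numba's CUDA support is installed; the per-pixel gradient math is just a placeholder, not RAISR's actual hashing/filtering code:

```python
import numpy as np
from numba import cuda

@cuda.jit
def per_pixel_kernel(img, out):
    # One GPU thread per pixel instead of nested Python loops.
    x, y = cuda.grid(2)  # x = column index, y = row index
    if 1 <= x and x < img.shape[1] - 1 and 1 <= y and y < img.shape[0] - 1:
        # Placeholder work: a simple gradient magnitude, not RAISR's hashing.
        gx = img[y, x + 1] - img[y, x - 1]
        gy = img[y + 1, x] - img[y - 1, x]
        out[y, x] = gx * gx + gy * gy

img = np.random.rand(1080, 1920).astype(np.float32)
d_img = cuda.to_device(img)
d_out = cuda.device_array_like(d_img)

threads = (16, 16)
blocks = ((img.shape[1] + threads[0] - 1) // threads[0],   # grid x covers columns
          (img.shape[0] + threads[1] - 1) // threads[1])   # grid y covers rows
per_pixel_kernel[blocks, threads](d_img, d_out)
result = d_out.copy_to_host()
```

Each iteration of the nested Python loops becomes one GPU thread, which is exactly the shape of parallelism those loops are begging for.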
OpenCV automatically uses the GPU via OpenCL for image processing when one is available (it can also use CPU OpenCL runtimes to improve performance). Parts of the code already use OpenCV, so just rewriting the image preprocessing to go through OpenCV would most likely bring a huge performance boost. Additionally, parallelizing the for loops would probably help a bit too; during the "Processing image … of … (train/…)" stage it only uses one CPU core at the moment.
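To make that concrete, here's a rough sketch of both suggestions, assuming opencv-python is installed: routing preprocessing through OpenCV's transparent OpenCL path (UMat) and spreading the per-image loop across CPU cores with multiprocessing. The images and the per-image work are placeholders, not the project's actual training code:

```python
import multiprocessing as mp
import numpy as np
import cv2

cv2.ocl.setUseOpenCL(True)  # ask OpenCV's T-API to use OpenCL if a device exists

def preprocess(img):
    # Uploading to a UMat lets the following calls run through OpenCL.
    u = cv2.UMat(img)
    u = cv2.resize(u, None, fx=2, fy=2, interpolation=cv2.INTER_LINEAR)
    u = cv2.GaussianBlur(u, (5, 5), 0)
    return u.get()  # download back to a NumPy array

def process_image(img):
    # Stand-in for the per-image work done in the "Processing image ... of ..." loop;
    # in the real code this would be the training pass for one file from the train folder.
    return float(preprocess(img).mean())

if __name__ == "__main__":
    print("OpenCL available:", cv2.ocl.haveOpenCL())
    # Synthetic grayscale images in place of the files in the training folder.
    images = [np.random.randint(0, 256, (480, 640), np.uint8) for _ in range(8)]
    with mp.Pool() as pool:                      # one worker per CPU core
        results = pool.map(process_image, images)
    print(results)
```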
Did anyone manage to implement this in the end? I noticed that OpenCV is a requirement and I'm not sure whether you were discussing an older version above.
I wrote an implementation of the inference half in an OpenGL compute shader. Performance is okay but could probably be improved; something like 8 ms to upscale from 1080p to 4K on a GTX 1070. I stored the filter bank in a texture, but it's plausible that you could get better performance by storing it in a UBO and tiling it in shared memory.
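Not your shader, obviously, but for anyone who wants to experiment from Python, here's a minimal dispatch sketch assuming the moderngl package and an OpenGL 4.3 capable driver; the per-pixel work is a placeholder copy rather than the real patch hash + filter dot product:

```python
import numpy as np
import moderngl

W, H = 1920, 1080

SHADER = f"""
#version 430
layout(local_size_x = 16, local_size_y = 16) in;
layout(std430, binding = 0) readonly  buffer Src {{ float src[]; }};
layout(std430, binding = 1) writeonly buffer Dst {{ float dst[]; }};
void main() {{
    ivec2 p = ivec2(gl_GlobalInvocationID.xy);
    if (p.x >= {W} || p.y >= {H}) return;
    // Placeholder: the real kernel would gather a patch around p, hash it
    // (angle/strength/coherence), and dot it with the chosen filter from
    // the filter bank (stored in a texture, a UBO, or an SSBO).
    dst[p.y * {W} + p.x] = src[p.y * {W} + p.x];
}}
"""

ctx = moderngl.create_standalone_context(require=430)
cs = ctx.compute_shader(SHADER)

src = np.random.rand(H, W).astype("f4")
src_buf = ctx.buffer(src.tobytes())
dst_buf = ctx.buffer(reserve=src.nbytes)
src_buf.bind_to_storage_buffer(0)
dst_buf.bind_to_storage_buffer(1)

# One 16x16 workgroup per 16x16 tile of pixels.
cs.run(group_x=(W + 15) // 16, group_y=(H + 15) // 16)
out = np.frombuffer(dst_buf.read(), dtype="f4").reshape(H, W)
```

From there, trying the filter bank as a texture, a UBO, or an SSBO with shared-memory tiling is mostly a change inside the shader source.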