pymatting
How can I run alpha matting on a GPU? I couldn't find anything about it. There is GPU support, but how do I enable it?
GPU support exists only for foreground estimation, not for alpha matting itself. For foreground estimation, simply install PyOpenCL or CuPy and import the relevant foreground estimation method, as in https://github.com/pymatting/pymatting/blob/b5f03a2c373cfe3ca6ba0f69a56e113ca3c5807d/tests/test_foreground.py#L13 and things should just work.
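In case it helps, here is a minimal sketch of how one might pick a foreground estimation backend depending on what is installed. The module and function names are taken from the repository layout around the linked test file and are an assumption; check them against your installed PyMatting version. `choose_backend` and `load_foreground_estimator` are hypothetical helpers, not part of PyMatting.

```python
import importlib
import importlib.util

# (dependency, module, function) triples, in order of preference.
# ASSUMPTION: these module paths match your installed PyMatting version.
GPU_BACKENDS = [
    ("cupy",
     "pymatting.foreground.estimate_foreground_ml_cupy",
     "estimate_foreground_ml_cupy"),
    ("pyopencl",
     "pymatting.foreground.estimate_foreground_ml_pyopencl",
     "estimate_foreground_ml_pyopencl"),
]
CPU_FALLBACK = ("pymatting", "estimate_foreground_ml")


def _is_installed(name):
    # find_spec returns None for a missing top-level package.
    return importlib.util.find_spec(name) is not None


def choose_backend(is_installed=_is_installed):
    """Return (module, function) of the first backend whose dependency exists."""
    for dependency, module, function in GPU_BACKENDS:
        if is_installed(dependency):
            return module, function
    return CPU_FALLBACK


def load_foreground_estimator():
    """Import and return the chosen foreground estimation function."""
    module, function = choose_backend()
    return getattr(importlib.import_module(module), function)
```

Usage would then look like `estimate = load_foreground_estimator()` followed by `foreground = estimate(image, alpha)`, assuming the GPU variants accept the same image and alpha arguments as the CPU version.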
It would theoretically be possible to implement alpha matting on the GPU, but there do not seem to be any good device-agnostic libraries yet, so one would have to duplicate most of the code for PyOpenCL and CuPy.
What is your preferred alpha matting method? Perhaps we could prioritize it (someday).
estimate_alpha_knn with knn_laplacian. One more thing: Numba and CuPy can be enabled together, right?
I need an implementation of kdtree.py in CuPy.
> One more thing, Numba and CuPy can be enabled together, right?
Yes. PyMatting uses CuPy together with Numba's CPU backend. There is also a CUDA backend for Numba, but the last time I tried to install it, I had to reinstall my system because of bad NVIDIA drivers.
> I need kdtree.py implementation in cupy
I am not sure how well a k-d tree would work on the GPU. GPUs usually follow the SIMT computing model, where a group of 32 threads (a warp on NVIDIA hardware) executes the same instruction on different data. A k-d tree query has a lot of branches, and when threads in the same group take different branches, those branches are executed one after another instead of in parallel. But perhaps it still works okay, since GPUs these days have so many cores.
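To illustrate the branching problem, here is a toy cost model in plain Python. It is a deliberate simplification of SIMT execution (one if/else, fixed step costs, no memory effects), not a simulation of any real GPU, and `warp_steps` is a made-up name for this sketch.

```python
WARP_SIZE = 32  # threads per group, as on NVIDIA hardware


def warp_steps(branch_taken, cost_if=1, cost_else=1):
    """Steps one warp needs for a single if/else in a simplified SIMT model.

    branch_taken holds one boolean per thread. If the threads disagree,
    the warp runs both sides one after the other with some lanes masked off.
    """
    steps = 0
    if any(branch_taken):
        steps += cost_if    # at least one thread needs the "if" side
    if not all(branch_taken):
        steps += cost_else  # at least one thread needs the "else" side
    return steps


# All threads agree: only one side of the branch is executed.
uniform = [True] * WARP_SIZE
# Threads alternate: both sides are executed sequentially.
divergent = [i % 2 == 0 for i in range(WARP_SIZE)]

print(warp_steps(uniform), warp_steps(divergent))
```

In a k-d tree query, every thread descends and backtracks through the tree along a different path, so this kind of divergence can occur at many of the branches.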
Anyway, we would probably implement everything with PyOpenCL before we implement anything with CuPy, because it would easily run on most GPUs, while CuPy only works on NVIDIA GPUs.
I translated the k-d tree query to PyOpenCL:
https://gist.github.com/99991/08bcb341bd5a47170908d8c762d559c9
Surprisingly, it is faster than Numba when using a CPU device, but the speedup on GPU is not amazing, even if the GPU is very fast. My guess would be that this is due to branching, but I have not profiled it yet. I think it might be fruitful to evaluate other methods to find nearest neighbors to see whether they are more suitable for GPUs.
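As one example of a nearest neighbor method with almost no data-dependent branching, here is a brute-force search written with NumPy array operations. This is a sketch of an alternative to evaluate, not what PyMatting does; `knn_brute_force` is a hypothetical name. It builds the full query-by-data distance matrix, so memory grows quadratically and real images would need to be processed in chunks. Since CuPy mirrors much of the NumPy API, something like this could probably be ported with few changes, but I have not tried that here.

```python
import numpy as np


def knn_brute_force(data, queries, k):
    """Return (distances, indices) of the k nearest data points per query.

    data:    (n, d) array of points to search
    queries: (m, d) array of query points
    """
    # Squared distances via ||q||^2 - 2 q.d + ||d||^2, shape (m, n).
    d2 = (
        (queries ** 2).sum(axis=1)[:, None]
        - 2.0 * queries @ data.T
        + (data ** 2).sum(axis=1)[None, :]
    )
    # Select the k smallest entries per row (unordered).
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    rows = np.arange(len(queries))[:, None]
    # Sort those k candidates by distance.
    order = np.argsort(d2[rows, idx], axis=1)
    idx = idx[rows, order]
    # Clamp tiny negative values caused by rounding before the square root.
    dist = np.sqrt(np.maximum(d2[rows, idx], 0.0))
    return dist, idx


data = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
queries = np.array([[0.1, 0.0]])
dist, idx = knn_brute_force(data, queries, k=2)
print(idx, dist)
```

The trade-off is that this does far more arithmetic than a k-d tree, so whether it wins would depend on the number of points, the dimension, and the device.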