pymatting

How to run Alpha Matting on GPU? I couldn't find anything on it. There is GPU support but how to enable that?

Open aakash-chakraborty1995 opened this issue 2 years ago • 5 comments

aakash-chakraborty1995, Nov 30 '22 12:11

GPU support exists only for foreground estimation, not for alpha matting itself. For foreground estimation, simply install PyOpenCL or CuPy and import the relevant foreground estimation method (see https://github.com/pymatting/pymatting/blob/b5f03a2c373cfe3ca6ba0f69a56e113ca3c5807d/tests/test_foreground.py#L13) and things should just work.
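As a rough sketch of the "install and import" step: the helper below tries a GPU foreground estimator first and falls back to the CPU one. The helper `pick_foreground_estimator` is hypothetical, and the module and function names in `CANDIDATES` are assumptions based on the repository layout around the linked test file, so they may differ between PyMatting versions.

```python
import importlib

# Candidate backends in order of preference.
# ASSUMPTION: these module and function names follow the PyMatting repository
# layout and may differ between versions. Check the linked test file.
CANDIDATES = [
    ("pymatting.foreground.estimate_foreground_ml_cupy", "estimate_foreground_ml_cupy"),
    ("pymatting.foreground.estimate_foreground_ml_pyopencl", "estimate_foreground_ml_pyopencl"),
    ("pymatting", "estimate_foreground_ml"),  # CPU fallback
]


def pick_foreground_estimator(candidates=CANDIDATES):
    """Return (name, function) for the first candidate that can be imported.

    Hypothetical helper, not part of PyMatting. Returns (None, None) if no
    candidate is available.
    """
    for module_name, function_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return function_name, getattr(module, function_name)
        except Exception:
            # A missing package, missing driver, or missing device all end up
            # here, so the next backend gets a chance.
            continue
    return None, None


# Usage (image and alpha are float arrays with values in [0, 1]):
# name, estimate_foreground = pick_foreground_estimator()
# foreground = estimate_foreground(image, alpha)
```

Catching a broad `Exception` is deliberate here, since a GPU backend can fail for reasons other than a missing package, for example when no suitable device is found.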

It would theoretically be possible to implement alpha matting for GPU, but there do not seem to be any good device-agnostic libraries yet, so one would have to duplicate most of the code for PyOpenCL and CuPy.

What is your preferred alpha matting method? Perhaps we could prioritize it (someday).

99991, Nov 30 '22 12:11

estimate_alpha_knn with knn_laplacian. One more thing: Numba and CuPy can be enabled together, right?

aakash-chakraborty1995, Dec 01 '22 12:12

I need a CuPy implementation of kdtree.py.

aakash-chakraborty1995, Dec 01 '22 14:12

> One more thing, Numba and Cupy can be enabled together right?

PyMatting uses CuPy together with Numba's CPU backend. There is also a CUDA backend for Numba, but the last time I tried to install it, I had to reinstall my system due to bad NVIDIA drivers.

> I need kdtree.py implementation in cupy

I am not sure how well a k-d tree would work on the GPU. GPUs usually follow the SIMT computing model, where a group of threads (32 per warp on NVIDIA hardware) executes the same instruction on different data. A k-d tree query branches a lot, and when threads in the same group take different branches, those branches are executed one after another. But perhaps it still works okay since GPUs these days have so many cores.
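To make the branching concrete, here is a minimal k-d tree nearest-neighbor query in plain Python with NumPy. It is an illustrative sketch, not PyMatting's kdtree.py; the names `build_kdtree`, `nearest`, and `leaf_size` are made up for this example.

```python
import numpy as np


def build_kdtree(points, indices=None, depth=0, leaf_size=8):
    """Recursively split the points at the median along alternating axes."""
    if indices is None:
        indices = np.arange(len(points))
    if len(indices) <= leaf_size:
        return {"leaf": True, "indices": indices}
    axis = depth % points.shape[1]
    order = indices[np.argsort(points[indices, axis])]
    mid = len(order) // 2
    return {
        "leaf": False,
        "axis": axis,
        "split": points[order[mid], axis],
        "left": build_kdtree(points, order[:mid], depth + 1, leaf_size),
        "right": build_kdtree(points, order[mid:], depth + 1, leaf_size),
    }


def nearest(node, points, query, best=(np.inf, -1)):
    """Return (squared distance, index) of the point nearest to query."""
    if node["leaf"]:
        idx = node["indices"]
        d = np.sum((points[idx] - query) ** 2, axis=1)
        j = np.argmin(d)
        return (d[j], idx[j]) if d[j] < best[0] else best
    diff = query[node["axis"]] - node["split"]
    # Branch 1: which child to visit first depends on the query.
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, points, query, best)
    # Branch 2: the far side is visited only if the splitting plane is closer
    # than the best match so far. Different queries decide differently here,
    # which is what makes neighboring GPU threads diverge.
    if diff * diff < best[0]:
        best = nearest(far, points, query, best)
    return best
```

Both branches depend on the individual query, so two queries handled by neighboring threads can follow completely different paths through the tree.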

Anyway, we would probably implement everything with PyOpenCL before we implement anything with CuPy, because it would easily run on most GPUs, while CuPy only works on NVIDIA GPUs.

99991, Dec 01 '22 17:12

I translated the k-d tree query to PyOpenCL.

https://gist.github.com/99991/08bcb341bd5a47170908d8c762d559c9

Surprisingly, it is faster than Numba when using a CPU device, but the speedup on a GPU is modest, even when the GPU is very fast. My guess would be that this is due to branching, but I have not profiled it yet. I think it might be fruitful to evaluate other methods to find nearest neighbors to see whether they are more suitable for GPUs.
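One candidate for such an evaluation is a brute-force search, which has no data-dependent branches and consists mostly of a matrix product. The sketch below uses NumPy as a stand-in; the function `knn_brute_force` is made up for this example, is not part of PyMatting, and has not been benchmarked. Whether the same array expressions run well on a GPU array library is an assumption.

```python
import numpy as np


def knn_brute_force(data, queries, k):
    """Return (indices, squared distances) of the k nearest data points
    for each query, sorted by increasing distance. Requires k <= len(data).
    """
    # Squared distances via ||q||^2 - 2 q.d + ||d||^2, shape (n_queries, n_data).
    d2 = (
        (queries ** 2).sum(axis=1)[:, None]
        - 2.0 * queries @ data.T
        + (data ** 2).sum(axis=1)[None, :]
    )
    # Select the k smallest entries per row (unordered), then sort only those.
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    rows = np.arange(len(queries))[:, None]
    order = np.argsort(d2[rows, idx], axis=1)
    idx = idx[rows, order]
    # Clamp tiny negative values caused by floating-point cancellation.
    return idx, np.maximum(d2[rows, idx], 0.0)
```

The trade-off is memory and work: the distance matrix has one entry per (query, data point) pair, so for full-resolution images the queries would have to be processed in chunks.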

99991, Dec 05 '22 06:12