Does Pillow make use of GPU instances (CUDA)

Open ferozed opened this issue 10 years ago • 6 comments

Does Pillow utilize GPU instructions when running on GPU-enabled machines?

ferozed avatar Nov 17 '15 21:11 ferozed

No, there's no support for CUDA. Pull requests are appreciated.

wiredfool avatar Nov 18 '15 09:11 wiredfool

Some folks from NVIDIA have expressed an interest in adding CUDA support, please watch for future discussions here and elsewhere.

They referenced this library they built with ByteDance which has an implementation of Pillow's resize for CUDA, added by popular demand.

Initial discussions will be about:

  • What does CUDA support in Pillow look like? We know the implementation will be significantly scoped and there are likely no "small pieces" (although the recently added #8329 may help).
  • Should it go in Pillow or in another library, e.g. pillow-cuda? A separate library could facilitate something like pip install pillow[cuda].
  • Assuming this feature is developed by NVIDIA, how will the Pillow team maintain it? We don't want to add any significant maintenance burden for the core team to have to absorb. Maybe something like this helps.
  • We know we'll target Linux initially, but Pillow is cross-platform (Linux, macOS, Windows). CUDA support for Pillow will likely never run on macOS, since there are no NVIDIA GPUs there.

Similar to #1888, I am excited about the possibility of expanding Pillow's core features in ways that are both popular (in demand) and maintainable (in scope).

aclark4life avatar Apr 02 '25 13:04 aclark4life

There is a section in the Arrow spec about device memory, so that may be one way to get the data onto the GPU. PyArrow has CUDA/Numba integration: https://arrow.apache.org/docs/python/integration/cuda.html . On the other hand, if there's an arrow->GPU->arrow program that does image manipulation, we can support that now.
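
On the CPU side, that shuffle already works today via NumPy's array interface; a GPU library would slot in between the two conversions. A minimal sketch (the GPU step is shown only as a comment, since it assumes a library like cupy that isn't part of Pillow):

```python
import numpy as np
from PIL import Image

# Build a small test image and expose it as an array on the CPU.
img = Image.new("RGB", (4, 4), (255, 0, 0))
arr = np.asarray(img)            # shape (4, 4, 3), dtype uint8

# A GPU library would operate here, e.g. arr_gpu = cupy.asarray(arr),
# device-side manipulation, then cupy.asnumpy(arr_gpu) back to the host.
inverted = 255 - arr             # stand-in pixel-level operation, on the CPU

# Back to a Pillow image for encoding/saving.
out = Image.fromarray(inverted)
print(out.getpixel((0, 0)))      # (0, 255, 255)
```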

wiredfool avatar Apr 02 '25 21:04 wiredfool

Thinking about this a bit more:

  1. What's the use case? I understand the "everything faster" desire, but what's the minimal level of support that would be useful? I assume the load->resize->save loop, but what other GPU operations would be required?
  2. What is the comparative advantage of doing this in Pillow? Assuming that we get the image loading/unloading from GPU memory down, what does Pillow bring to the table here? We'd be essentially rewriting all the core image manipulation routines in another C-level language.
  3. If we do have a competitive advantage -- either because we've got a consistent API or whatever -- we'll definitely need to ensure that the results are similar between GPU and CPU. This implies that we'll need to keep the implementations in sync, so we'd need some experience with CUDA, appropriate development environments with GPUs, and CI that can verify it.
  4. As we've seen with AVIF, our wheel sizes are getting bigger. This might be a consideration, depending on what we have to ship as a runtime.

wiredfool avatar Apr 04 '25 09:04 wiredfool

Also, FWIW, if we did anything on macOS, it would probably have to be Metal, and that doesn't look insurmountable. Refs:

  • https://github.com/noppoMan/python-metal-benchmark
  • https://github.com/Al0den/metalgpu

wiredfool avatar Apr 04 '25 09:04 wiredfool

This might be related: https://thenewstack.io/nvidia-finally-adds-native-python-support-to-cuda/

  • https://nvidia.github.io/cuda-python/cuda-core/latest/
  • https://developer.nvidia.com/nvmath-python
  • https://cupy.dev/
  • https://developer.nvidia.com/how-to-cuda-python

Looks like cupy is basically numpy on CUDA, so s/np\./cp\./ and it's accelerated. Which means that for things that are easily done in numpy, e.g., pixel-level changes, the arrow shuffle might work relatively well.
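
To illustrate the s/np\./cp\./ idea: a pixel-level operation written against the numpy API runs unchanged on either backend. A sketch (cupy is not imported here, so the GPU swap is shown only as a comment; the luma weights are the standard BT.601 values):

```python
import numpy as np
from PIL import Image
# import cupy as cp   # drop-in: pass cp instead of np to run on the GPU

def to_grayscale(xp, rgb):
    """Luma conversion written against the numpy API; xp may be np or cp."""
    weights = xp.asarray([0.299, 0.587, 0.114])
    return (rgb @ weights).astype(xp.uint8)

img = Image.new("RGB", (2, 2), (100, 150, 200))
rgb = np.asarray(img, dtype=np.float64)
gray = to_grayscale(np, rgb)
print(gray[0, 0])  # 140  (0.299*100 + 0.587*150 + 0.114*200 = 140.75, truncated)
```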

wiredfool avatar Apr 04 '25 15:04 wiredfool