
[FEA] cuCIM as a scikit-image backend

Open Schefflera-Arboricola opened this issue 10 months ago • 10 comments

Is your feature request related to a problem? Please describe.

Currently, in scikit-image we are developing a dispatching mechanism that would allow function calls to be rerouted to different backend packages (like cuCIM). This means users would be able to seamlessly use cuCIM as a backend for scikit-image, getting significant speed improvements without rewriting much of their code.

So, cuCIM as a backend would look something like:

import os
from skimage.metrics import mean_squared_error
import cupy as cp

os.environ["SKIMAGE_BACKENDS"] = "cucim"

img0 = cp.random.randint(0, 256, (256, 256, 3), dtype=cp.uint8)
img1 = cp.random.randint(0, 256, (256, 256, 3), dtype=cp.uint8)

print(mean_squared_error(img0, img1))  # Uses cuCIM's implementation of mean_squared_error, not scikit-image's

This setup allows users to benefit from GPU acceleration while keeping their familiar scikit-image API. We also plan on providing documentation and testing support for backends.

You can actually run the above code if you have a GPU set up; to do so you would have to:

  • create a local development branch and add the entry-points, the interface, and the info object, as described in the next section
  • install scikit-image dispatching development branch --> pip install git+https://github.com/Schefflera-Arboricola/scikit-image.git@patch-1 (make sure to uninstall any other scikit-image versions)
  • successfully run the above code

I don't have GPUs, and I tried building cuCIM from source in Google Colab (with the runtime type set to GPU), but I couldn't.


Describe the solution you'd like

To make cuCIM a backend package, I think we will only need to add two entry-points in the project's pyproject.toml (here), and the objects those entry-points refer to:

[project.entry-points.skimage_backends]
cucim = "cucim.skimage:backend_interface"

[project.entry-points.skimage_backend_infos]
cucim = "cucim.skimage:info"

  • info: Currently, it tells scikit-image which functions are supported by cuCIM. It is a function that returns a BackendInformation object, which has an attribute named supported_functions whose value is a list of the function names (as strings, in the format "public_module_name:function_name") supported by cuCIM (the backend). BackendInformation is defined in scikit-image. In the future we plan to use BackendInformation to let a backend provide additional information about itself and its supported functions.

  • backend_interface: it's a namespace containing two functions:

    • can_has: Quickly checks if cuCIM can handle a given function call. It takes in the function name and the args and kwargs passed in by the user, does an inexpensive initial check on whether cuCIM can handle these args (like checking the type of the args, etc.), and based on that returns True or False.
    • get_implementation: Returns the actual cuCIM function to execute. If can_has returns True then get_implementation is called; it returns the function callable, which is then called and the backend implementation is run.

Here is a dummy backend for your reference: https://github.com/Schefflera-Arboricola/skimage-j4f
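
To make this more concrete, here is a very rough sketch of what those two objects could look like on the cuCIM side. Everything in it is illustrative: the BackendInformation import path, the single function listed by info, and the module-mapping logic in get_implementation are assumptions based on the description above, not existing cuCIM or scikit-image code.

import importlib
import types

import cupy as cp

# Hypothetical import path -- BackendInformation is defined somewhere in scikit-image.
from skimage.util._backends import BackendInformation


def info():
    # Tell scikit-image which functions cuCIM provides, as
    # "public_module_name:function_name" strings.
    return BackendInformation(["skimage.metrics:mean_squared_error"])


def can_has(name, *args, **kwargs):
    # Inexpensive check: only claim the call if all array-like positional
    # arguments are already CuPy arrays (no host/device conversion here).
    arrays = [a for a in args if hasattr(a, "shape") and hasattr(a, "dtype")]
    return bool(arrays) and all(isinstance(a, cp.ndarray) for a in arrays)


def get_implementation(name):
    # Map e.g. "skimage.metrics:mean_squared_error" to
    # cucim.skimage.metrics.mean_squared_error.
    module_name, func_name = name.split(":")
    module = importlib.import_module(module_name.replace("skimage", "cucim.skimage", 1))
    return getattr(module, func_name)


# The "namespace" that the backend_interface entry-point refers to could simply
# bundle the two functions.
backend_interface = types.SimpleNamespace(can_has=can_has, get_implementation=get_implementation)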


Describe alternatives you've considered

This backend mechanism is still a work in progress, and we would really like to know the interest of the cuCIM community in it, as well as any feedback you might have on how this dispatching machinery could be improved to better accommodate the needs of backends such as cuCIM.

If you are interested, you can also consider joining scikit-image's dispatching meetings:

  • meeting link: https://meet.jit.si/scikit-image-dispatching (Wednesdays 8-9 am UTC)
  • calendar invite link: https://calendar.app.google/HnMpJCUP591xzTAn7
  • meeting notes: https://hackmd.io/@betatim/SJlpIwQgyl/edit

Additional context

  • initial implementation - PR https://github.com/scikit-image/scikit-image/pull/7520

  • more dispatching developments going on at - PR https://github.com/betatim/scikit-image/pull/1

  • scikit-image dispatching summary diagram: https://drive.google.com/file/d/1xHLs6rK1P1XGt83ueL-DUbPO-dF0ZKFQ/view?usp=drive_link

  • inspired by NetworkX's entry-point based dispatching mechanism: https://networkx.org/documentation/latest/reference/backends.html

Looking forward to your feedback!

Thank you :)

Schefflera-Arboricola, Feb 11 '25

Hi @Schefflera-Arboricola, it's great to see this making progress on the scikit-image side! I remember we started looking at it at EuroSciPy, and I had seen that there had been some updates from you and Tim in the scikit-image repo, but I was a bit out of date on the latest progress.

I am interested in helping implement a backend for cuCIM. From March I may be able to have a bit more dedicated time for it, but we can try to make some initial progress before then. Let me review the information you have provided and post any follow-up questions here.

The current meeting time is not feasible for me (3 AM on the US east coast), but I am fine collaborating asynchronously here initially, and we can schedule a separate meeting later on if needed to discuss in person.

grlee77, Feb 11 '25

I posted one question about whether the plan on the scikit-image side is to initially only mark a couple functions as dispatchable? https://github.com/scikit-image/scikit-image/pull/7520/files#r1956590515

Also, FYI @Schefflera-Arboricola, for this project we have a directory layout that isn't very common. The Python pyproject.toml relevant to defining the entry-points is here: https://github.com/rapidsai/cucim/blob/branch-25.04/python/cucim/pyproject.toml

And the following subfolder has an equivalent layout to scikit-image itself: https://github.com/rapidsai/cucim/tree/branch-25.04/python/cucim/src/cucim/skimage

Because we track the upstream scikit-image API, it should be straightforward to use cuCIM as a backend. We only need to handle copying data to/from the host if a NumPy array was provided (possibly with some size threshold where we say we don't want it if it is less than 500 kB in size, for example). We can also return False for can_has on any image inputs that are not already an array (e.g. we don't want to automatically promote a list to a CuPy array).
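
As a rough illustration of that policy (hypothetical code; the 500 kB threshold and the assumption that the first positional argument is the image are just examples):

import numpy as np
import cupy as cp

MIN_NUMPY_BYTES = 500 * 1024  # example threshold below which a host->device copy isn't worth it


def can_has(name, image, *args, **kwargs):
    if isinstance(image, cp.ndarray):
        return True  # already on the GPU
    if isinstance(image, np.ndarray):
        # Only worth dispatching if the transfer cost is amortized by the size.
        return image.nbytes >= MIN_NUMPY_BYTES
    # Lists, tuples, etc.: don't promote them to CuPy arrays automatically.
    return False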

grlee77, Feb 14 '25

I posted one question about whether the plan on the scikit-image side is to initially only mark a couple functions as dispatchable? https://github.com/scikit-image/scikit-image/pull/7520/files#r1956590515

Answered here: https://github.com/scikit-image/scikit-image/pull/7520/files#r1956593052

We only need to handle copying data to/from the host if a NumPy array was provided (possibly with some size threshold where we say we don't want it if it is less than 500 kB in size, for example). We can also return False for can_has on any image inputs that are not already an array (e.g. we don't want to automatically promote a list to a CuPy array).

In the dispatching discussions we have been having in scikit-image so far, we have been assuming that the user would be passing the array type that the backend supports, and that we (scikit-image) or the backend(s) will not be (or should not be -- because it's expensive and/or not feasible for some array types) doing any array conversions. And can_has should be used by the backends to do an initial type-check or any other inexpensive check(s) on the passed-in args. But if you think that array conversion (from numpy to cupy and cupy to numpy) would be a useful thing for users, then we can start talking more about it.

And if you want, you can use can_has to check the size and convert the NumPy array into a CuPy array but you should not, because can_has is meant to be an inexpensive check before we load and call the backend implementation. But please give your feedback on if/how this should/can be improved from a perspective of a backend user and a backend developer. Thanks!

Schefflera-Arboricola, Feb 14 '25

And if you want, you can use can_has to check the size and convert the NumPy array into a CuPy array but you should not, because can_has is meant to be an inexpensive check before we load and call the backend implementation. But please give your feedback on if/how this should/can be improved from a perspective of a backend user and a backend developer. Thanks

Right, I would not want to put any conversion in can_has. The question is whether on the cuCIM side we would return False on any numpy input, or if we want to provide a backend that would allow round-trip host/device transfer as needed. Currently, the cuCIM functions as-is only accept CuPy array inputs.

grlee77, Feb 18 '25

Right, I would not want to put any conversion in can_has.

OK, but do you think having array conversions as part of the dispatching mechanism would be a helpful thing to have?

When a numpy array is passed, we can convert it and then cache that converted cupy array. Then, for the next function (that will be dispatched in the same runtime), that image (or numpy array) would not need to be converted again; we can use the cached cupy array. But will that be a good thing to do? Also, we will have to perform the conversion again at the end for the returned array (from cupy to numpy), if we want the input array type and the output array type to be the same (and I think @stefanv in a meeting said that we want that-- i.e. to have input and output arrays of the same array type). Also, we can make this conversion step optional if you think this kind of "conversion and caching" will be beneficial for some types of arrays. LMKWYT.
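
Purely as an illustration of this "convert and cache" idea (the helper name and the caching strategy are made up, not part of scikit-image or cuCIM):

import weakref

import cupy as cp

_gpu_cache = {}


def to_gpu_cached(host_arr):
    # Reuse a previously converted CuPy array for the same host array; the
    # finalizer evicts the cache entry when the host array is garbage collected.
    key = id(host_arr)
    if key not in _gpu_cache:
        _gpu_cache[key] = cp.asarray(host_arr)
        weakref.finalize(host_arr, _gpu_cache.pop, key, None)
    return _gpu_cache[key]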

The question is whether on the cuCIM side we would return False on any numpy input, or if we want to provide a backend that would allow round-trip host/device transfer as needed.

I'm not sure if I understand the second part of your question (i.e. "...or if we want to provide a backend that would allow round-trip host/device transfer as needed") correctly. I think you mean: if a numpy array is passed, then cuCIM would transfer the call back to scikit-image's native implementation, right?

But, if cucim's can_has returns False then the call will be transferred back to scikit-image and we will move on to the next backend in the backend priority list. And if none of the backends accept the dispatched call, then it will fall back to scikit-image's own implementation (with a warning message). (FYI, the backend priority and this falling-back is implemented in PR https://github.com/betatim/scikit-image/pull/1 and not in PR https://github.com/scikit-image/scikit-image/pull/7520)

Currently, the cuCIM functions as-is only accept CuPy array inputs.

I think a check like this in can_has would be good then -- hasattr(arr, "__module__") and arr.__module__.startswith("cupy")

Schefflera-Arboricola, Feb 19 '25

| and I think @stefanv in a meeting said that we want that-- i.e. to have input and output arrays of the same array type

Definitely agree with @stefanv that it is a much cleaner user experience if the returned array type matches the user's provided array type.

For example, it would be easy to quickly accept any array implementing Numba's CUDA array interface via an inexpensive zero-copy conversion (i.e. the existing data pointer is reused without making a copy):

# zero-copy conversion of a GPU array to a CuPy array
if hasattr(arr, "__cuda_array_interface__"):
    arr = cp.asarray(arr)

but I don't know how the conversion of the CuPy array output back to the original type could be handled as that mechanism would be library-specific. It would be low cost since there is no copy, but would not comply with the requirement for having the output type match the input type. Given that, I don't think we should try to support arbitrary array conversions.

I do think it would make sense to potentially support NumPy arrays specifically, though, as that is the native array type used by scikit-image.

But, if cucim's can_has returns False then the call will be transferred back to scikit-image and we will move on to the next backend in the backend priority list. And if none of the backends accept the dispatched call, then it will fall back to scikit-image's own implementation (with a warning message).

Yes, I understood that part. The question is how to handle logic to allow can_has to return True for NumPy inputs. Here are a couple of options:

  • cuCIM could implement some decorator that could be applied to all cucim.skimage functions so they would call cupy.asarray on NumPy inputs before calling the wrapped function, and then cupy.asnumpy on the output array (if the inputs were numpy arrays). This seems like a not very elegant approach, though (a rough sketch is shown after this list).
  • Perhaps the scikit-image backend can be extended to provide a way for a backend to register an optional pair of functions for numpy_to_native_array and native_array_to_numpy. If those functions were provided, then on the scikit-image side, the dispatchable decorator could call numpy_to_native_array on each array input to the wrapped func. Similarly, native_array_to_numpy would be called on any array outputs of func.
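
To illustrate the first option, a rough sketch of such a decorator (the name numpy_roundtrip and the handling of only positional array arguments are placeholders, not existing cuCIM code):

import functools

import cupy as cp
import numpy as np


def numpy_roundtrip(func):
    # Hypothetical wrapper: move NumPy inputs to the GPU, call the cucim.skimage
    # implementation, and move array results back to the host if the caller
    # passed NumPy arrays.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        had_numpy = any(isinstance(a, np.ndarray) for a in args)
        args = tuple(cp.asarray(a) if isinstance(a, np.ndarray) else a for a in args)
        result = func(*args, **kwargs)
        if had_numpy and isinstance(result, cp.ndarray):
            result = cp.asnumpy(result)
        return result
    return wrapper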

On the cuCIM side, the two functions from the second option would just be (minus potentially adding some error checking):

import cupy as cp

def numpy_to_native_array(arr):
    return cp.asarray(arr)

def native_array_to_numpy(arr):
    return cp.asnumpy(arr)

grlee77, Feb 19 '25

but I don't know how the conversion of the CuPy array output back to the original type could be handled as that mechanism would be library-specific. It would be low cost since there is no copy, but would not comply with the requirement for having the output type match the input type. Given that, I don't think we should try to support arbitrary array conversions.

Y'all have thought much more about this than I have, so feel free to disregard my 0.01c.

It sounds like forcing conversion onto the backend may make it perform some non-optimal decisions. E.g., it may be better for a certain backend to provide sparse results, instead of dense, but now it'd be forced to produce NumPy arrays.

If you do not return the same type as the input, or a type that can be converted to a NumPy array via asarray, it prohibits building pipelines, since the subsequent function invocation will not be able to handle the output of the previous step. This makes switching the backend, and comparing implementations, impossible.

I think the concepts we want to enforce, therefore, may be:

  • The result of any operation must be the same across backends (within pragmatic bounds), regardless of which containers are used.
  • We must be able to build pipelines, like we currently can, by chaining functions, even when switching backends.

stefanv, Feb 19 '25

  • Perhaps the scikit-image backend can be extended to provide a way for a backend to register an optional pair of functions for numpy_to_native_array and native_array_to_numpy. If those functions were provided, then on the scikit-image side, the dispatchable decorator could call numpy_to_native_array on each array input to the wrapped func. Similarly, native_array_to_numpy would be called on any array outputs of func.

Does it make sense to give the backend selector parameters? Here, it sounds like you may have a highly efficient backend selector: "only accept for dispatch if dealing with cupy arrays", or a more aggressive, but less efficient backend selector: "accept anything that can be converted to a cupy array".
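
For example (a hypothetical sketch of such a parameterized selector, not an actual scikit-image or cuCIM API):

import numpy as np
import cupy as cp


def make_can_has(accept_numpy=False):
    # accept_numpy=False: only accept for dispatch if dealing with CuPy arrays.
    # accept_numpy=True: also accept NumPy arrays (one example of inputs that
    # could be converted to CuPy arrays).
    def can_has(name, *args, **kwargs):
        accepted = (cp.ndarray, np.ndarray) if accept_numpy else (cp.ndarray,)
        arrays = [a for a in args if hasattr(a, "shape") and hasattr(a, "dtype")]
        return bool(arrays) and all(isinstance(a, accepted) for a in arrays)
    return can_has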

stefanv, Feb 19 '25

Perhaps the scikit-image backend can be extended to provide a way for a backend to register an optional pair of functions for numpy_to_native_array and native_array_to_numpy.

As we are still exploring the problem space, for now I'd like to gently push back on this idea, since it seems like something that the backend can solve with existing hooks. I'd like to keep our prototype minimal and lean as long as we don't really have a lot of experience with this yet. :)

Wouldn't it be possible to add this wrapping functionality in cucim's implementation of get_implementation? I'm also not clear on how the front-end would know when and how to apply these. It seems to me that would be a very backend specific decision.

That said, if these functions prove to be something needed by every backend, I'm not opposed at all to adding more hooks.


Would it be possible to build an object that's internally a cuCIM array until it is used in any way that could be an operation on a NumPy array? Then it would implicitly convert to a NumPy array. cuCIM could return this LazyNdarray from dispatched calls, and non-backend functions could use it as a NumPy array. However, if a successive call to the backend is made, cuCIM could detect this LazyNdarray, fetch the internal cuCIM array, and keep working with it.

In my mind, that could tackle the issues with unnecessary back-and-forth conversion potentially eating the speed-up of enabling a backend.
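
A bare-bones sketch of what such a LazyNdarray could look like (purely illustrative; nothing like this exists in cuCIM or scikit-image today):

import cupy as cp


class LazyNdarray:
    # Hypothetical proxy: holds a CuPy array and only copies it to the host
    # when NumPy-style access is requested.

    def __init__(self, gpu_array):
        self._gpu = gpu_array
        self._cpu = None

    @property
    def gpu(self):
        # A backend that recognizes LazyNdarray can keep working on the device copy.
        return self._gpu

    def __array__(self, dtype=None):
        # np.asarray(lazy) (and most NumPy functions) end up here, triggering a
        # device-to-host copy the first time the data is needed on the CPU.
        if self._cpu is None:
            self._cpu = cp.asnumpy(self._gpu)
        return self._cpu if dtype is None else self._cpu.astype(dtype)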

lagru, Mar 03 '25

My feeling is that yes, lazy/proxy return can work. I.e. it works fine in practice, except for some niche users that really check for isinstance(arr, np.ndarray) and use that directly in C (which must be guarded with an isinstance check).

The question is a bit how purist you want to be. The purist in me still likes fully safe type rules: numpy input -> numpy output, cupy input -> cupy output, and that doesn't quite square. Maybe I should not be worried about this, because in practice the issues are often tiny...

More importantly, though, that problem goes away IMO as long as the user does something explicit: an enable_cucim_interop(), or a with backend("cucim") or "cucim-interop", or ...

Also, whatever we do, at least cupy in -> cupy out is no problem at all (even if not interesting). And numpy in -> numpy out is only a problem because it may do unnecessary host-device copies (which could be a serious performance issue of course!).

seberg, Mar 05 '25