cuml Avoid extra memory copy when using cp.concatenate in cuml.dask kmeans

Partial solution for #5936

Issue was that concatenating when having a single array per worker was causing a memory copy (not sure if always, but often enough). This PR avoids the concatenation when a worker has a single partition of data.

This is coming from a behavior from CuPy, where some testing reveals that sometimes it creates an extra allocation when concatenating lists that are comprised of a single array:

>>> import cupy as cp
>>> a = cp.random.rand(2000000, 250).astype(cp.float32) # Memory occupied: 5936MB
>>> b = [a]
>>> c = cp.concatenate(b) # Memory occupied: 5936 MB <- no memory copy

>>> import cupy as cp
>>> a = cp.random.rand(1000000, 250) # Memory occupied: 2120 MB
>>> b = [a]
>>> c = cp.concatenate(b) # Memory occupied: 4028 MB <- memory copy was performed!

I'm not sure what are the exact rules that CuPy follows here, we could check, but in general avoiding the concatenate when we have a single partition is an easy fix that will not depend on the behavior outside of cuML's code.

cc @tfeher @cjnolet

Jun 15 '24 17:06 dantegd

/merge

Jun 24 '24 23:06 dantegd

/merge

Jul 08 '24 04:07 dantegd