Avoid extra memory copy when using cp.concatenate in cuml.dask kmeans
Partial solution for #5936
The issue was that calling cp.concatenate on a worker holding a single array caused a memory copy (not sure if always, but often enough). This PR skips the concatenation when a worker has a single partition of data.
This comes from CuPy's behavior: some testing shows that it sometimes creates an extra allocation when concatenating a list that consists of a single array:
>>> import cupy as cp
>>> a = cp.random.rand(2000000, 250).astype(cp.float32) # Memory occupied: 5936 MB
>>> b = [a]
>>> c = cp.concatenate(b) # Memory occupied: 5936 MB <- no memory copy
>>> import cupy as cp
>>> a = cp.random.rand(1000000, 250) # Memory occupied: 2120 MB
>>> b = [a]
>>> c = cp.concatenate(b) # Memory occupied: 4028 MB <- memory copy was performed!
I'm not sure what exact rules CuPy follows here; we could check. In general, though, skipping the concatenate when there is a single partition is an easy fix that does not depend on behavior outside of cuML's code.
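For reviewers, here is a minimal sketch of the single-partition shortcut described above. It is not the actual cuML change: `concat_partitions` and its `xp` parameter are hypothetical names, and NumPy stands in for CuPy so the snippet runs without a GPU (passing `cupy` as `xp` is the assumed GPU usage).

```python
import numpy as np  # stand-in for CuPy so the sketch runs without a GPU


def concat_partitions(parts, xp=np):
    """Hypothetical helper, not cuML's actual code.

    parts: non-empty list of arrays held by one worker.
    xp: array module providing concatenate (assumed: pass cupy for GPU arrays).
    """
    if len(parts) == 1:
        # Single partition: return it as-is, so no new buffer is allocated.
        return parts[0]
    # Several partitions: copying into one contiguous array is unavoidable.
    return xp.concatenate(parts)


single = [np.ones((4, 3), dtype=np.float32)]
multi = [np.ones((4, 3), dtype=np.float32), np.zeros((2, 3), dtype=np.float32)]

# With one partition the caller gets back the very same array object.
same = concat_partitions(single) is single[0]
# With several partitions the rows are stacked into a new array.
merged_shape = concat_partitions(multi).shape
```

The shortcut returns the input array itself rather than a copy, so callers that later modify the result in place would also modify the original partition; that trade-off seems acceptable here but is worth keeping in mind.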
cc @tfeher @cjnolet
/merge