cucim
Slowness seen using PiecewiseAffineTransform compared to scikit-image version
Describe the bug: cucim.skimage.transform.PiecewiseAffineTransform appears to be several times slower than its scikit-image equivalent.
Steps/Code to reproduce bug: When running the code below, I observe an 8x slowdown for the estimate operation and a 2x slowdown for the warp operation, using the PyTorch 24.01 container with cucim 23.12.
Expected behavior: The GPU code should execute at least as fast as the CPU version.
Environment details: Docker on Ubuntu 22.04, using the PyTorch 24.01 container with scikit-image and cucim 23.12 installed via pip.
Additional context
```python
import matplotlib.pyplot as plt
from skimage.transform import PiecewiseAffineTransform, warp
from scipy.interpolate import LinearNDInterpolator
import numpy as np
from timeit import default_timer as timer
from cucim.skimage.transform import PiecewiseAffineTransform as cu_PAT
from cucim.skimage.transform import warp as cu_warp
import cupy as cp

# create some offsets and coordinates
vectors = np.array([[3.0, 1.0], [-5.0, -1.3], [-3.5, 8.3], [0, 0], [0, 0], [0, 0], [0, 0]])
coords = np.array([[20, 20], [180, 50], [20, 180], [0, 0], [0, 255], [255, 0], [255, 255]])

# create a 20 x 20 grid of control points (step_size is the number of points per axis)
step_size = 20
x = np.linspace(0, 255, num=step_size)
y = np.linspace(0, 255, num=step_size)
X, Y = np.meshgrid(x, y)

# interpolate the offsets onto the grid
interpx = LinearNDInterpolator(list(coords), vectors[:, 0])
Zxi = interpx(Y, X)
interpy = LinearNDInterpolator(list(coords), vectors[:, 1])
Zyi = interpy(Y, X)

# create an array of coords
src = np.column_stack((X.reshape(-1), Y.reshape(-1)))

# add the interpolated offsets
dst_rows = X + Zxi
dst_cols = Y + Zyi
dst = np.column_stack([dst_cols.reshape(-1), dst_rows.reshape(-1)])

# NOTE: imgrid is not defined in the original report. The grid image below is a
# placeholder assumption so that the snippet runs end to end.
imgrid = np.zeros((256, 256), dtype=np.float32)
imgrid[::16, :] = 1.0
imgrid[:, ::16] = 1.0

# compute transforms
tform = PiecewiseAffineTransform()
start = timer()
tform.estimate(src, dst)
print("cpu estimate took {}s".format(timer() - start))
start = timer()
out = warp(imgrid, tform, output_shape=(255, 255))
print("cpu warp took {}s".format(timer() - start))

# repeat using cupy/cucim.skimage
cu_tform = cu_PAT()
start = timer()
cu_tform.estimate(cp.array(src), cp.array(dst))
print("gpu estimate took {}s".format(timer() - start))
start = timer()
out = cu_warp(cp.array(imgrid), cu_tform, output_shape=(255, 255))
print("gpu warp took {}s".format(timer() - start))
```
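One caveat about the timings above: the GPU measurements include the host-to-device copies made by `cp.array(...)` and any first-call overhead (CuPy compiles kernels on first use), and CuPy launches kernels asynchronously, so a single unsynchronized run may not reflect steady-state cost. This may account for part of the gap, though not necessarily all of it. The sketch below is one way to time both paths more evenly. The helper name `time_call` and its parameters are hypothetical and not part of cucim or CuPy.

```python
from timeit import default_timer as timer


def time_call(fn, *args, sync=None, warmup=1, repeats=5, **kwargs):
    """Return the best wall-clock time in seconds of fn(*args, **kwargs).

    sync is an optional zero-argument callable that blocks until pending
    device work has finished (hypothetical parameter; pass None for CPU code).
    """
    # Warm-up runs absorb one-time costs such as kernel compilation.
    for _ in range(warmup):
        fn(*args, **kwargs)
    if sync is not None:
        sync()

    best = float("inf")
    for _ in range(repeats):
        start = timer()
        fn(*args, **kwargs)
        if sync is not None:
            sync()  # wait for asynchronous GPU work before stopping the clock
        best = min(best, timer() - start)
    return best


# Hypothetical usage with the variables from the script above:
#   gpu_sync = cp.cuda.Stream.null.synchronize
#   src_gpu, dst_gpu = cp.asarray(src), cp.asarray(dst)  # copy once, outside the timed region
#   print("gpu estimate:", time_call(cu_tform.estimate, src_gpu, dst_gpu, sync=gpu_sync))
#   print("cpu estimate:", time_call(tform.estimate, src, dst))
```

Taking the minimum over several runs is a design choice that reduces noise from other processes. CuPy also ships `cupyx.profiler.benchmark`, which reports CPU and GPU times separately and may be a better fit for a detailed comparison.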