torch_interpolations

CUDA speedup on smaller tensors?

Open fiftysevendegreesofrad opened this issue 3 years ago • 0 comments

Hi, this module is great :-)

I'm wondering, however, whether there are any options on the table for reducing CUDA's fixed overheads, and hence getting a speedup on smaller tensors. For example, modifying perf.py to interpolate fewer points with X, Y = np.meshgrid(np.arange(-.5, 2.5, .1), np.arange(-.5, 2.5, .01))
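To make the size of that query concrete, here is a minimal sketch that builds the same meshgrid and counts the points. The names xs, ys and points are only illustrative and are not taken from perf.py. The (N, 2) layout is an assumption about how query points might be packed, not the library's required input format.

```python
import numpy as np

# Query coordinates from the modified meshgrid call in this post.
xs = np.arange(-.5, 2.5, .1)   # coarse axis
ys = np.arange(-.5, 2.5, .01)  # fine axis
X, Y = np.meshgrid(xs, ys)

# Flatten into an (N, 2) array of query points (illustrative layout only).
points = np.stack([X.ravel(), Y.ravel()], axis=1)
print(xs.size, ys.size, points.shape)
```

The product of the two axis lengths is the number of interpolated points, which should match the count reported in the benchmark output.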

I'm getting

Interpolating 9000 points on 300 by 300 grid
PyTorch took 1.319 +\- 0.235 ms
PyTorch Cuda took 1.322 +\- 0.869 ms
Scipy took 0.803 +\- 0.052 ms
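For context on how per-call figures like these can be measured, below is a rough timing sketch using only the standard library. The helper time_ms and its parameters are hypothetical and are not what perf.py necessarily does. The optional sync hook matters because CUDA kernel launches are asynchronous: passing torch.cuda.synchronize makes the clock wait for queued GPU work, so that launch overhead is included in each sample.

```python
import statistics
import time

def time_ms(fn, repeats=100, warmup=10, sync=None):
    """Return (mean, stdev) of per-call wall time in milliseconds.

    `sync` is an optional callable run before each clock read; for CUDA
    work, pass torch.cuda.synchronize so queued kernels have finished.
    """
    for _ in range(warmup):          # warm-up calls are not timed
        fn()
    samples = []
    for _ in range(repeats):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.mean(samples), statistics.stdev(samples)

# Stand-in workload; replace the lambda with the interpolation call.
mean, std = time_ms(lambda: sum(range(1000)))
print(f"took {mean:.3f} +/- {std:.3f} ms")
```

With a harness like this, a small workload that takes about the same time on CPU and on CUDA suggests the measurement is dominated by fixed per-call cost rather than by the arithmetic itself.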

Do you think there is some way to combine CUDA kernels to get the 20x speed boost on a tensor this size?

fiftysevendegreesofrad, Jun 18 '21 16:06