torch_interpolations
CUDA speedup on smaller tensors?
Hi, this module is great :-)
I'm wondering, however, whether there are any options on the table for reducing the fixed CUDA overhead and so getting a speedup on smaller tensors. For example, after modifying perf.py to interpolate fewer points with
X, Y = np.meshgrid(np.arange(-.5, 2.5, .1), np.arange(-.5, 2.5, .01))
I'm getting
Interpolating 9000 points on 300 by 300 grid
PyTorch took 1.319 +\- 0.235 ms
PyTorch Cuda took 1.322 +\- 0.869 ms
Scipy took 0.803 +\- 0.052 ms
Do you think there is some way to combine CUDA kernels to get the 20x speed boost on a tensor this size?
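To illustrate what I mean by fixed overhead, here is a minimal sketch that times a short chain of small elementwise ops on 9000 points on the CPU and, if available, on CUDA. It is not code from this module or from perf.py: `mean_call_ms` and `overhead_report` are hypothetical helper names, and the op chain is only a stand-in for the indexing arithmetic an interpolator might do. The `torch.cuda.synchronize()` calls are there because CUDA kernels launch asynchronously, so timings taken without them can be misleading.

```python
import time

import torch


def mean_call_ms(fn, device, repeats=200, warmup=20):
    """Return the mean wall-clock time per call of fn() in milliseconds."""
    use_cuda = torch.device(device).type == "cuda"
    for _ in range(warmup):
        fn()  # warm-up calls so one-time initialisation is not timed
    if use_cuda:
        torch.cuda.synchronize()  # wait for queued kernels before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if use_cuda:
        torch.cuda.synchronize()  # wait for all timed kernels to finish
    return (time.perf_counter() - start) * 1e3 / repeats


def overhead_report(n_points=9000):
    """Time a chain of small elementwise ops on each available device."""
    devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
    report = {}
    for device in devices:
        x = torch.rand(n_points, device=device)
        # Stand-in workload (an assumption, not the module's real code):
        # in eager mode each op below is a separate kernel launch on CUDA.
        report[device] = mean_call_ms(
            lambda: (x * 2.0 + 1.0).floor().clamp(0, 299), device
        )
    return report


if __name__ == "__main__":
    for device, ms in overhead_report().items():
        print(f"{device}: {ms:.4f} ms per call")
```

If the CUDA time stays roughly flat as `n_points` shrinks, that would suggest the cost is dominated by per-launch overhead rather than by the arithmetic, which is the part I'm hoping could be reduced by combining kernels.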