cucim
[FEA] More efficient gradient computation
#340 proposed some minor improvements to gradient computation. However, I think there is still potential for a much larger improvement. Specifically, by creating a custom `ElementwiseKernel`, we could write all gradient terms into an `ndim + 1`-dimensional output array during a single pass over the input image. That should be much more efficient than the repeated slicing operations proposed here.
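To make the proposed output layout concrete, here is a minimal CPU reference sketch in plain Python. It is not GPU code and not an existing cuCIM or CuPy implementation. The function name `gradient_single_pass`, the unit spacing, and the first-order one-sided differences at the edges (mirroring the default behaviour of `numpy.gradient`) are all assumptions made for illustration. The body of the loop over `i` is roughly what the operation of a custom elementwise kernel would compute per linear index.

```python
def gradient_single_pass(flat, shape):
    """Hypothetical CPU reference for a single-pass gradient.

    `flat` is the input image as a C-ordered flat sequence and `shape` its
    dimensions. Returns a flat list with logical shape (ndim,) + shape, so
    component `ax` of element `i` is stored at index `ax * size + i`.
    """
    ndim = len(shape)
    if any(n < 2 for n in shape):
        raise ValueError("each axis needs at least 2 elements")

    size = 1
    for n in shape:
        size *= n
    if len(flat) != size:
        raise ValueError("flat length does not match shape")

    # C-order strides, measured in elements rather than bytes.
    strides = [1] * ndim
    for ax in range(ndim - 2, -1, -1):
        strides[ax] = strides[ax + 1] * shape[ax + 1]

    out = [0.0] * (ndim * size)

    # Single pass over the input: every gradient component of element i
    # is written while visiting i (assumed unit spacing along each axis).
    for i in range(size):
        for ax in range(ndim):
            n, s = shape[ax], strides[ax]
            pos = (i // s) % n  # coordinate of element i along axis ax
            if pos == 0:
                g = flat[i + s] - flat[i]            # forward difference
            elif pos == n - 1:
                g = flat[i] - flat[i - s]            # backward difference
            else:
                g = 0.5 * (flat[i + s] - flat[i - s])  # central difference
            out[ax * size + i] = g
    return out


# Usage: a 2 x 3 image stored row by row.
image = [0.0, 1.0, 4.0,
         2.0, 3.0, 6.0]
grad = gradient_single_pass(image, (2, 3))
d_axis0, d_axis1 = grad[:6], grad[6:]
```

In a GPU version, the outer loop would become the kernel's implicit per-element index and the output would be a single preallocated array, so no intermediate sliced views are created.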
This could eventually be ported upstream for use in `cupy.gradient` as well.
We have 6 or 7 functions that call `gradient` directly and a number of others that call it indirectly, so this would be generally useful across the library.