dask-image
dask-image copied to clipboard
Make rotate available
This adds the rotate function to ndimage.
I am struggling with the tests. https://github.com/dask/dask-image/issues/210
cc @m-albert (in case this is of interest 🙂)
@martinschorb Thanks a lot for sharing your dask
implementation of rotate
and this PR!
It's great that you added tests. I'm copying some of your comments from https://github.com/dask/dask-image/issues/210 here to address them in this thread.
I have some weird behaviour that I don't fully understand (this is my first time using pytest). when I do not pass output_chunks on to affine_transform the test run fine, except for the timeouts. I always get Unknown pytest.mark.timeout - is this a typo?. Something seems to be missing in my pytest configuration because the timeout tests run fine with your affine_transform.
I had a look at the code and think that the problem regarding the timeouts is the following: rotate
in it's scipy
implementation, as opposed to affine_transform
, doesn't allow setting an output_shape
. Therefore, in the test test_rotate_large_input_small_output_cpu
it takes very long to create a large output array that is of the same size as the input array but with chunksize 1. In the analogous affine_transform
test, the size of the output array is set to [1,1,1]
and therefore there's no time spent on creating a large output array.
These two ways of going ahead would come to my mind:
- Stick to the
scipy
API and don't support differently shaped output arrays (and exclude the above test) or - Include an
output_shape
argument todask_image.ndinterp.rotate
as this could be a useful functionality.
I'd tend to the second option, as anyways the dask_image
API doesn't exactly mirror the scipy.ndimage
API here. On the other hand, somebody requiring this functionality could simply use affine_transform
with equivalent parameters.
when I pass output_chunks without any modifications, many tests fail: The reason for this: the images from ndimage and dask-image simply are not the same. There is a very small number of voxels that significantly differ. Why is that the case, when the only thing that differs is output_chunks being present?
I looked into the tests and found that the problem comes from affine_transform
. Namely, the choice of transformation parameters in your test seem to require a slightly larger slice of the input array for a given output chunk. Despite testing using different matrices, this hadn't been caught by existing tests.
I fixed this behaviour here https://github.com/dask/dask-image/pull/216 and improved the affine_transform
tests. If you include the small change the PR introduces, these tests should pass for you, too.
Hi Marvin,
Stick to the
scipy
API and don't support differently shaped output arrays (and exclude the above test) orInclude an
output_shape
argument todask_image.ndinterp.rotate
as this could be a useful functionality.
The reason why there is no way of explicitly setting output_shape
is that for rotate
there are two main use cases. These are defined by the reshape
parameter and it will set the output size to what is required to keep all the transformed image in the FOV.
Probably we could deal with both options and also enable output_shape
(that would include checking the offset
calculation). Then I would think of a good way of also testing for the reshape
parameter.
What is the proper way of doing this? I change it in my repo and create a new PR?
Cheers
Hi Martin,
The reason why there is no way of explicitly setting output_shape is that for rotate there are two main use cases. These are defined by the reshape parameter and it will set the output size to what is required to keep all the transformed image in the FOV.
True that reshape
affects the output shape.
Probably we could deal with both options and also enable output_shape (that would include checking the offset calculation). Then I would think of a good way of also testing for the reshape parameter.
I agree that the best would probably be to have both arguments there and functional. Of course, conflicting parameters such as reshape=True
together with a given output_shape
would need to lead to an exception.
What is the proper way of doing this? I change it in my repo and create a new PR?
Any changes and commits to your branch martinschorb:main
are automatically included here in the PR. Is this what you meant?
One thing I'm wondering about is whether the testing strategy is the most convenient like this. Given that a large part of the tests actually tests the functionality of affine_transform
, maybe there'd be a way of reducing this overlap. This would come in handy especially as parallel PRs such as https://github.com/dask/dask-image/pull/215 are currently improving affine_transform
and also significantly changing its tests. Hmm not sure yet what the best solution here would be.
Cheers!
I am still getting slightly wrong results when comparing an image that is transformed with the output_shape
set and the respective sub-area of a transformed image without. It looks like some interpolation is screwing up things...
from scipy import misc
from dask_image.ndinterp import rotate
img = da.from_array(misc.ascent())
img = img[::4,::4]
im2 = rotate(img,angle)
output_shape = (9,9)
cx=img.shape[0]/2
cy=img.shape[1]/2
rx=output_shape[0]/2
ry=output_shape[1]/2
im3 = rotate(img,angle,output_shape = output_shape)
im4 = im2[cx-np.floor(rx):cx+np.ceil(rx),cy-np.floor(ry):cy+np.ceil(ry)]
these have clearly different values and it is not just a translation of pixels...
I'd love to have this checked by a test as well, but first need to understand where this difference comes from.
Hey @martinschorb, I just found back our conversation here :)
It seems that back then I was a bit confused on what was going on with the output_shape
parameter, I'm sorry for that!
Can one of the admins verify this patch?
Admins can comment ok to test
to allow this one PR to run or add to allowlist
to allow all future PRs from the same author to run.
these have clearly different values and it is not just a translation of pixels... I'd love to have this checked by a test as well, but first need to understand where this difference comes from.
I just ran your nice code example and realised that in the case of even values for output_shape
, im3
and im4
are identical as expected. In the case of odd values, I think the rotated image is simply not directly comparable to im2
sliced by integer values? So I suspect your implementation of rotate
actually behaves as expected.
add to allowlist
I've resolved some minor conflicts between this PR branch and the dask-image main branch.
Thanks a lot @GenevieveBuckley!
I'll try to have a look at what's up with the failing tests in the next days.
Hi all, thanks for this, it looks great, and I'm already using it in my workflow :)
I see that not passing the output_chunks
parameter yields just one big chunk for the rotate operation even the input array was chunked (so no parallelization).
Any thoughts about adding:
if output_chunks is None:
output_chunks = input_arr.chunksize
@gcaria I think that's a good idea! In the case of dask_image.ndinterp.affine_transform
the shapes of the input and output images can strongly vary, and there's no reason to assume that input and output would have the same chunksizes. However, in the case of rotate
the input and output stacks have similar sizes and assuming the input and output chunksizes to be the same seems reasonable to me.
@martinschorb How's your availability and willingness to continue working on this PR? I think we're not far from finishing a useful addition to dask-image
and it'd be great to continue with it one way or another.
@gcaria I think that's a good idea! In the case of
dask_image.ndinterp.affine_transform
the shapes of the input and output images can strongly vary, and there's no reason to assume that input and output would have the same chunksizes. However, in the case ofrotate
the input and output stacks have similar sizes and assuming the input and output chunksizes to be the same seems reasonable to me.@martinschorb How's your availability and willingness to continue working on this PR? I think we're not far from finishing a useful addition to
dask-image
and it'd be great to continue with it one way or another.
Hi @m-albert , will look into it. Let's see how we can get this through...
Now it needs 5 more tests to get everything covered. Very simple cases, working on it ...
Great to see all the progress on this :) Just FYI I'm traveling this week and will be happy to give feedback beginning of next week.
these details are still not perfect:
-
output
parameter: I have implemented thedtype
option here only. I don't fully understand how to place the result into an existing dask array. Also, the original implementation only works if the provided output array has the exact shape of the rotation result. -
order
andprefilter
parameters are restricted. This is now explained in the docs.
I have no idea what is the problem with the GPU CI...
https://gpuci.gpuopenanalytics.com/job/dask/job/dask-image/job/prb/job/dask-image-prb/169/CUDA_VER=11.5.2,LINUX_VER=ubuntu20.04,PYTHON_VER=3.10/console#L783
tests/test_dask_image/test_ndfilters/test_cupy_ndfilters.py::test_cupy_order[rank_filter-5-3] Fatal Python error: Aborted
This line/file was never touched in any of the commits.
I wonder if that CI failure is related to the change in versions here: https://github.com/dask/dask-image/pull/345
In the commits there I don't see the gpuCI build check...
Thanks for all your work here @martinschorb.
I have no idea what is the problem with the GPU CI... This line/file was never touched in any of the commits.
I think the GPU CI problems are unrelated to this PR and might relate to the intermittent failures observed by @GenevieveBuckley here. I tested this PR branch with cupy installed and all tests are passing locally for me. I think we can ignore this GPU CI failure in the context of this PR.
Regarding the code:
output parameter: I have implemented the dtype option here only. I don't fully understand how to place the result into an existing dask array. Also, the original implementation only works if the provided output array has the exact shape of the rotation result.
Supporting an output array into which the result is directly written is probably of limited use here and I'm not sure if it's even feasible. I think you're right in that determining an output dtype is related to this. However, since 1) conversion to a different dtype can be easily achieved by casting the output array and 2) to be consistent with dask_image.ndinterp.affine_transform
I suggest to leave out this parameter completely.
Another parameter we can remove is output_shape
. This way we're consistent with scipy.ndimage.rotate
(and don't interfere with the reshape
argument).
@martinschorb, if you'd like I could work on my comments above. Would you agree to collaborate on this PR? If you agree I'd push related changes here directly and combine them with your work. We're almost there to get this through :)
Yes, let's get this done.
Alright, I took out the parameters discussed above and outsourced others to affine_transform
to simplify testing and propagation of potential future changes.
All tests are passing (the failing doc test is unrelated to this PR) and in my view we're ready to go.
Merging now 🎉
Thank you for your wonderful contribution to dask_image
@martinschorb.
Also thank you to @gcaria for your feedback and a great suggestion regarding output chunks.