aicsimageio icon indicating copy to clipboard operation
aicsimageio copied to clipboard

Weird reshaping corner case

Open colobas opened this issue 2 years ago • 18 comments

System and Software

  • aicsimageio Version: 4.6.4
  • Python Version: 3.10.4
  • Operating System: Linux

Description

Given an image with dimensions: <Dimensions [T: 1, C: 1, Z: 1, Y: 1796, X: 2188, S: 4]> , calling

data = img.get_image_data("SYX", T=0, Z=0, S=[0], C=0)

Yields an array with shape: (2188, 1, 1796)

Expected Behavior

What did you expect to happen instead?

The array should have shape (1, 1796, 2188)

Additional info

I narrowed it down to this line. When we get here, the value of dim_specs is [0, 0, 0, slice(None, None, None), slice(None, None, None), [0]], which is correct. However after running that line, the shape of the resulting array is (surprisingly) (1, 1796, 2188) rather than (1796, 2188, 1) which is what aicsimageio expects. After this, aicsimageio calls transpose_to_dims which then operates on an erroneous shape and obviously yields an incorrect result.

colobas avatar Apr 01 '22 02:04 colobas

Hey @colobas! Uhhhh I believe you but I can fully reproduce this...

what is order? and Also... there is a duplicate Z= in those params that when I run it with the extra I get:

In [4]: img.get_image_data("CYX", T=0, Z=0, Z=0, S=[0])
  File "<ipython-input-4-22e88a851a72>", line 1
    img.get_image_data("CYX", T=0, Z=0, Z=0, S=[0])
                                        ^
SyntaxError: keyword argument repeated: Z

Seems like you are already trying to solve it. Should I even start or are you planning on making a PR?

evamaxfield avatar Apr 01 '22 03:04 evamaxfield

Also, we don't release builds for python 3.10 so surprised you are able to run it.

evamaxfield avatar Apr 01 '22 04:04 evamaxfield

Sorry, I mistyped the get_image_data example here. in my case the kwargs to that call are dynamic so I was just writing down an equivalent call to what I'm doing. Here goes a reproducible example:

>>> import numpy as np
>>> from aicsimageio import AICSImage
>>> data = np.random.rand(1,1,1,20,10,4)
>>> data.shape
(1, 1, 1, 20, 10, 4)
>>> img = AICSImage(data, dim_order="TCZYXS")
>>> img.dims
<Dimensions [T: 1, C: 1, Z: 1, Y: 20, X: 10, S: 4]>
>>> data = img.get_image_data("SYX", T=0, Z=0, C=0, S=[0])
>>> data.shape
(10, 1, 20)

colobas avatar Apr 01 '22 15:04 colobas

Also, on a tangent: this happens in the context of reading a .png from S3, which has 4 channels (RGBA). I'd expect the S dimension to be 1-dimensional and the C dimension to be 4-dimensional, but for some reason the opposite happens. Any ideas?

colobas avatar Apr 01 '22 15:04 colobas

S is "samples" here, not Scenes ... but still, the numbers definitely look wrong.

toloudis avatar Apr 01 '22 16:04 toloudis

☝️ in aicsimageio v4 we changed S to mean "Samples"... still not sure if that was the correct decision or not 🤷

evamaxfield avatar Apr 01 '22 16:04 evamaxfield

An even more minimal example, with just numpy:

>>> import numpy as np
>>> data = np.random.rand(1,1,1,20,10,4)
>>> dim_specs = tuple([0,0,0,slice(None, None, None), slice(None, None, None), [0]])
>>> data[dim_specs].shape
(1, 20, 10)

colobas avatar Apr 01 '22 16:04 colobas

This seems to be the problem: https://stackoverflow.com/questions/35306372/implicit-transposing-in-numpy-array-indexing

colobas avatar Apr 01 '22 16:04 colobas

Yeah this has to do with mixing simple and advanced numpy indexing. I think we might need to do the indexing in two steps like. For the case above, doing the following gives the correct result:

>>> import numpy as np
>>> data = np.random.randn(1,1,1,20,10,4)
>>> dim_specs_simple = (0,0,0)
>>> dim_specs_advanced = (slice(None, None, None), slice(None, None, None), [0])
>>> data[dim_specs_simple][dim_specs_advanced].shape
(20, 10, 1)

In this case its easy because the simple-indexed dimensions happen to all come before the advanced-indexed dimensions. I think for the general case we would have to do something like:

>>> import numpy as np
>>> data = np.random.randn(1,1,1,20,10,4)
>>> dim_specs_simple = (0,0,0,slice(None, None, None), slice(None, None, None),slice(None, None, None))
>>> dim_specs_advanced = (slice(None, None, None), slice(None, None, None), [0])
>>> data[dim_specs_simple][dim_specs_advanced].shape
(20, 10, 1)

i.e. "fill the blanks" with slice(None, None, None)

colobas avatar Apr 01 '22 16:04 colobas

After tinkering a bit this is looking trickier than it may sound, and I might not have the bandwidth to help out at the moment...

colobas avatar Apr 01 '22 16:04 colobas

I will try to look at this in class next week. It sounds pretty involved in the numpy indexing stuff. 🤷 🤷 🤷

evamaxfield avatar Apr 01 '22 17:04 evamaxfield

I have a crazy idea for how to solve this but I need a white board so I am writing down some psuedo code here for right now:

requested_dims_ops (if we were to just pass the dims spec straight into the data array)
dim_spec_offset = 0
previous_data_size = len(data.shape)
for i, dim_spec in enumerate(requested_dims_ops):
    if the current data size is the same as previous data size then we can just use `i` to apply the dimspec
    if not, we need to compute some offset using the previous data size, the current size, and i?

basically iteratively apply the dim ops

evamaxfield avatar May 04 '22 22:05 evamaxfield

@colobas did you work around this problem in some way or do you still have broken code?

toloudis avatar Jul 20 '22 20:07 toloudis

Hey! I don't quite remember what I was doing when this problem arose but I believe it was just some example code. Whatever it was, I dropped it when I bumped into this...

colobas avatar Jul 21 '22 17:07 colobas

Hi! I just wanted to let you know that I'm still hitting this same problem with 4.9.1. I think it's a quite important bug because it will affect anyone working with data where channels are imported as S instead of C. This happens easily when metadata are missing. For example in this quite common way of saving data:

>>> import numpy as np
>>> import skimage
>>> from aicsimageio import AICSImage
>>>
>>> image = np.random.rand(1024, 2048, 3)
>>> skimage.io.imsave('test_tif.tif', image)
>>> im_read = AICSImage('test_tif.tif')
>>> im_read.data.shape
(1, 1, 1, 1024, 2048, 3)

the saved image gets reloaded with S=3 instead of C=3. And in such a case I get:

>>> im_import = im_read.get_image_data('SYX', S=[0,1])
>>> im_import.shape
(2048, 2, 1024)

or:

>>> im_import = im_read.get_image_data('YXS', S=[0,1])
>>> im_import.shape
(2, 1024, 2048)

Note that the same things happens if images are saved in the proper way and then reimported:

>>> from aicsimageio.writers import OmeTiffWriter
>>> image = np.random.rand(1024, 2048, 3)
>>> OmeTiffWriter.save(image, "file.ome.tif", dim_order="YXS")
>>> im_read = AICSImage('file.ome.tif')
>>> im_import = im_read.get_image_data('SYX', S=[0,1])
>>> im_import.shape
(2048, 2, 1024)

One can work around this problem by testing the import and reshaping the array appropriately. But not knowing if this behaviour will be the same for all files, it's a scary workaround.

guiwitz avatar Aug 10 '22 07:08 guiwitz

For now, an alternate workaround would be this:

im_import_0 = im_read.get_image_data('YX', S=0)
im_import_1 = im_read.get_image_data('YX', S=1)

and then stack them. Specifying a single index should be working.

toloudis avatar Aug 10 '22 13:08 toloudis

I also wonder if you passed the S=[0,1] as a slice instead, would it just plain work? Internally maybe we can just convert list to slice if it's a list containing contiguous indices

toloudis avatar Aug 10 '22 14:08 toloudis

Yes it works with a slice! I guess converting to slices internally is a good idea, because probably very few users would be aware of this (I have to admit I never used slice before). Thanks for the quick answer!

Of course the problem for non-contiguous cases remains.

guiwitz avatar Aug 10 '22 14:08 guiwitz

@guiwitz @colobas I am unable to reproduce this in python versions 3.8, 3.9, 3.10 running main with fresh dependency installs. I reverted back to the earliest version of numpy that supports python 3.10 (1.21.0) which is listed as the python version in the issue description and couldn't reproduce there either.

Here is what I ran:

import numpy as np
from aicsimageio import AICSImage

data = np.random.rand(1,1,1,20,10,4)
print(data.shape)
# (1, 1, 1, 20, 10, 4)

img = AICSImage(data, dim_order="TCZYXS")
print(img.dims)
# <Dimensions [T: 1, C: 1, Z: 1, Y: 20, X: 10, S: 4]>

data = img.get_image_data("SYX", T=0, Z=0, C=0, S=[0])
print(data.shape)
# Yields (1, 20, 10) for me instead of the buggy expected (10, 1, 20)

Closing this for now since it seems to have been solved incidentally, please re-open if you feel otherwise!

SeanLeRoy avatar Mar 16 '23 19:03 SeanLeRoy