
Initialize ImageContainer object using dask array

Open morriso1 opened this issue 4 years ago • 4 comments

Hi squidpy team,

I am just starting to try out squidpy and it looks really useful! I was wondering whether it is currently possible to construct an ImageContainer object from a dask array?

I have 4i data of example shape (3, 10, 32, 1152, 2048, 2048), where the dimensions are: (channel, round of staining, well, position, y, x). Using dask, I have tiled to generate data of shape (10, 32, 3, 12288, 12288), where the dimensions are then: (round of staining, well, channel, y, x).

How would it be best to construct an ImageContainer object from this data?

Thanks for your help! Any advice would be greatly appreciated.

morriso1 avatar Jun 16 '21 16:06 morriso1

Hi @morriso1 ,

For this, you'd need to install the latest development version with pip install git+https://github.com/theislab/squidpy@dev, where we've added support for dask. Also, our ImageContainer only supports 4D images, so one way to load the data would be:

import squidpy as sq
import dask.array as da

arr = da.random.normal(size=(10, 32, 3, 12288, 12288))  # (round, well, channel, y, x)
dims = ("z", "channel", "y", "x")  # staining rounds are stored as z; y, x, z must be present

img = sq.im.ImageContainer()
for well in range(arr.shape[1]):  # wells are on axis 1
    # arr[:, well, ...] has shape (round, channel, y, x), which matches dims
    img.add_img(arr[:, well, ...], dims=dims, layer=f"well_{well}")

or something similar.

I also recommend taking a look at these 2 tutorials:

  • https://squidpy.readthedocs.io/en/latest/auto_tutorials/tutorial_image_container.html
  • https://squidpy.readthedocs.io/en/latest/auto_tutorials/tutorial_image_container_zstacks.html

michalk8 avatar Jun 17 '21 20:06 michalk8

Thanks for your rapid reply, and sorry for my delayed one. This works well, and thanks for the links to your excellent documentation. I am not very familiar with xarray and have since realized that I could have constructed a dask-backed xarray DataArray first and then passed it to the ImageContainer constructor.
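For anyone reading along, here is a minimal sketch of that DataArray route. The helper names well_as_dataarray and to_container are made up for illustration, the array is a tiny NumPy stand-in (a dask array can be wrapped by xr.DataArray the same way and stays lazy), and the squidpy call is kept in a separate function because I have only assumed, not verified, that the constructor picks up the named dims as-is.

```python
import numpy as np
import xarray as xr


def well_as_dataarray(arr, well):
    """Wrap one well of a (round, well, channel, y, x) array as a DataArray.

    Hypothetical helper: staining rounds are labelled "z" because the
    ImageContainer expects the dimensions z, channel, y and x.
    """
    return xr.DataArray(arr[:, well], dims=("z", "channel", "y", "x"))


def to_container(xarr, layer="well_0"):
    """Pass the DataArray to the ImageContainer (assumes squidpy is installed)."""
    import squidpy as sq  # imported lazily so the helper above works without squidpy

    return sq.im.ImageContainer(xarr, layer=layer)


# Small stand-in for the real (10, 32, 3, 12288, 12288) data.
arr = np.zeros((10, 32, 3, 16, 16))
xarr = well_as_dataarray(arr, well=0)
print(xarr.dims, xarr.shape)
```

Calling to_container(xarr) would then build the container for that well; further wells could be added as extra layers with add_img.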

I was wondering whether it might be possible to support extra dimensions other than those named z? Super useful would be the ability to support multi-indexes. This would enable easy and explicit slicing of long arrays of repeated measurements (e.g. wells). Here's an example with a 7-day timecourse experiment, with 7 measurements per day:

import dask.array as da
import numpy as np
import pandas as pd
import squidpy as sq
import xarray as xr

darr = da.random.normal(size=(4, 49, 2048, 2048))
index = pd.MultiIndex.from_arrays(
    [np.arange(0, 7).repeat(7), np.tile(np.arange(0, 7), 7)], names=("day", "replicate")
)
xarr = xr.DataArray(
    darr,
    coords={
        "channel": range(4),
        "well": index,
        "y": np.arange(0, 409.6, 0.2),
        "x": np.arange(0, 409.6, 0.2),
    },
    dims=["channel", "well", "y", "x"],
)

xarr.sel(well={"day": 0, "replicate": slice(0, 4)})  # easy slicing

foo = sq.im.ImageContainer()
foo.add_img(xarr, dims=("channel", "well", "y", "x"))

Currently this throws the error "Expected to find ['z'] dimension(s) in ('channel', 'well', 'y', 'x')."

Renaming 'well' to 'z' allows the image to be added to the ImageContainer; however, the MultiIndex is lost because the z coordinates are converted to the <U6 dtype.

Any help or suggestions would be greatly appreciated! Thanks!

morriso1 avatar Jun 20 '21 20:06 morriso1

Renaming 'well' to 'z' allows the image to be added to the ImageContainer; however, the MultiIndex is lost because the z coordinates are converted to the <U6 dtype.

As for adding a new dimension (such as time), pinging @hspitzer for an opinion; however, this would require a non-trivial amount of work at the moment. Regarding MultiIndex, I think we should be able to support it by not converting it to strings, and that could, in some way, circumvent the problem of having only 4 dims, as you did above.

Right now, the best way of storing your data would be 1 day per layer, though I am not sure whether this format is useful for your analysis.
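A rough sketch of what "1 day per layer" could look like, assuming the wells are described by a plain array of day labels rather than a MultiIndex. The helper names split_wells_by_day and to_container are hypothetical, and the demo uses a small NumPy array; a dask array should support the same indexing and transpose calls.

```python
import numpy as np


def split_wells_by_day(arr, days):
    """Split a (channel, well, y, x) array into one (z, channel, y, x) array per day.

    `days` holds the day label of every entry along the well axis; the
    replicates measured on one day become the z axis of that day's layer.
    """
    days = np.asarray(days)
    layers = {}
    for day in np.unique(days):
        idx = np.flatnonzero(days == day)  # positions of this day's wells
        sub = arr[:, idx]  # (channel, replicate, y, x)
        layers[f"day_{day}"] = sub.transpose(1, 0, 2, 3)  # (z, channel, y, x)
    return layers


def to_container(layers):
    """Add every day as its own layer (assumes squidpy is installed)."""
    import squidpy as sq  # imported lazily so the splitting works without squidpy

    img = sq.im.ImageContainer()
    for name, data in layers.items():
        img.add_img(data, dims=("z", "channel", "y", "x"), layer=name)
    return img


# Small stand-in for the (4, 49, 2048, 2048) time course: 7 days x 7 replicates.
arr = np.zeros((4, 49, 8, 8))
days = np.arange(7).repeat(7)
layers = split_wells_by_day(arr, days)
print({name: data.shape for name, data in layers.items()})
```

The replicate index survives only as the position along z, so the day/replicate labels would have to be tracked separately, for example in the layer names as done here.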

michalk8 avatar Jun 22 '21 17:06 michalk8

Thanks @michalk8 for the prompt reply!

As for adding a new dimension (such as time), pinging @hspitzer for an opinion; however, this would require a non-trivial amount of work at the moment.

Hi @morriso1, as @michalk8 mentioned, right now (as in with pip install -e . # on dev branch) we only support 4D dask arrays (z, c, y, x). Would you mind elaborating on the use case for extending to 5D arrays? I see this specific case of yours, but besides storing the data in the ImageContainer, what other functionalities/tools in Squidpy would you use downstream if we were to support 5D arrays? Thank you in advance for your feedback; we are trying to understand different use cases, and many are still not entirely clear to us.

giovp avatar Jun 23 '21 08:06 giovp