zarr.js

[Feature] Support for complex "fancy" indexing

Open manzt opened this issue 5 years ago • 5 comments

I wanted to move this from the docs to here.

In Python, zarr supports indexing arrays with boolean masks and integer arrays:

import zarr

z = zarr.open("foo.zarr")
z.shape
# (5, 50, 50)

arr1 = z[[1,2,3],:,:]
arr1.shape
# (3, 50, 50)

arr2 = z[[False, True, False, True, True]]
arr1.shape == arr2.shape
# True

I am not sure what this would entail implementation-wise, but I'm guessing a new Indexer is needed. I would be interested in hearing what ideas you had in mind!

manzt, Jan 16 '20 19:01

For reference, see the zarr documentation on fancy indexing.

Yes, the Python implementation has its own indexers: see this one for boolean indexing, which then calls this. I would say the implementation is just a matter of typing out the same code in TypeScript and adding a bunch of test cases for it (you can just copy the Python zarr ones); it's not very complex.

The coordinate array setting is much more complicated: it relies on numpy broadcasting and a few other numpy functions, so those would have to be ported to TS. See here for the indexer code. One could implement it without supporting broadcasting to simplify it a bit. I'm not sure it's worth the hassle of implementing right now. Surely coordinate indexing is easy to convert to a boolean mask for arrays that are not huge, and how often do you really use it?
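To make the last point concrete, here is a minimal sketch of turning integer indices along one dimension into a boolean mask. It is plain Python for illustration only; `indices_to_mask` is a hypothetical helper, not something from zarr or zarr.js.

```python
def indices_to_mask(indices, length):
    """Convert integer indices along one dimension into a boolean mask.

    Hypothetical helper for illustration; not part of zarr or zarr.js.
    Negative indices wrap around as in Python.
    """
    mask = [False] * length
    for i in indices:
        if not -length <= i < length:
            raise IndexError(f"index {i} is out of bounds for length {length}")
        mask[i % length] = True
    return mask


print(indices_to_mask([1, 3, 4], 5))  # [False, True, False, True, True]
```

One caveat with this conversion: a mask cannot express repeated or reordered indices, so `[3, 1, 1]` and `[1, 3]` produce the same mask and the selection always comes back in ascending order.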

gzuidhof, Jan 16 '20 22:01

Thank you for the detailed response! I will look into the Python implementation and translating it into TS.

I agree. It would be ideal to have both boolean indexing and coordinate indexing, but having the functionality of "fancy" indexing at all is what is desired. Prefer the simpler route for now, and perhaps add the other in the future!

manzt, Jan 16 '20 23:01

Oh, actually the boolean indexing builds upon the coordinate indexing, so it looks like either one will be quite complicated to implement.
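One way to picture that dependency, as a rough sketch rather than the actual zarr code: a boolean mask is first reduced to the positions of its `True` entries (the 1-D analogue of `numpy.nonzero`), and those positions are then used as integer coordinates. The helper names below are hypothetical.

```python
def mask_to_indices(mask):
    """Return the positions of the True entries in a 1-D boolean mask."""
    return [i for i, keep in enumerate(mask) if keep]


def select_by_mask(items, mask):
    """Select along the first dimension using a boolean mask.

    Hypothetical helper: the mask is converted to integer indices first,
    so boolean selection reuses the integer-index path.
    """
    if len(mask) != len(items):
        raise IndexError("mask length must match the dimension length")
    return [items[i] for i in mask_to_indices(mask)]


planes = ["p0", "p1", "p2", "p3", "p4"]
print(select_by_mask(planes, [False, True, False, True, True]))  # ['p1', 'p3', 'p4']
```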

gzuidhof, Jan 16 '20 23:01

I see. Thanks for looking into this.

manzt, Jan 17 '20 19:01

FWIW you might want to take a look at orthogonal selections rather than coordinate selections. The orthogonal indexer supports indexing with either integer or boolean arrays, and is a simpler implementation than the coordinate indexer. We also find we hardly ever need coordinate selection in our use of zarr; orthogonal selections are used much more often.

I believe numpy has plans to split these different types of fancy indexing out in their API at some point in the future via .oindex[] and .vindex[] notation, like zarr currently supports.
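For readers unfamiliar with the distinction, the sketch below contrasts the two selection styles on a small 2-D nested list. It uses plain Python with hypothetical helper names and only mirrors the semantics of orthogonal (outer-product) versus coordinate (pointwise) selection; it is not how zarr implements them.

```python
def orthogonal_select(grid, rows, cols):
    """Outer-product selection: every requested row crossed with every requested column."""
    return [[grid[r][c] for c in cols] for r in rows]


def coordinate_select(grid, rows, cols):
    """Pointwise selection: rows[k] is paired with cols[k] to pick single elements."""
    if len(rows) != len(cols):
        raise IndexError("coordinate arrays must have the same length")
    return [grid[r][c] for r, c in zip(rows, cols)]


# 3 x 4 grid where each value encodes its position as 10 * row + column
grid = [[10 * r + c for c in range(4)] for r in range(3)]

print(orthogonal_select(grid, [0, 2], [1, 3]))  # [[1, 3], [21, 23]]
print(coordinate_select(grid, [0, 2], [1, 3]))  # [1, 23]
```

The orthogonal form treats each dimension independently, which is why it needs no broadcasting logic: the same index arrays yield a 2 x 2 block here, whereas the coordinate form yields two individual elements.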

alimanfoo, Jan 17 '20 22:01