pizzarr
Indexing for arbitrary elements of dimensions
User story
Hey there,
Is it possible to access arbitrary elements along a dimension (as Rarr does with its index argument) instead of using slices? Is this already implemented, or is it not available at the moment?
zarr.array <- pizzarr::zarr_open(store = "data/mat.zarr")
mat <- array(1:350, c(10, 5, 7))
zarr.array$create_dataset("assay", data = mat, shape = dim(mat))
zarr.array$get_item("assay")$get_item(list(slice(1,6,2), slice(1, 2), slice(1, 1)))$data
, , 1
[,1] [,2]
[1,] 1 11
[2,] 3 13
[3,] 5 15
It is possible to access a single element.
zarr.array$get_item("assay")$get_item(c(1, 2, 1))$data
, , 1
[,1]
[1,] 72
But with multiple elements, it doesn't work.
zarr.array$get_item("assay")$get_item(c(1:2, 2, 1))$data
Error in check_selection_length(selection, shape) : TooManyIndicesError
zarr.array$get_item("assay")$get_item(list(c(1,6), 2, 1))$data
Error in if (is.na(stop)) { : the condition has length > 1
$get_item(c(1:2, 2, 1))
Indexing with numeric vectors is difficult because c() flattens its arguments into a single vector by default, whereas list() keeps each element separate:
> c(1:2, 2, 1)
[1] 1 2 2 1
> list(1:2, 2, 1)
[[1]]
[1] 1 2
[[2]]
[1] 2
[[3]]
[1] 1
Perhaps you can do something fancy with rlang https://rlang.r-lib.org/reference/topic-defuse.html and prevent the flattening behavior / intercept prior to flattening.
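As a rough illustration of the "intercept prior to flattening" idea, here is a minimal sketch that uses base R's substitute() rather than rlang (a deliberate swap to keep the example dependency-free). The helper name split_selection is hypothetical and not part of pizzarr. It only works when the caller writes the c(...) call literally; a selection passed through a variable is already flattened.

```r
# Hypothetical helper (not part of pizzarr): capture the unevaluated
# selection and, if it is a literal c(...) call, evaluate each argument
# on its own so that 1:2 stays a length-2 vector.
split_selection <- function(selection) {
  expr <- substitute(selection)
  env <- parent.frame()
  if (is.call(expr) && identical(expr[[1]], as.name("c"))) {
    # Drop the function name, then evaluate the arguments one by one
    lapply(as.list(expr)[-1], function(arg) eval(arg, envir = env))
  } else {
    # Already evaluated (and flattened): one list element per value
    as.list(selection)
  }
}

sel <- split_selection(c(1:2, 2, 1))
length(sel)  # one entry per dimension instead of four flattened values
```

The result has the same shape as list(1:2, 2, 1), so a per-dimension length check could then be applied to it.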
The vector vs. list issue aside, there is this outstanding need to support integer indexing: https://github.com/keller-mark/pizzarr/issues/43
However at the moment, you could turn lists of integers into lists of slices in order to work around this:
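# Convert an index vector of length 1 to 3 into a slice:
#   one value    -> that single position
#   two values   -> start and stop
#   three values -> start, stop and step
to_slice <- function(i) {
  if (length(i) == 1) {
    return(slice(i, i))
  }
  if (length(i) == 2) {
    return(slice(i[1], i[2]))
  }
  if (length(i) == 3) {
    return(slice(i[1], i[2], i[3]))
  }
  stop("Received indexing vector with too many elements")
}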
selection <- z$get_item(lapply(x, to_slice))
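Note that the workaround reads a vector as (start, stop, step), not as a set of positions. If the goal is to express an actual index vector such as c(1, 3, 5) as a slice, one possible sketch is below. The helper as_slice_args is hypothetical, and it assumes the stop value is inclusive, which is what the slice(1, 6, 2) output earlier in this thread suggests. It returns NULL when the indices are not evenly spaced, since a single slice cannot represent them.

```r
# Hypothetical helper (not part of pizzarr): describe an index vector as
# start/stop/step, or return NULL if the indices are not evenly spaced.
as_slice_args <- function(idx) {
  if (length(idx) == 1) {
    return(list(start = idx, stop = idx, step = 1))
  }
  steps <- unique(diff(idx))
  # More than one distinct gap, or a non-increasing sequence: no single slice
  if (length(steps) != 1 || steps <= 0) {
    return(NULL)
  }
  list(start = idx[1], stop = idx[length(idx)], step = steps)
}

even <- as_slice_args(c(1, 3, 5))    # evenly spaced, step of 2
uneven <- as_slice_args(c(1, 6, 7))  # gaps of 5 and 1, so NULL
```

The resulting triple could then be handed to slice(start, stop, step) as in the examples above; selections like c(1, 6, 7) still need real integer indexing.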
We also have this bracket indexing function which may be relevant: https://github.com/keller-mark/pizzarr/blob/f84355d2708c22dc6e703f3cdd83d218221b352a/R/zarr-array.R#L1213
z[2, 5]
Example in test here: https://github.com/keller-mark/pizzarr/blob/main/tests/testthat/test-s3.R#L47
So this is the part that would have to be updated and extended, right? I will attempt it if you haven't planned to do so yet.
https://github.com/keller-mark/pizzarr/blob/f84355d2708c22dc6e703f3cdd83d218221b352a/R/indexing.R#L88-L100
I like the fact that this repo is functionally an R replica of the original zarr-python implementation. I was able to implement IntArrayDimIndexer and OrthogonalIndexer classes to get get_item to accept orthogonal selection. There are still a few bugs I need to take care of, otherwise the DelayedArray assumption of random index access is satisfied.
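To illustrate the kind of bookkeeping an integer-array indexer has to do, here is a minimal base R sketch that groups the requested indices along one dimension by the chunk that contains them. It is not the implementation described above; the helper group_by_chunk and its arguments are hypothetical, and it assumes 1-based indices and a fixed chunk length.

```r
# Hypothetical helper (not pizzarr code): group 1-based indices along one
# dimension by the chunk that holds them.
group_by_chunk <- function(idx, chunk_len) {
  chunk_id <- (idx - 1L) %/% chunk_len + 1L  # 1-based chunk number
  offset <- (idx - 1L) %% chunk_len + 1L     # 1-based position inside the chunk
  split(offset, chunk_id)
}

# Rows 1, 6 and 7 with a chunk length of 2
groups <- group_by_chunk(c(1L, 6L, 7L), chunk_len = 2L)
names(groups)  # chunks that need to be read
```

Each list entry holds the within-chunk positions to extract, so only the chunks that actually contain requested indices need to be loaded.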
Here is more info on our DelayedArray extension:
https://github.com/BIMSBbioinfo/ZarrArray
Here are some examples:
# write
zarr.array <- pizzarr::zarr_open(store = "data/mat_example.zarr", mode = "w")
mat_test <- matrix(1:100, nrow = 10)
zarr.array$create_dataset("assay", data = mat_test, shape = dim(mat_test), chunks = c(2,2))
# read
zarr.array <- pizzarr::zarr_open(store = "data/mat_example.zarr", mode = "r")
a <- zarr.array$get_item("assay")
a[c(1,6,7),c(2,8,9)]$data
[,1] [,2] [,3]
[1,] 11 71 81
[2,] 16 76 86
[3,] 17 77 87
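For reference, the same selection on the in-memory matrix shows what "orthogonal" means here: every requested row is combined with every requested column, as opposed to pointwise (coordinate) selection, which pairs them up. This sketch uses only base R and the mat_test definition from the example above.

```r
mat_test <- matrix(1:100, nrow = 10)

rows <- c(1, 6, 7)
cols <- c(2, 8, 9)

# Orthogonal selection: all 3 x 3 combinations of rows and columns
orthogonal <- mat_test[rows, cols]

# Pointwise selection: only the pairs (1,2), (6,8) and (7,9)
pointwise <- mat_test[cbind(rows, cols)]
```

The orthogonal result is the 3 x 3 block shown above, while the pointwise result is just its diagonal.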
Would you guys like a PR for this once everything is tidy?
@keller-mark has the final say, but I'd be happy to get the contribution!
I agree with @dblodgett-usgs, the contribution is welcome! Compatibility with DelayedArray would be great!
Awesome guys, thanks for the quick response, I will let you know!