DiskArrays.jl icon indicating copy to clipboard operation
DiskArrays.jl copied to clipboard

vec(view(c, :, :, 1)) fails: reshape is restricted to adding singleton dimensions

Open lazarusA opened this issue 1 year ago • 7 comments

The following fails:

# add DiskArrays#main
# add DimensionalData#main
using Zarr, DiskArrays
using YAXArrays
using DimensionalData

path = "gs://cmip6/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp585/r1i1p1f1/3hr/tas/gn/v20190710"
g = open_dataset(zopen(path; consolidated=true))

data_cube = DimensionalData.modify(g.tas) do arr
    return DiskArrays.CachedDiskArray(arr)
end
v = view(data_cube, :, :, 1)

vec(v) 
ERROR: For DiskArrays, reshape is restricted to adding singleton dimensions
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] reshape_disk(parent::DiskArrays.SubDiskArray{…}, dims::Tuple{…})
   @ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/reshape.jl:46
 [3] _reshape(A::DiskArrays.SubDiskArray{Float32, 2, DiskArrays.CachedDiskArray{…}, Tuple{…}, false}, dims::Tuple{Int64})
   @ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/reshape.jl:79
 [4] reshape(parent::DiskArrays.SubDiskArray{Float32, 2, DiskArrays.CachedDiskArray{…}, Tuple{…}, false}, dims::Tuple{Int64})
   @ Base ./reshapedarray.jl:112
 [5] reshape(parent::DiskArrays.SubDiskArray{Float32, 2, DiskArrays.CachedDiskArray{…}, Tuple{…}, false}, dims::Int64)
   @ Base ./reshapedarray.jl:117
 [6] vec(a::DiskArrays.SubDiskArray{Float32, 2, DiskArrays.CachedDiskArray{…}, Tuple{…}, false})
   @ Base ./abstractarraymath.jl:41
 [7] vec(A::YAXArray{Float32, 2, DiskArrays.SubDiskArray{…}, Tuple{…}})
   @ DimensionalData ~/.julia/packages/DimensionalData/kYR9Z/src/array/array.jl:76
 [8] top-level scope
   @ ~/Documents/NDimensionalViewer/cached_c.jl:15
Some type information was truncated. Use `show(err)` to see complete types.

any work around @rafaqz @meggart ?

lazarusA avatar Feb 16 '24 12:02 lazarusA

view(v, :) maybe?

rafaqz avatar Feb 16 '24 13:02 rafaqz

The problem here is that this kind of view is inherently incompatible with eachchunk because the linear indexing simply destroys the natural chunking so there is no way to define eachchunk on this view. @lazarusA do you actually to read the data or do you need a lazy view into the data on disk?

meggart avatar Feb 16 '24 14:02 meggart

What I meant was if you want to read the data anyway you could do v[:] instead of vec(v)

meggart avatar Feb 16 '24 14:02 meggart

Maybe the correct thing would be to overload vec or reshape for SubDiskArray, if there are proper workarounds. I run into this deep inside Makie, where I'm using generic array operations to handle the input arrays, so it would be nice if I could keep doing so :) Just for reference, this is how I run into this:

# This happens also inside a function, that expects to work on any AbstractArray
# view is chosen, so that we can materialize/read the data as late as possible inside Makie
data =  view(data_cube, :, :, 1) 
# The below happens in WGLMakie serialization, since JS doesn't support Matrices.
# `vec` seems to be the ideal choice, since it best preserves the array type and can be allocation free without returning a view
vec(data) 

SimonDanisch avatar Feb 16 '24 14:02 SimonDanisch

if there are proper workarounds.

This is the problem, currently there is no sensible workaround. I think the only options would be to read the data (bad, because it might be large data) or to return the Unchunked() trait for these types of views so that at least most indexing should work. I can look into this.

meggart avatar Feb 16 '24 14:02 meggart

Ok, this seems to work on #146 after defining vec(a::AbstractDiskArray) = view(a,:) I just pushed this to the branch

meggart avatar Feb 16 '24 15:02 meggart

Instead of the full Unchunked(), this is where doing the minimum rechunking that can get sequential indices could help.

rafaqz avatar Feb 16 '24 17:02 rafaqz