YAXArrays.jl

getchunksize API ?

Open bjarthur opened this issue 2 years ago • 3 comments

i don't see anywhere in the code or docs an API to get the chunk size. is there one? for now i'm using [x[end] for x in A.chunks[1]], which seems fragile.

ideally i'd like to get the chunk size of the cube underlying a DimArray after a yaxconvert. but i don't even see a way to programmatically get that other than to compute it before the conversion.

thanks!

bjarthur avatar Aug 14 '23 20:08 bjarthur

what do you mean? The output for GridChunks is a list of tuples with all chunks and sizes, as in here:

https://juliadatacubes.github.io/YAXArrays.jl/dev/examples/generated/UserGuide/setchuncks/?h=chunking#set-chunking-by-variable

what's the expected output that you want after calling what?

lazarusA avatar Aug 14 '23 21:08 lazarusA

In general a DiskArray is not guaranteed to have a well-defined regular chunk size. It can happen quite often due to concatenation of unevenly-sized arrays (think of annual netcdf files with leap years) that chunks are irregular. However, there is DiskArrays.approx_chunksize which would return the chunk size as a tuple.

So I think DiskArrays.approx_chunksize(DiskArrays.eachchunk(A.data)) should work for both DimArrays and YAXArrays. I agree it would be nice to export a nicely-named function that does this.
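
A minimal sketch of the regular-chunk case, for illustration only. It assumes DiskArrays is installed, and it builds the chunk grid by hand with `DiskArrays.GridChunks` on an in-memory array instead of reading it from a real disk-backed cube; the array shape and chunk sizes are made up for the example.

```julia
using DiskArrays

# A plain in-memory array stands in for disk-backed data (illustrative shape).
data = zeros(Float32, 100, 60, 12)

# Describe a regular chunk grid of 25 x 20 x 12 over that array.
grid = DiskArrays.GridChunks(data, (25, 20, 12))

# One representative chunk size per dimension, returned as a tuple.
cs = DiskArrays.approx_chunksize(grid)
```

For a real chunked array `A`, the grid would come from `DiskArrays.eachchunk(A.data)` as described above, and for irregular chunks the result is only an approximation.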

meggart avatar Aug 15 '23 06:08 meggart

ahah, i didn't realize that the chunking could be irregular. in my case though they are all the same size, so DiskArrays's approx_chunksize works well. thanks!

if you decide to hoist this API into DimensionalData or YAXArrays and export a function, i'd suggest designing an API that took a Dim name. something like, chunksize(A, Dim{:Y}). as it stands now with approx_chunksize, i have to convert a Dim into an integer index manually, which is fragile.
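
A rough sketch of what such a lookup by dimension could look like on the user side. `chunksize_along` is a hypothetical helper, not part of DiskArrays, DimensionalData, or YAXArrays; it only combines `DiskArrays.approx_chunksize` with DimensionalData's `dimnum`. The `chunks` keyword is an assumption added so the example can run on an in-memory array with a hand-built grid.

```julia
using DimensionalData, DiskArrays

# Hypothetical helper: approximate chunk size along one named dimension.
# By default the chunk grid is taken from the wrapped data, as suggested
# earlier in this thread; pass `chunks` explicitly to override it.
function chunksize_along(A, dim; chunks = DiskArrays.eachchunk(A.data))
    sizes = DiskArrays.approx_chunksize(chunks)
    return sizes[dimnum(A, dim)]  # map the dimension to its integer index
end

# Illustrative in-memory DimArray with a hand-built 25 x 20 chunk grid.
A = DimArray(zeros(100, 60), (X(1:100), Y(1:60)))
grid = DiskArrays.GridChunks(A, (25, 20))

y_chunk = chunksize_along(A, Y; chunks = grid)
x_chunk = chunksize_along(A, X; chunks = grid)
```

This keeps the Dim-to-index conversion in one place, so a reordering of the dimensions would not silently pick the wrong entry of the tuple.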

bjarthur avatar Aug 15 '23 14:08 bjarthur