YAXArrays.jl
Does YAXArrays.jl support interop with xarray?
If we have a .nc or .zarr file generated by xarray, can we read and manipulate it with all functionality with YAXArrays.jl?
- I tried opening a .zarr file generated by xarray and it works perfectly.
I just don't know if it is intentionally supported or if it just happens to work.
Also, in the other direction, if we have files generated by YAXArrays.jl, is it possible to use all of xarray's functionality on them?
Yes, and if some things don't work please open more issues.
There is also https://github.com/meggart/PyYAXArrays.jl for interop in a session, so that you could load your data with xarray and then use YAXArrays functionality for the analysis. This is experimental at the moment so you might not want to rely on it too much.
Thanks for the clarification! Closing this issue now.
For clarification, the interop works because both packages (xarray and YAXArrays.jl) read standardized data in the NetCDF/Zarr formats (and other metadata-based formats I know less about).
I found an inconsistency when opening np.complex64 data saved by xarray. YAXArrays.jl reads it as ComplexF64, which corresponds to np.complex128.
using PythonCall
pyexec("
import numpy as np
import xarray as xr
", Main)
@pyexec """
data = np.random.random((3, 5)) + 1j*np.random.random((3, 5))
data = data.astype(np.complex64)
da = xr.DataArray(data, dims=["x", "y"], name="random_complex")
dtype = data.dtype
ds = xr.Dataset({"random_complex": da})
ds.to_zarr("random_complex.zarr", mode="w")
""" => dtype
This returns
Python: dtype('complex64')
which means our data has the np.complex64 dtype, where each element consists of two float32 values.
open_dataset("random_complex.zarr", driver=:zarr)["random_complex"]
This returns
┌ 5×3 YAXArray{ComplexF64, 2} ┐
├─────────────────────────────┴────────────────────────────────────────── dims ┐
↓ y Sampled{Int64} 1:5 ForwardOrdered Regular Points,
→ x Sampled{Int64} 1:3 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{String, Any} with 1 entry:
"name" => "random_complex"
├─────────────────────────────────────────────────────────────── loaded lazily ┤
data size: 240.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘
Here the element type is ComplexF64, where each element consists of two Float64 values. We also cannot index this array:
uncompressed data is not a multiple of sizeof(ComplexF64)
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] decompress!(dest::Vector{ComplexF64}, src::Vector{UInt8})
@ Blosc ~/.julia/packages/Blosc/jk4Np/src/Blosc.jl:184
[3] zuncompress!
@ ~/.julia/packages/Zarr/3QSdj/src/Compressors.jl:68 [inlined]
[4] zuncompress!
@ ~/.julia/packages/Zarr/3QSdj/src/Compressors.jl:14 [inlined]
[5] uncompress_raw!(a::Matrix{ComplexF64}, z::ZArray{ComplexF64, 2, Zarr.BloscCompressor, DirectoryStore}, curchunk::Vector{UInt8})
@ Zarr ~/.julia/packages/Zarr/3QSdj/src/ZArray.jl:261
[6] uncompress_to_output!(aout::Matrix{ComplexF64}, output_base_offsets::Tuple{Int64, Int64}, z::ZArray{ComplexF64, 2, Zarr.BloscCompressor, DirectoryStore}, chunk_compressed::Vector{UInt8}, current_chunk_offsets::Tuple{Int64, Int64}, a::Matrix{ComplexF64}, indranges::Tuple{UnitRange{Int64}, UnitRange{Int64}})
@ Zarr ~/.julia/packages/Zarr/3QSdj/src/ZArray.jl:270
[7] readblock!(aout::Matrix{ComplexF64}, z::ZArray{ComplexF64, 2, Zarr.BloscCompressor, DirectoryStore}, r::CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}})
@ Zarr ~/.julia/packages/Zarr/3QSdj/src/ZArray.jl:178
[8] readblock!
@ ~/.julia/packages/Zarr/3QSdj/src/ZArray.jl:247 [inlined]
[9] readblock_sizecheck!
@ ~/.julia/packages/DiskArrays/ny95C/src/diskarray.jl:337 [inlined]
[10] getindex_disk_nobatch!(out::Nothing, a::ZArray{ComplexF64, 2, Zarr.BloscCompressor, DirectoryStore}, i::Tuple{Colon})
@ DiskArrays ~/.julia/packages/DiskArrays/ny95C/src/diskarray.jl:247
[11] getindex_disk!(out::Nothing, a::ZArray{ComplexF64, 2, Zarr.BloscCompressor, DirectoryStore}, i::Function)
@ DiskArrays ~/.julia/packages/DiskArrays/ny95C/src/diskarray.jl:260
[12] getindex_disk
@ ~/.julia/packages/DiskArrays/ny95C/src/diskarray.jl:218 [inlined]
[13] getindex
@ ~/.julia/packages/DiskArrays/ny95C/src/diskarray.jl:370 [inlined]
[14] getindex(A::YAXArray{ComplexF64, 2, ZArray{ComplexF64, 2, Zarr.BloscCompressor, DirectoryStore}, Tuple{Dim{:y, DimensionalData.Dimensions.Lookups.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.Lookups.ForwardOrdered, DimensionalData.Dimensions.Lookups.Regular{Int64}, DimensionalData.Dimensions.Lookups.Points, DimensionalData.Dimensions.Lookups.NoMetadata}}, Dim{:x, DimensionalData.Dimensions.Lookups.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.Lookups.ForwardOrdered, DimensionalData.Dimensions.Lookups.Regular{Int64}, DimensionalData.Dimensions.Lookups.Points, DimensionalData.Dimensions.Lookups.NoMetadata}}}, Dict{String, Any}}, i::Colon)
@ DimensionalData ~/.julia/packages/DimensionalData/oXUIT/src/array/indexing.jl:61
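As a quick sanity check on the numbers, here is a small sketch of the byte arithmetic that would produce this error. It assumes the whole 3×5 array is stored as a single chunk (not verified against the store), and the variable names are only for illustration.

```python
import struct

# Bytes per element, derived from the component layout instead of hard-coded.
COMPLEX64_BYTES = struct.calcsize("<2f")   # two float32 values (np.complex64)
COMPLEX128_BYTES = struct.calcsize("<2d")  # two float64 values (np.complex128 / ComplexF64)

n_elements = 3 * 5  # shape of the example array

# Assumption: the whole array lives in one chunk.
chunk_bytes = n_elements * COMPLEX64_BYTES      # bytes actually written by xarray
expected_bytes = n_elements * COMPLEX128_BYTES  # bytes a ComplexF64 reader expects

print("written: ", chunk_bytes)
print("expected:", expected_bytes)
# A non-zero remainder means the chunk cannot be split into whole ComplexF64 values.
print("remainder:", chunk_bytes % COMPLEX128_BYTES)
```

Under that assumption the chunk holds 120 bytes, while the reader reports 240 bytes (matching the "data size" shown above), and 120 is not a multiple of 16.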
That seems to be specifically a Zarr.jl issue. There is a mismatch in how the Python and Julia Zarr implementations interpret the Zarr metadata.
We would have to change the Zarr.sizemapf function for complex numbers and multiply the sizeof by 2.
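To illustrate the convention involved: in Zarr v2 metadata the dtype is a NumPy-style type string, and the number in it counts the bytes of the whole element, so `<c8` means complex64 (two 4-byte floats) and `<c16` means complex128. The sketch below is not Zarr.jl's or zarr-python's actual code. `parse_typestr` is a hypothetical helper, and the JSON snippet is an illustrative excerpt rather than the real `.zarray` file from the example.

```python
import json
import re


def parse_typestr(typestr):
    """Split a NumPy-style type string such as '<c8' into its parts.

    Hypothetical helper for illustration only.
    """
    match = re.fullmatch(r"([<>|=])([a-zA-Z])(\d+)", typestr)
    if match is None:
        raise ValueError(f"unsupported type string: {typestr!r}")
    byteorder, kind = match.group(1), match.group(2)
    itemsize = int(match.group(3))
    # For complex kinds the size covers the real and imaginary parts together,
    # so each component takes half of it.
    component_bytes = itemsize // 2 if kind == "c" else itemsize
    return {
        "byteorder": byteorder,
        "kind": kind,
        "itemsize": itemsize,
        "component_bytes": component_bytes,
    }


# Illustrative excerpt of array metadata (assumed, not copied from the store).
zarray = json.loads('{"dtype": "<c8", "shape": [3, 5]}')
info = parse_typestr(zarray["dtype"])
print(info)
```

A reader that treats the `8` in `<c8` as the size of each component would pick a 16-byte element type, which is the kind of mismatch described here.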
I might be able to open a PR later on.
See https://github.com/JuliaIO/Zarr.jl/pull/168
That might be related but is not the main issue.
The complex value mismatch between Julia and Python should be fixed by https://github.com/JuliaIO/Zarr.jl/pull/181.