Zarr.jl
Zarr.jl copied to clipboard
how should `"fill_value": null` be read?
We have a zarray with the following .zarray:
{"dtype":"<f4","filters":null,"shape":[15706,720,1440],"order":"C","zarr_format":2,"chunks":[5844,60,60],"fill_value":null,"compressor":{"blocksize":0,"id":"blosc","clevel":5,"cname":"lz4","shuffle":1}}
when reading data from the directory I get the following error
julia> data = zopen(path)
julia> data[1, 1, 1]
ERROR: MethodError: Cannot `convert` an object of type Nothing to an object of type Float32
Closest candidates are:
convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:273
convert(::Type{T}, ::AbstractChar) where T<:Number at char.jl:185
convert(::Type{T}, ::CartesianIndex{1}) where T<:Number at multidimensional.jl:130
...
Stacktrace:
[1] fill!(dest::Array{Float32, 3}, x::Nothing)
@ Base ./array.jl:351
[2] readchunk!(a::Array{Float32, 3}, z::ZArray{Float32, 3, Zarr.BloscCompressor, DirectoryStore}, i::CartesianIndex{3})
@ Zarr ~/.julia/packages/Zarr/tmr2s/src/ZArray.jl:188
[3] (::Zarr.var"#66#69"{Bool, OffsetArrays.OffsetArray{Float32, 3, Array{Float32, 3}}, ZArray{Float32, 3, Zarr.BloscCompressor, DirectoryStore}, CartesianIndices{3, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}}, Array{Float32, 3}})(bI::CartesianIndex{3})
@ Zarr ~/.julia/packages/Zarr/tmr2s/src/ZArray.jl:161
[4] foreach
@ ./abstractarray.jl:2774 [inlined]
[5] readblock!(aout::Array{Float32, 3}, z::ZArray{Float32, 3, Zarr.BloscCompressor, DirectoryStore}, r::CartesianIndices{3, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}}; readmode::Bool)
@ Zarr ~/.julia/packages/Zarr/tmr2s/src/ZArray.jl:151
[6] readblock!
@ ~/.julia/packages/Zarr/tmr2s/src/ZArray.jl:143 [inlined]
[7] readblock!
@ ~/.julia/packages/Zarr/tmr2s/src/ZArray.jl:174 [inlined]
[8] getindex_disk
@ ~/.julia/packages/DiskArrays/VGEpq/src/diskarray.jl:31 [inlined]
[9] getindex(::ZArray{Float32, 3, Zarr.BloscCompressor, DirectoryStore}, ::Int64, ::Int64, ::Int64)
@ DiskArrays ~/.julia/packages/DiskArrays/VGEpq/src/diskarray.jl:177
[10] top-level scope
@ REPL[30]:1
I think the error comes from fill_value:null. The data can be read with xarray. Is this error correct behavior?
@MartinuzziFrancesco
The question is what you would expect? Yes, xarray reads the data but it is returning garbage for non-existing chunks. The problem is that a zarr array with fill_value null which has missing chunks is per se not well-defined. This was a long discussion in the python zarr impl and the conclusion was that fill_value=null will be forbidden in zarr v3.
I think that throwing an error is better than returning garbage data, but of course the error message should be improved.
A suggestion to deal with this could probably be to add a warning message that there are some missing values and fill them with nothings, dispatching over the base convert function. At the moment even for a single missing value for a full timeseries it's not possible to read it because of this error. This could also allow the user to actually count the missing values
I think that throwing an error is better than returning garbage data
I agree with you here. Throwing an error is correct. We could also return an Array{Union{Nothing, Float32}}, this should behave similarly to Array{Union{Missing, Float32}}, right?