Zarr.jl icon indicating copy to clipboard operation
Zarr.jl copied to clipboard

how should `"fill_value": null` be read?

Open gdkrmr opened this issue 3 years ago • 3 comments
trafficstars

We have a zarray with the following .zarray:

{"dtype":"<f4","filters":null,"shape":[15706,720,1440],"order":"C","zarr_format":2,"chunks":[5844,60,60],"fill_value":null,"compressor":{"blocksize":0,"id":"blosc","clevel":5,"cname":"lz4","shuffle":1}}

when reading data from the directory I get the following error

julia> data = zopen(path)
julia> data[1, 1, 1]
ERROR: MethodError: Cannot `convert` an object of type Nothing to an object of type Float32
Closest candidates are:
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:273
  convert(::Type{T}, ::AbstractChar) where T<:Number at char.jl:185
  convert(::Type{T}, ::CartesianIndex{1}) where T<:Number at multidimensional.jl:130
  ...
Stacktrace:
  [1] fill!(dest::Array{Float32, 3}, x::Nothing)
    @ Base ./array.jl:351
  [2] readchunk!(a::Array{Float32, 3}, z::ZArray{Float32, 3, Zarr.BloscCompressor, DirectoryStore}, i::CartesianIndex{3})
    @ Zarr ~/.julia/packages/Zarr/tmr2s/src/ZArray.jl:188
  [3] (::Zarr.var"#66#69"{Bool, OffsetArrays.OffsetArray{Float32, 3, Array{Float32, 3}}, ZArray{Float32, 3, Zarr.BloscCompressor, DirectoryStore}, CartesianIndices{3, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}}, Array{Float32, 3}})(bI::CartesianIndex{3})
    @ Zarr ~/.julia/packages/Zarr/tmr2s/src/ZArray.jl:161
  [4] foreach
    @ ./abstractarray.jl:2774 [inlined]
  [5] readblock!(aout::Array{Float32, 3}, z::ZArray{Float32, 3, Zarr.BloscCompressor, DirectoryStore}, r::CartesianIndices{3, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}}; readmode::Bool)
    @ Zarr ~/.julia/packages/Zarr/tmr2s/src/ZArray.jl:151
  [6] readblock!
    @ ~/.julia/packages/Zarr/tmr2s/src/ZArray.jl:143 [inlined]
  [7] readblock!
    @ ~/.julia/packages/Zarr/tmr2s/src/ZArray.jl:174 [inlined]
  [8] getindex_disk
    @ ~/.julia/packages/DiskArrays/VGEpq/src/diskarray.jl:31 [inlined]
  [9] getindex(::ZArray{Float32, 3, Zarr.BloscCompressor, DirectoryStore}, ::Int64, ::Int64, ::Int64)
    @ DiskArrays ~/.julia/packages/DiskArrays/VGEpq/src/diskarray.jl:177
 [10] top-level scope
    @ REPL[30]:1

I think the error comes from fill_value:null. The data can be read with xarray. Is this error correct behavior?

@MartinuzziFrancesco

gdkrmr avatar Oct 19 '22 14:10 gdkrmr

The question is what you would expect? Yes, xarray reads the data but it is returning garbage for non-existing chunks. The problem is that a zarr array with fill_value null which has missing chunks is per se not well-defined. This was a long discussion in the python zarr impl and the conclusion was that fill_value=null will be forbidden in zarr v3.

I think that throwing an error is better than returning garbage data, but of course the error message should be improved.

meggart avatar Oct 20 '22 12:10 meggart

A suggestion to deal with this could probably be to add a warning message that there are some missing values and fill them with nothings, dispatching over the base convert function. At the moment even for a single missing value for a full timeseries it's not possible to read it because of this error. This could also allow the user to actually count the missing values

MartinuzziFrancesco avatar Oct 20 '22 13:10 MartinuzziFrancesco

I think that throwing an error is better than returning garbage data

I agree with you here. Throwing an error is correct. We could also return an Array{Union{Nothing, Float32}}, this should behave similarly to Array{Union{Missing, Float32}}, right?

gdkrmr avatar Oct 20 '22 13:10 gdkrmr