YAXArrays.jl
Persisting arrays on disk using zarr backend does not encode data using scale_factor/offset
Hello, I am processing data stored in Zarr format. Most of it is stored on disk encoded following CF conventions. For instance, I have this YAXArray:
┌ 1500×1200 YAXArray{Union{Missing, Float64}, 2} ┐
├────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────── dims ┐
↓ columns Sampled{Int64} 1:1500 ForwardOrdered Regular Points,
→ rows Sampled{Int64} 1:1200 ForwardOrdered Regular Points
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── metadata ┤
Dict{String, Any} with 10 entries:
"units" => "K"
"name" => "s7_bt_in"
"coordinates" => "latitude longitude x y"
"short_name" => "s7_bt_in"
"add_offset" => 283.73
"long_name" => "gridded pixel brightness temperature for channel s7 (1km TIR grid, nadir view)"
"missing_value" => -32768
"scale_factor" => 0.01
"standard_name" => "toa_brightness_temperature"
"_FillValue" => -32768
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── loaded lazily ┤
data size: 13.73 MB
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
The underlying data is a CFDiskArray, which shows that the original Zarr array is Int16:
1500×1200 DiskArrayTools.CFDiskArray{Union{Missing, Float64}, 2, Int16, ZArray{Int16, 2, Zarr.BloscCompressor, Zarr.ConsolidatedStore{DirectoryStore}}, Float64}
Chunked: (
[1500]
[1200]
)
When I save the data to disk using savecube or savedataset, the array is stored as Float64 rather than encoded/packed as Int16.
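To make the expectation concrete, here is a minimal plain-Julia sketch of the usual CF packing arithmetic, using the attribute values from the metadata above. The helper names `pack` and `unpack` are hypothetical and only for illustration; they are not functions from YAXArrays.jl or DiskArrayTools, and this is not a claim about how either package implements it.

```julia
# CF packing parameters copied from the metadata shown above
const SCALE_FACTOR = 0.01
const ADD_OFFSET = 283.73
const FILL_VALUE = Int16(-32768)

# Decode: packed Int16 -> physical value (missing where the fill value is found)
# `unpack` is a hypothetical helper name, not a library function
unpack(raw::Int16) = raw == FILL_VALUE ? missing : raw * SCALE_FACTOR + ADD_OFFSET

# Encode: physical value -> packed Int16 (fill value where the input is missing)
# `pack` is a hypothetical helper name, not a library function
pack(x::Missing) = FILL_VALUE
pack(x::Real) = round(Int16, (x - ADD_OFFSET) / SCALE_FACTOR)

raw = Int16[-32768, 0, 1500, -2000]   # made-up packed values for illustration
decoded = unpack.(raw)                # Union{Missing, Float64} values in kelvin
repacked = pack.(decoded)             # back to Int16: 2 bytes per element instead of 8
```

What I expected from savecube/savedataset is the equivalent of the `pack` step, so that the stored array stays Int16 with the same scale_factor, add_offset and _FillValue attributes.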
For information, the original YAXArray is created with open_dataset from an existing Zarr file, which correctly decodes/unpacks the data. Finally, surprisingly (or not), when I read the persisted file back (stored as Float64), the scale_factor is not applied twice.
Am I missing an option here? DiskArrayTools seems to implement such mechanisms for both reading and writing.
Many thanks in advance for your feedback! Vincent