YAXArrays.jl

Persisting arrays on disk using zarr backend does not encode data using scale_factor/offset

Open vlevasseur073 opened this issue 9 months ago • 1 comments

Hello, I am processing data stored in zarr format. Most of it is stored on disk encoded following CF conventions. For instance, I have this YAXArray:

┌ 1500×1200 YAXArray{Union{Missing, Float64}, 2} ┐
├────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────── dims ┐
  ↓ columns Sampled{Int64} 1:1500 ForwardOrdered Regular Points,
  → rows    Sampled{Int64} 1:1200 ForwardOrdered Regular Points
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any} with 10 entries:
  "units"         => "K"
  "name"          => "s7_bt_in"
  "coordinates"   => "latitude longitude x y"
  "short_name"    => "s7_bt_in"
  "add_offset"    => 283.73
  "long_name"     => "gridded pixel brightness temperature for channel s7 (1km TIR grid, nadir view)"
  "missing_value" => -32768
  "scale_factor"  => 0.01
  "standard_name" => "toa_brightness_temperature"
  "_FillValue"    => -32768
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── loaded lazily ┤
  data size: 13.73 MB
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

The underlying data is a CFDiskArray, which shows that the original ZArray is Int16:

1500×1200 DiskArrayTools.CFDiskArray{Union{Missing, Float64}, 2, Int16, ZArray{Int16, 2, Zarr.BloscCompressor, Zarr.ConsolidatedStore{DirectoryStore}}, Float64}

Chunked: (
    [1500]
    [1200]
)

When I save the data to disk using savecube or savedataset, the array is stored as Float64 and is not encoded/packed as Int16.
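For reference, here is a minimal sketch of the CF packing/unpacking arithmetic I would expect, written in plain Julia with the attribute values from the metadata above. It does not call YAXArrays or DiskArrayTools, and the helper names `cf_unpack` and `cf_pack` are hypothetical, chosen only for illustration:

```julia
# Attribute values copied from the metadata shown above
const scale_factor = 0.01
const add_offset = 283.73
const fill_value = Int16(-32768)

# Hypothetical helper: decode a stored Int16 into a physical value (CF convention).
# The fill value maps to `missing`.
cf_unpack(p::Int16) = p == fill_value ? missing : p * scale_factor + add_offset

# Hypothetical helper: encode a physical value back into Int16 (CF convention).
# `missing` maps to the fill value; values outside the Int16 range would throw.
cf_pack(x::Missing) = fill_value
cf_pack(x::Real) = round(Int16, (x - add_offset) / scale_factor)

packed = Int16[-32768, 0, 1234]   # what the original zarr store holds
unpacked = cf_unpack.(packed)     # Union{Missing, Float64}, as seen after open_dataset
repacked = cf_pack.(unpacked)     # what I would expect to be written back to disk
```

With this arithmetic, a 1500×1200 array would take 2 bytes per element on disk (before compression) instead of 8 for Float64.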

For information, the original YAXArray is created with open_dataset from an existing zarr file, which correctly decodes/unpacks the data. Finally, surprisingly (or not), when I read the persisted file (stored as Float64) back, the scale_factor is not applied twice.

Am I missing an option here? DiskArrayTools seems to implement this mechanism for both reading and writing.

Many thanks in advance for your feedback! Vincent

vlevasseur073, Feb 27 '25 21:02