Losing attributes with .chunk and .pad

Open louisletoumelin opened this issue 1 year ago • 1 comments

Hello,

Attributes disappear after calling the following DataTree methods: .chunk and .pad

datatree version: '0.0.13'

import numpy as np
import pandas as pd
import xarray as xr
from datatree import DataTree

np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 3, 4)
precipitation = 10 * np.random.rand(2, 3, 4)
lon = [-99.83, -99.32]
lat = [42.25, 42.21]
instruments = ["manufac1", "manufac2", "manufac3"]
time = pd.date_range("2014-09-06", periods=4)
reference_time = pd.Timestamp("2014-09-05")

ds = xr.Dataset(
    data_vars=dict(
        temperature=(["loc", "instrument", "time"], temperature),
        precipitation=(["loc", "instrument", "time"], precipitation),
    ),
    coords=dict(
        lon=("loc", lon),
        lat=("loc", lat),
        instrument=instruments,
        time=time,
        reference_time=reference_time,
    ),
    attrs=dict(description="Weather related data."),
)
dt = DataTree.from_dict({"simulation/one": ds, "simulation/two": ds})
dt.attrs = {"a": 0, "b":1}
dt.pad({"loc": 1}).attrs  # returns {}, expected  {"a": 0, "b":1}
dt.chunk({"loc": 1}).attrs  # returns {}, expected  {"a": 0, "b":1}
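Until the behavior is fixed, one possible workaround is to copy the root attributes back onto the result by hand. The following is only a minimal sketch, assuming the returned object exposes a writable `.attrs` mapping (as the `dt.attrs = ...` assignment above suggests). `call_keeping_attrs` and the stand-in class `FakeTree` are hypothetical names used for illustration; `FakeTree` merely mimics the reported behavior and is not the datatree API.

```python
def call_keeping_attrs(obj, method_name, *args, **kwargs):
    """Call obj.<method_name>(...) and restore obj.attrs on the result if they were dropped."""
    result = getattr(obj, method_name)(*args, **kwargs)
    if not result.attrs:
        # Copy, so the result does not share a dict with the original object.
        result.attrs = dict(obj.attrs)
    return result


class FakeTree:
    """Hypothetical stand-in that drops attrs in .pad, like the reported bug."""

    def __init__(self, attrs=None):
        self.attrs = dict(attrs or {})

    def pad(self, pad_width):
        # Mimics the bug: the new object is created without the attrs.
        return FakeTree()


tree = FakeTree({"a": 0, "b": 1})
padded = call_keeping_attrs(tree, "pad", {"loc": 1})
print(padded.attrs)
```

With the `dt` from the example above, the equivalent call would be `call_keeping_attrs(dt, "pad", {"loc": 1})`. Note that this sketch only restores the attributes of the object it is called on (the root node), not those of any child nodes.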

I tried to replicate the behavior with the version of datatree that lives in the xarray repo (xarray version '2024.6.0'). The .pad issue disappears:

from xarray.core.datatree import DataTree
# same operations as above
dt.pad({"loc": 1}).attrs  # OK

but the following operation raises an error:

dt.chunk({"loc": 1}).attrs

ValueError: unrecognized chunk manager dask - must be one of: []
Raised whilst mapping function over node with path /simulation/one

louisletoumelin, Jun 14 '24 09:06

Hi @louisletoumelin , thanks for raising this.

> I tried to replicate the behavior with the version of datatree that lives in the xarray repo. The .pad issue disappears

In that case, unless you want to contribute a fix to this repo, I don't think there is anything to do here, because we're planning to archive this repo very soon.

> ValueError: unrecognized chunk manager dask - must be one of: []
> Raised whilst mapping function over node with path /simulation/one

Do you have dask installed? If so then this is a separate bug.
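A quick way to answer this is to check whether dask can be imported in the same environment that raised the error. The snippet below is a minimal sketch using only the standard library; `package_status` is a hypothetical helper name, and an empty list in "must be one of: []" would be consistent with dask not being importable there.

```python
import importlib.util
from importlib import metadata


def package_status(name):
    """Return (importable, version) for a top-level package; version is None if unknown."""
    importable = importlib.util.find_spec(name) is not None
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None
    return importable, version


for pkg in ("dask", "xarray"):
    importable, version = package_status(pkg)
    print(f"{pkg}: importable={importable}, version={version}")
```

Run this with the same interpreter that runs the failing script, since a package installed in a different environment will not be visible.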

TomNicholas, Sep 07 '24 22:09

The attributes issue should either be fixed or behave the same way as the rest of xarray upstream, and should be re-raised upstream if it's still not working.

The ChunkManager not found issue is independent of DataTree, and again should be re-raised upstream if it's still not working.

TomNicholas, Oct 08 '24 16:10