xarray icon indicating copy to clipboard operation
xarray copied to clipboard

xarray.concat for datasets fails when object is a child class of Dataset

Open gabicca opened this issue 3 years ago • 3 comments

What happened?

The following error is thrown:

>       result = type(datasets[0])(result_vars, attrs=result_attrs)
E       TypeError: __init__() got an unexpected keyword argument 'attrs'

This is coming from the change merged as part of this PR: https://github.com/pydata/xarray/pull/6784/files (line 593)

When "datasets" is a list of class objects which are the child of Dataset, type(datasets[0]) will return that class as the type. This class however, doesn't necessary has attrs in its own init, hence the code breaks.

I'm not convinced that this is a correct behaviour, and if you want to initialize the Dataset class, you should revert that line of change.

What did you expect to happen?

Code to run without any errors.

Minimal Complete Verifiable Example

import xarray

class MyDataset(xarray.Dataset):
    __slots__ = ()

    def __init__(self, a, b, c, d=None):
        super().__init__()

        attrs = {
            'at1': b,
            'at2': d
        }

        super().__init__(a, coords=c, attrs=attrs)


if __name__ == '__main__':
    a = {
        "x": 1,
        "y": 3
    }
    b = ["x", "y"]
    c = {
        "coord1": 0,
        "coord2": 1
    }
    ds1 = MyDataset(a, b, c)
    ds2 = MyDataset(a, b, c)
    ds3 = MyDataset(a, b, c)

    xarray.concat([ds1, ds2, ds3], dim="coord1")

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

xarray==2022.6.0

python: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:27:05)

gabicca avatar Jul 26 '22 12:07 gabicca

In the example the naming and meaning of the arguments of the __init__ method are different from the Dataset base class.

Subclassing is often more reliable when it adheres to the Liskov Substitution Principle which states that objects of a superclass should be replaceable with objects of its subclasses and in other words: a subclass should only add functionality and not reduce it.

There are some other workarounds though:

If all you need is a convenient custom constructor then you could add a classmethod factory method to your subclass or even just use a plain function which returns a Dataset.

Another option is to convert your MyDataset instances to plain Dataset instances before passing them into xarray.concat, e.g. xarray.concat([ds.to_dataset() for ds in [ds1, ds2, ds3]], dim="coord1")

where your subclass may define as a convenience method:

def to_dataset(self):
    """Convert to plain Dataset."""
    return xarray.Dataset(self, attrs=self.attrs)

rhkleijn avatar Jul 26 '22 17:07 rhkleijn

In the example the naming and meaning of the arguments of the __init__ method are different from the Dataset base class.

Subclassing is often more reliable when it adheres to the Liskov Substitution Principle which states that objects of a superclass should be replaceable with objects of its subclasses and in other words: a subclass should only add functionality and not reduce it.

There are some other workarounds though:

If all you need is a convenient custom constructor then you could add a classmethod factory method to your subclass or even just use a plain function which returns a Dataset.

Another option is to convert your MyDataset instances to plain Dataset instances before passing them into xarray.concat, e.g. xarray.concat([ds.to_dataset() for ds in [ds1, ds2, ds3]], dim="coord1")

where your subclass may define as a convenience method:

def to_dataset(self):
    """Convert to plain Dataset."""
    return xarray.Dataset(self, attrs=self.attrs)

Thank you for the answer. I'll look into implementing one of your suggestions (they're much better than what I had in mind).

I still don't understand the change that causes the error, though. If you want to instantiate a Dataset object, which you clearly do according to the args you pass in, why did that line have to be changed from explicitly calling Dataset? (By "you" I don't mean you personally, unless you actually made the change :) )

gabicca avatar Jul 27 '22 10:07 gabicca

Because we want to get out the same type of Dataset (subclassed or not) as we input. That's what the T_Dataset type implies. And since we forced a Dataset at the end that wasn't the case.

Illviljan avatar Jul 28 '22 19:07 Illviljan

Shall we close? It seems like the current code is what we want.

dcherian avatar Aug 15 '22 16:08 dcherian

I'll leave that with you guys to decide. I've reported it, it's been looked at, and it has a record of it. So if the code is as you intended, I don't mind if the issue is closed.

gabicca avatar Aug 19 '22 10:08 gabicca

@gabicca given that we do say in our docs that we don't recommend subclassing xarray objects, I'm going to close this issue as a downstream "subclass at your own risk" problem. If you would like it to be easier to subclass, we would welcome your input here.

TomNicholas avatar Jan 13 '23 17:01 TomNicholas