Allow .attrs to support any dict-likes

Open Illviljan opened this issue 2 years ago • 10 comments

  • [x] Closes #5655
  • [ ] Tests added
  • [ ] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

Illviljan avatar Aug 03 '21 08:08 Illviljan

Hello @Illviljan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2021-10-31 09:57:53 UTC

pep8speaks avatar Aug 03 '21 08:08 pep8speaks

Unit Test Results

6 files, 6 suites, 56m 18s :stopwatch:
16 290 tests: 14 550 :heavy_check_mark:, 1 739 :zzz:, 1 :x:
90 936 runs: 82 737 :heavy_check_mark:, 8 198 :zzz:, 1 :x:

For more details on these failures, see this check.

Results for commit 650ce229.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Aug 03 '21 08:08 github-actions[bot]

Some fun performance comparisons related to copying and initializing dicts:

a = dict(a=2, b=3)

%timeit dict(a)
207 ns ± 3.41 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit a.copy()
82.6 ns ± 0.425 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

import copy
%timeit copy.copy(a)
313 ns ± 3.59 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

from copy import copy
%timeit copy(a)
290 ns ± 3.63 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

from copy import deepcopy
%timeit deepcopy(a)
3.39 µs ± 55.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Using a.copy() seems to be the way to go if you want to do a shallow copy of a dict.
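For anyone who wants to rerun the comparison outside IPython, here is a minimal sketch using only the standard library `timeit` module. The repeat counts are arbitrary choices, and wrapping each call in a lambda adds a small constant overhead, so the absolute numbers will not match the `%timeit` figures above; only the relative ordering is meaningful.

```python
import copy
import timeit

a = dict(a=2, b=3)

# Each candidate copies the same small dict; only deepcopy is a deep copy.
candidates = {
    "dict(a)": lambda: dict(a),
    "a.copy()": lambda: a.copy(),
    "copy.copy(a)": lambda: copy.copy(a),
    "copy.deepcopy(a)": lambda: copy.deepcopy(a),
}

CALLS = 20_000  # calls per timing run (arbitrary)

results = {}
for label, func in candidates.items():
    # Take the best of 5 runs and convert it to nanoseconds per call.
    best = min(timeit.repeat(func, number=CALLS, repeat=5))
    results[label] = best / CALLS * 1e9
    print(f"{label:>18}: {results[label]:8.1f} ns per call")
```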

Illviljan avatar Aug 05 '21 06:08 Illviljan

That is an interesting result (even aside from the main result here, I'm not sure what Python is doing such that copy.copy(a) has different performance from copy(a)!)

But Python is slow, and nanoseconds are short. Unless there's a noticeable impact on overall performance, prioritizing flexibility and compatibility is more consistent with the goals of the library. Does that make sense?

max-sixty avatar Aug 06 '21 16:08 max-sixty

I'm surprised that deepcopy is so slow too. I thought it would be similar in speed in this case and only get slower when dealing with mutable objects.

But using .copy is 100% compatible with how attrs has behaved before. So we come back to the question: what type should attrs be? The options I see are dict or MutableMapping.

I'm starting to lean towards MutableMapping, because subclassing dict has been rather difficult compared to subclassing MutableMapping.

And if we go with MutableMapping, then we should use copy.copy, since MutableMapping does not guarantee a .copy() method.
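To make the trade-off concrete, here is a small sketch of a dict-like that is not a dict subclass. `LoggedAttrs` is a hypothetical class made up for illustration, not something from xarray. It assumes the class defines `__copy__` itself, because the default `copy.copy` behaviour for a plain class copies the instance `__dict__` shallowly, which would leave both objects sharing the same inner storage.

```python
import copy
from collections.abc import MutableMapping


class LoggedAttrs(MutableMapping):
    """Hypothetical dict-like that is not a dict subclass."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __copy__(self):
        # Build a new instance with its own inner dict, so the copy and the
        # original do not share storage.
        return type(self)(self._data)


attrs = LoggedAttrs(units="m")

# MutableMapping does not define .copy(), so the dict-only idiom is unavailable.
has_copy_method = hasattr(attrs, "copy")

# copy.copy() keeps the original type; dict() converts to a plain dict.
shallow = copy.copy(attrs)
as_dict = dict(attrs)

print(has_copy_method, type(shallow).__name__, type(as_dict).__name__)
```

So `a.copy()` is the fastest option only when attrs is guaranteed to be a dict (or something that happens to implement `.copy()`), while `copy.copy` works for any dict-like that is copyable.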

Illviljan avatar Aug 06 '21 19:08 Illviljan

Think I'm running into https://github.com/python/mypy/issues/3004. Not completely sure why this worked before though.

Illviljan avatar Aug 07 '21 10:08 Illviljan

Is python/mypy#3004 still an issue?

pre-commit suggests it's here: https://github.com/pydata/xarray/pull/5667/files#diff-3c0ce7941684cbac55c00ab890684f86acc1de1908ee2afa915dbcb7c944105aR100 — but I guess there's some reason we can't only accept a MutableMapping in that function?

max-sixty avatar Aug 19 '21 22:08 max-sixty

Yes, it is still an issue. I've cheated, though, and used type: ignore in a few places, which is why it's been passing the checks.

Mappings (in the form of FrozenDict) are used to initialize .attrs in xr.open_dataset, for example. So unfortunately we can't accept only MutableMappings.

Does pyright handle properties with setters and getters?
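For readers following along, the pattern in question looks roughly like the sketch below: the getter returns a MutableMapping while the setter accepts any Mapping. `Variable` here is a hypothetical stand-in rather than xarray's actual class, and `MappingProxyType` is used as a stand-in for a read-only mapping such as FrozenDict. The code runs fine; the difficulty is only that, depending on the type checker and its version, the assignment on the last line may be reported as an incompatible type because the setter's parameter type differs from the getter's return type.

```python
from types import MappingProxyType
from typing import Any, Mapping, MutableMapping, Optional


class Variable:
    """Hypothetical stand-in, not xarray's actual class."""

    def __init__(self) -> None:
        self._attrs: Optional[MutableMapping[Any, Any]] = None

    @property
    def attrs(self) -> MutableMapping[Any, Any]:
        # Create the attrs container lazily on first access.
        if self._attrs is None:
            self._attrs = {}
        return self._attrs

    @attrs.setter
    def attrs(self, value: Mapping[Any, Any]) -> None:
        # Mutable dict-likes are kept as-is; read-only mappings are copied
        # into a plain dict so the getter's return type holds.
        if isinstance(value, MutableMapping):
            self._attrs = value
        else:
            self._attrs = dict(value)


var = Variable()
# Read-only input: a checker affected by the setter/getter mismatch may flag
# this line, which is where a "type: ignore" would end up.
var.attrs = MappingProxyType({"units": "m"})
```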

Illviljan avatar Aug 20 '21 06:08 Illviljan

Here are further tests to check how fast different class checks are:

from typing import MutableMapping


class Test2(MutableMapping):
    def __init__(self, *args, **kwargs):
        self.data = dict(*args, **kwargs)

    def __getitem__(self, key):
        pass

    def __setitem__(self, key, value):
        pass

    def __delitem__(self, key):
        pass

    def __iter__(self):
        pass

    def __len__(self):
        pass

b = Test2()

%timeit issubclass(type(b), MutableMapping)
711 ns ± 5.33 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit isinstance(b, MutableMapping)
853 ns ± 6.29 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

# For a much faster check, test for one of the methods MutableMapping provides
%timeit hasattr(b, "update")
82.6 ns ± 0.181 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

isinstance is rather slow, as can be seen. Considering that just doing dict(b) takes about 200 ns, which is basically what the original implementation did, it doesn't feel good to add a check that costs another 800 ns.
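One possible way to keep the common case cheap, sketched here as an assumption rather than as what this PR does, is to check for an exact dict first and only fall back to the ABC check for other inputs. `normalize_attrs` is a hypothetical helper name, and `MappingProxyType` again stands in for a read-only mapping.

```python
from collections.abc import MutableMapping
from types import MappingProxyType


def normalize_attrs(value):
    """Hypothetical helper: return a mutable mapping usable as attrs."""
    # Fast path: an exact-type check is a simple identity comparison and
    # skips the ABC machinery behind isinstance(..., MutableMapping).
    if type(value) is dict:
        return value
    # Slower path: other mutable dict-likes are kept as-is.
    if isinstance(value, MutableMapping):
        return value
    # Anything else (e.g. a read-only mapping) is copied into a plain dict.
    return dict(value)


plain = {"units": "m"}
frozen = MappingProxyType({"units": "m"})

print(normalize_attrs(plain) is plain)        # same object, no copy made
print(type(normalize_attrs(frozen)).__name__)  # converted to dict
```

Note that this returns plain dicts without copying them, which differs from the original dict(b) behaviour; whether that is acceptable would need to be decided separately.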

Illviljan avatar Aug 20 '21 07:08 Illviljan

If that is still an open issue, we could merge current main and try to fix the resulting typing problems.

headtr1ck avatar Oct 12 '22 18:10 headtr1ck