xarray
Allow .attrs to support any dict-likes
- [x] Closes #5655
- [ ] Tests added
- [ ] Passes pre-commit run --all-files
- [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
- [ ] New functions/methods are listed in api.rst
Hello @Illviljan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
Comment last updated at 2021-10-31 09:57:53 UTC
Unit Test Results
6 files, 6 suites, 56m 18s :stopwatch:
16 290 tests: 14 550 :heavy_check_mark:, 1 739 :zzz:, 1 :x:
90 936 runs: 82 737 :heavy_check_mark:, 8 198 :zzz:, 1 :x:
For more details on these failures, see this check.
Results for commit 650ce229.
:recycle: This comment has been updated with latest results.
Some fun performance comparisons related to copying and initializing dicts:
a = dict(a=2, b=3)
%timeit dict(a)
207 ns ± 3.41 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit a.copy()
82.6 ns ± 0.425 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
import copy
%timeit copy.copy(a)
313 ns ± 3.59 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
from copy import copy
%timeit copy(a)
290 ns ± 3.63 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
from copy import deepcopy
%timeit deepcopy(a)
3.39 µs ± 55.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Using a.copy() seems to be the way to go if you want to do a shallow copy of a dict.
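For anyone who wants to rerun the comparison outside IPython, here is a minimal sketch using the standard timeit module. The `candidates` and `results` names are made up for this example, and the lambda wrappers add a small constant overhead to every entry, so only the relative ordering is meaningful. Absolute numbers will differ by machine and Python version.

```python
import copy
import timeit

a = dict(a=2, b=3)

# Candidate ways to copy a plain dict (the first three are shallow copies).
candidates = {
    "dict(a)": lambda: dict(a),
    "a.copy()": lambda: a.copy(),
    "copy.copy(a)": lambda: copy.copy(a),
    "copy.deepcopy(a)": lambda: copy.deepcopy(a),
}

number = 20_000  # calls per timing run
results = {}
for label, func in candidates.items():
    # Take the best of 5 runs and convert to nanoseconds per call.
    best = min(timeit.repeat(func, number=number, repeat=5))
    results[label] = best / number * 1e9
    print(f"{label:>18}: {results[label]:8.1f} ns per call")
```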
That is an interesting result (even aside from the main result here, I'm not sure what python is doing such that copy.copy(a) has different performance from copy(a)!)
But python is slow, and nanoseconds are short. Unless there's a noticeable impact on overall performance, prioritizing flexibility and compatibility is more consistent with the goals of the library. Does that make sense?
I'm surprised that deepcopy is so slow too. I thought it would be similar in speed in this case and only get slower when dealing with mutable objects.
But using .copy is 100% compatible with how attrs has behaved before. So we come back to the question: what type should attrs be?
The options I see are dict or MutableMapping.
I'm starting to lean towards MutableMapping, because subclassing dict has been rather difficult compared to subclassing MutableMapping.
And if we go with MutableMapping, then we should use copy.copy.
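To illustrate why copy.copy fits MutableMapping better than .copy(), here is a small sketch. `DictLikeAttrs` is a hypothetical class written for this comment, not something in xarray. The abstract base class does not provide a .copy() mixin method, while copy.copy works for dict and for any class that defines `__copy__`.

```python
import copy
from collections.abc import MutableMapping


class DictLikeAttrs(MutableMapping):
    """Hypothetical dict-like container, used only for illustration."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __copy__(self):
        # Without this hook, the default copy.copy behaviour would create a
        # new object whose _data is the very same dict as the original's.
        return type(self)(self._data)


attrs = DictLikeAttrs(units="m")

# MutableMapping has no .copy() mixin, so attrs.copy() would fail here.
print(hasattr(attrs, "copy"))

shallow = copy.copy(attrs)
shallow["units"] = "km"  # must not leak back into the original
print(attrs["units"], shallow["units"])
```

Note that a dict-like class without `__copy__` would still be accepted by copy.copy, but the copy could share its underlying storage with the original, so this relies on the dict-like being well behaved.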
Think I'm running into https://github.com/python/mypy/issues/3004. Not completely sure why this worked before though.
Is python/mypy#3004 still an issue?
pre-commit suggests it's here: https://github.com/pydata/xarray/pull/5667/files#diff-3c0ce7941684cbac55c00ab890684f86acc1de1908ee2afa915dbcb7c944105aR100
But I guess there's some reason we can't only accept a MutableMapping in that function?
Yes, it is still an issue. I've cheated though and used type: ignore in a few places, which is why it's been passing the checks.
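For context, python/mypy#3004 is about properties whose setter accepts a different type than the getter returns. A stripped-down sketch of that pattern follows. The `Holder` class is hypothetical and much simpler than the real code in this PR. A type checker affected by that issue may reject the assignment at the bottom, hence the type: ignore.

```python
from types import MappingProxyType
from typing import Any, Dict, Mapping


class Holder:
    """Hypothetical stand-in for an object with an .attrs property."""

    def __init__(self) -> None:
        self._attrs: Dict[Any, Any] = {}

    @property
    def attrs(self) -> Dict[Any, Any]:
        # The getter promises a mutable dict.
        return self._attrs

    @attrs.setter
    def attrs(self, value: Mapping[Any, Any]) -> None:
        # The setter is more permissive: any Mapping is converted to a dict.
        self._attrs = dict(value)


holder = Holder()
read_only = MappingProxyType({"units": "m"})

# Works at runtime, but the getter and setter types differ.
holder.attrs = read_only  # type: ignore

holder.attrs["long_name"] = "distance"
print(holder.attrs)
```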
Mappings (in the form of FrozenDict) are used to initialize .attrs in xr.open_dataset, for example. So unfortunately we can't accept only MutableMappings.
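One possible way to handle both cases at runtime is sketched below. `as_mutable_attrs` is a hypothetical helper, not what this PR implements, and types.MappingProxyType is used here only as a stand-in for a read-only mapping such as FrozenDict. It does rely on an isinstance check, so it pays the cost measured further down in this thread.

```python
import copy
from collections.abc import MutableMapping
from types import MappingProxyType


def as_mutable_attrs(value):
    """Hypothetical helper: return a mutable, independent copy of `value`."""
    if value is None:
        return {}
    if isinstance(value, MutableMapping):
        # Keep the caller's dict-like type, but do not share the object.
        return copy.copy(value)
    # Read-only mappings are converted to a plain dict.
    return dict(value)


frozen = MappingProxyType({"units": "m"})  # stand-in for a read-only Mapping
from_frozen = as_mutable_attrs(frozen)
from_frozen["units"] = "km"

original = {"units": "m"}
from_dict = as_mutable_attrs(original)
from_dict["units"] = "km"

print(type(from_frozen).__name__, frozen["units"], original["units"])
```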
Does pyright handle properties with setters and getters?
Here are further tests to check how fast different class checks are:
from typing import MutableMapping
class Test2(MutableMapping):
    def __init__(self, *args, **kwargs):
        self.data = dict(*args, **kwargs)
    def __getitem__(self, key):
        pass
    def __setitem__(self, key, value):
        pass
    def __delitem__(self, key):
        pass
    def __iter__(self):
        pass
    def __len__(self):
        pass
b = Test2()
%timeit issubclass(type(b), MutableMapping)
711 ns ± 5.33 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit isinstance(b, MutableMapping)
853 ns ± 6.29 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# If you want to get really fast you can check for one of the required attributes MutableMapping has
%timeit hasattr(b, "update")
82.6 ns ± 0.181 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
As can be seen, isinstance is rather slow. Just doing dict(b) takes about 200 ns, which is basically what the original implementation did, so it doesn't feel good to add a check that costs another 800 ns.
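The same comparison as a standalone script, in case someone wants to reproduce it without IPython. This is only a sketch: `Minimal`, `checks` and `timings` are names invented for this example, and the lambdas add a constant overhead to every measurement. It also shows the trade-off of the hasattr shortcut: it is a weaker check, since other containers such as set have an update method too.

```python
import timeit
from collections.abc import MutableMapping


class Minimal(MutableMapping):
    """Hypothetical minimal MutableMapping backed by a dict."""

    def __init__(self, *args, **kwargs):
        self.data = dict(*args, **kwargs)

    def __getitem__(self, key):
        return self.data[key]

    def __setitem__(self, key, value):
        self.data[key] = value

    def __delitem__(self, key):
        del self.data[key]

    def __iter__(self):
        return iter(self.data)

    def __len__(self):
        return len(self.data)


b = Minimal()

checks = {
    "issubclass(type(b), MutableMapping)": lambda: issubclass(type(b), MutableMapping),
    "isinstance(b, MutableMapping)": lambda: isinstance(b, MutableMapping),
    'hasattr(b, "update")': lambda: hasattr(b, "update"),
}

number = 100_000  # calls per timing run
timings = {}
for label, func in checks.items():
    # Best of 5 runs, converted to nanoseconds per call.
    timings[label] = min(timeit.repeat(func, number=number, repeat=5)) / number * 1e9
    print(f"{label:>36}: {timings[label]:7.1f} ns per call")

# The duck-typed shortcut also accepts objects that are not mappings at all.
false_positive = hasattr(set(), "update") and not isinstance(set(), MutableMapping)
print("set passes the hasattr check but is not a MutableMapping:", false_positive)
```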
If that is still an open issue, we could merge current main and try to fix the resulting typing problems.