Caching dynamic properties when using `slots=True`.
Hey there!
This is more of a question than an issue :-)
Some of the classes that I'm trying to migrate to attrs have attributes that are expensive to compute. These attributes are either never accessed or accessed multiple times in quick succession. In those cases, I prefer delaying the cost of computing the attribute in case it is not needed -- but I also never want to compute the attribute twice for the same instance. I use pydanny/cached-property for this, which works fine until you use slots=True.
Any advice?
I've run into this as well. Unfortunately there's no easy solution as of yet. My advice for now is just use dict-based classes if you need cached properties.
Can’t we implement something ourselves? Technically it shouldn’t be that big of a deal: a private var + a property in front of it?
Well, yes, of course. Let's leave this open to track the issue.
Consider the elegance of cached properties when applied to dict classes. Suppose you have a class A and an instance of it, a. There's a cached property p on A.
The first time you do a.p, Python looks in a.__dict__, finds nothing, then looks in A.__dict__, finds the cached property descriptor and executes it. The descriptor calculates the value and puts it in a.__dict__.
Next time you do a.p, Python looks in a.__dict__ and finds the calculated value. So subsequent lookups are actually faster. Also you can trivially clear the cache with del a.__dict__['p'].
This is really clean and lends itself well to being refactored out into a simple library, which is what we have today.
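That lookup dance is small enough to sketch directly. Here's a toy version of the mechanism described above (`cached_prop` is an invented name for illustration, not any real library's API):

```python
# Toy version of the dict-based cached property mechanism described above.
# `cached_prop` is a hypothetical name, not a real library API.
class cached_prop:
    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Compute once and store the value in the instance's __dict__.
        # Because this is a non-data descriptor (no __set__), the instance
        # attribute shadows the descriptor on all subsequent lookups.
        value = self.func(instance)
        instance.__dict__[self.name] = value
        return value

class A:
    calls = 0

    @cached_prop
    def p(self):
        A.calls += 1
        return 42

a = A()
assert a.p == 42
assert a.p == 42
assert A.calls == 1        # computed only once
del a.__dict__['p']        # trivially clears the cache
assert a.p == 42
assert A.calls == 2        # recomputed after clearing
```

This only works because instance attribute lookup checks `a.__dict__` before falling back to non-data descriptors on the class, which is exactly what `__slots__` takes away.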
Absolutely. I think we can simulate that using a simple placeholder class that does almost the same. It might make sense to have a blessed way for that that would also work with immutable classes even though it's a bit iffy. (not arguing, just sketching :))
I like where this is going! I played around a bit with a cached-property descriptor and `__slots__` today, only to realize what the absence of `__dict__` really means... (hint: it breaks the lookup mechanism that @Tinche described earlier).
If someone can sketch a basic structure that would work, I can put some work into a patch :-)
fyi a class can be both slotted, and have a __dict__ slot
```python
class Ham:
    __slots__ = ["__dict__", "spam"]

    def __init__(self):
        self.__dict__ = {"spam": 3}
        self.eggs = 1
        self.spam = 2

print(f"{Ham().eggs=}, {Ham().spam=}, {Ham().__dict__=}")
```
It seems you get the speedup for accessing slotted attributes
~~you can even replace the `self.__dict__` slot with a slotted dict-like object that proxies its attrs - e.g. for all the attrs that have `@cachedproperty` use~~
```python
class _Ham(dict):
    __slots__ = ["ham", "spam", "eggs"]

    def __getitem__(self, key):
        return getattr(self, key)

    def __setitem__(self, key, value):
        return setattr(self, key, value)

class Ham:
    __slots__ = ["__dict__", "spam"]

    def __init__(self):
        self.__dict__ = _Ham()
        self.eggs = 1
        self.spam = 2

print(f"{Ham().eggs=}, {Ham().spam=}, {Ham().__dict__=}")
```
edit: ah, this doesn't work; Python cheats and accesses the `self.__dict__` data directly
fwiw, functools has .cached_property() these days. it works using a normal attribute and a fancy descriptor to populate it when it's first accessed. that means zero overhead the 2nd time it's accessed.
Yeah, the problem is that it doesn't work with slotted classes:
```python
from functools import cached_property

import attr

@attr.define
class A:
    @cached_property
    def x(self):
        return 42

a = A()
print(a.x)
```
throws:
```
Traceback (most recent call last):
  File "/Users/hynek/FOSS/attrs/t.py", line 14, in <module>
    print(a.x)
  File "/Users/hynek/.asdf/installs/python/3.10.0/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/functools.py", line 963, in __get__
    raise TypeError(msg) from None
TypeError: No '__dict__' attribute on 'A' instance to cache 'x' property.
```
Guess it could work if it stored the result in the decorator instead of the class?
Does the stdlib cached property work with normal (non-attrs) slotted classes?
> Does the stdlib cached property work with normal (non-attrs) slotted classes?
nope:
```python
import functools

class Ham:
    __slots__ = "eggs", "__weakref__"

    def __init__(self, eggs):
        self.eggs = eggs

    @functools.cached_property
    def x(self):
        return 42

spam = Ham("example")
print(spam.eggs)
print(spam.x)
```
```
example
Traceback (most recent call last):
  File "foo.py", line 15, in <module>
    print(spam.x)
  File "/usr/lib/python3.8/functools.py", line 960, in __get__
    raise TypeError(msg) from None
TypeError: No '__dict__' attribute on 'Ham' instance to cache 'x' property.
```
the cached property in the stdlib used `__dict__` as storage space
there would be a need for a variant that uses a slot and/or an identity weak mapping in the descriptor
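For illustration, the weak-mapping variant could look roughly like this sketch (`weak_cached_property` is an invented name; note that `WeakKeyDictionary` requires the instances to be hashable and weak-referenceable, hence the `__weakref__` slot):

```python
# Sketch: cache results in a WeakKeyDictionary held by the descriptor,
# keyed by the instance, so no __dict__ is needed on the instance itself.
# `weak_cached_property` is a made-up name, not an existing API.
import weakref

class weak_cached_property:
    def __init__(self, func):
        self.func = func
        self._cache = weakref.WeakKeyDictionary()

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Entries disappear automatically when the instance is garbage
        # collected, so the cache does not keep objects alive.
        if instance not in self._cache:
            self._cache[instance] = self.func(instance)
        return self._cache[instance]

class Ham:
    # __weakref__ must be in __slots__ for the weak mapping to hold the key.
    __slots__ = ("eggs", "__weakref__")

    def __init__(self, eggs):
        self.eggs = eggs

    @weak_cached_property
    def x(self):
        return 42

spam = Ham("example")
assert spam.x == 42
assert spam.x == 42  # second access comes from the weak mapping
```

The trade-off is that every access goes through the descriptor, so you never get the "free" second lookup that `__dict__`-based caching gives.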
Yeah this is not an attrs bug, but something where attrs might shine.
> the cached property in the stdlib used `__dict__` as storage space
not sure if this is entirely correct. functools.cached_property uses a normal instance attribute as its storage space. for non-slotted classes, that is inside .__dict__ indeed.
this means cached_property will only trigger once in normal situations: as a fallback when the attribute is not set. after running the function once, the result is set as the attribute value. subsequent accesses do not flow through cached_property anymore, which means the cost of the property indirection is paid only once. clearing the cache is achieved by deleting the attribute using del x.y: it will cause the next access to go through the fallback path again, via a descriptor.
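for the record, that behaviour is easy to verify on a regular dict-based class:

```python
# Verifying functools.cached_property semantics on a plain (dict-based)
# class; class and attribute names are chosen for illustration only.
import functools

class C:
    def __init__(self):
        self.calls = 0

    @functools.cached_property
    def y(self):
        self.calls += 1
        return 99

c = C()
assert c.y == 99
assert c.y == 99
assert c.calls == 1          # the function ran only once
assert 'y' in c.__dict__     # result stored as a plain instance attribute
del c.y                      # clears the cache
assert c.y == 99
assert c.calls == 2          # recomputed via the fallback path
```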
that said, slotted classes can also have (slotted) attributes that are not set:
```python
>>> class C:
...     __slots__ = ('a',)
...
>>> c = C()
>>> hasattr(c, 'a')  # Not yet set
False
>>> c.a = 12
>>> hasattr(c, 'a')  # Now it is set
True
>>> del c.a
>>> c.a  # Now it is unset again
Traceback (most recent call last):
[...]
AttributeError: a
```
however, trying to combine that with cached_property fails like this:
```python
>>> import functools
>>> class C:
...     __slots__ = ('a',)
...     @functools.cached_property
...     def a(self):
...         return 123
...
Traceback (most recent call last):
[...]
ValueError: 'a' in __slots__ conflicts with class variable
```
not sure if there's something that can be done about this 🤷
functools.cached_property can be used in slotted classes if __dict__ is added as a field:
```python
import functools

import attrs

@attrs.define()
class A:
    __dict__: dict = attrs.field(default=attrs.Factory(dict), init=False, repr=False, eq=False)

    @functools.cached_property
    def x(self):
        return 42

a = A()
a.__dict__  # {}
a.x  # 42
a.__dict__  # {'x': 42}
```
It also works for frozen slotted classes (replace `attrs.define` with `attrs.frozen`).
A (possibly undesired) side effect of adding `__dict__` to a slotted class is that monkeypatching is enabled again. E.g. the following simply works:
```python
a.y = "test"
```
In case such a side effect is undesired, one may opt to add the field __cache__ instead of __dict__, while adapting cached_property such that it uses __cache__ instead of __dict__ to cache properties.
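For illustration, such an adapted cached property might look like this sketch (`slot_cached_property` and the `__cache__` layout are assumptions for the example, not an existing API):

```python
# Sketch of a cached-property variant that stores results in a slotted
# __cache__ dict instead of __dict__. All names here are invented.
class slot_cached_property:
    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        cache = instance.__cache__
        if self.name not in cache:
            cache[self.name] = self.func(instance)
        return cache[self.name]

class Point:
    # No __dict__ slot, so monkeypatching stays disabled; only the
    # dedicated cache dict is added.
    __slots__ = ("x", "y", "__cache__")

    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.__cache__ = {}

    @slot_cached_property
    def norm(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

p = Point(3, 4)
assert p.norm == 5.0
assert p.__cache__ == {'norm': 5.0}
```

Unlike the `__dict__` approach, every access still goes through the descriptor, but arbitrary attribute assignment remains blocked.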
I've been playing around with possibilities for getting some kind of cached property / lazy value on slotted (attrs) classes.
For my use case, I'd prefer to avoid adding a dictionary to the class, to keep the memory usage small, so adding in __dict__ isn't very appealing.
It would be great if attrs provided a way to do this behind the scenes, with something like:
```python
@attrs.frozen
class X:
    x: float
    y: float
    z: float = attrs.field(lazy_value=lambda self: self.x + self.y)
```
and/or
```python
@attrs.frozen
class X:
    x: float
    y: float
    z: float = attrs.field(init=False)

    @z.lazy_value
    def _z(self):
        return self.x + self.y
```
Or perhaps just by detecting cached_property objects on slotted classes, and adding the slot + whichever backend solution is preferred. That would make the typical user completely ignorant that this behaviour exists, at the expense of tying tightly to functools.cached_property.
Anyway, I've come across 2 general approaches that seem promising. The first, option-A, is through __getattr__:
```python
import attrs

calls = []

@attrs.frozen
class A:
    x: float
    y: float
    z: float = attrs.field(init=False)

    def _z(self) -> float:
        calls.append(self.__class__.__name__)
        return self.x + self.y

    __lazy_values = {
        'z': _z,
    }

    def __getattr__(self, item):
        if item in self.__lazy_values:
            result = self.__lazy_values[item](self)
            object.__setattr__(self, item, result)
            return result
        raise AttributeError(item)

a = A(10, 15)
assert a.x == 10
assert a.y == 15
assert a.z == 25
assert a.z == 25
assert a.z == 25
assert len(calls) == 1
assert calls == ['A']
```
The second, option-B, is through a custom descriptor for the field:
```python
from typing import Callable

import attrs

@attrs.frozen
class B:
    x: float
    y: float
    z: float = attrs.field(init=False)

    def _z(self) -> float:
        calls.append(self.__class__.__name__)
        return self.x + self.y

@attrs.frozen
class LazyDescriptor:
    _old_descriptor: Callable
    _callable: Callable

    def __get__(self, instance, owner):
        try:
            return self._old_descriptor.__get__(instance, owner)
        except AttributeError:
            result = self._callable(instance)
            self._old_descriptor.__set__(instance, result)
            return result

    def __set__(self, instance, value):
        return self._old_descriptor.__set__(instance, value)

    def __delete__(self, instance):
        return self._old_descriptor.__delete__(instance)

B.z = LazyDescriptor(old_descriptor=B.z, callable=B._z)

calls = []
b = B(10, 15)
assert b.x == 10
assert b.y == 15
assert b.z == 25
assert b.z == 25
assert b.z == 25
assert len(calls) == 1
assert calls == ['B']
```
Of these, option-A has the advantage of not adding overhead except:
- when calculating the cached value
- when attempting to access non-existent attributes
Neither of which seems unreasonable.
It does mean adding/modifying __getattr__, which attrs doesn't appear to currently do, but __setattr__ is modified, so it's plausibly acceptable.
It also works the same way for non-slotted classes, so would be potentially easier to keep consistent.
Option-B in contrast adds consistent overhead for accessing the cached value (i.e. even after calculating it), but doesn't spill any overhead or complexity onto anything other than the cached value. It doesn't work out of the box for non-slotted classes, so an alternate would need to be found in that case, but that shouldn't be too difficult. It's also closer in implementation to functools.cached_property which may be advantageous.
I'd be happy to take a stab at an MR for something like this when I get some time, if either of the approaches would be considered acceptable, and the interface can be roughly agreed.
yeah B is too complicated and overhead's not cool.
I think I could live with an option-A-based solution, but maybe start with a PoC and take it from there.
A few notes:
- It's fair game to detect a custom `__getattr__` and error out.
- The attribute's name should be `__attrs_cached_properties__` or something similar.
- And finally and most importantly: would it be possible to make it work along with `@property`?
It's an extra line, but I suspect it's a) clearer what's going on and b) less offensive to type checkers.
IOW I was thinking something along the lines of:
```python
@attrs.define
class C:
    z: int = attrs.field(init=False)  # init=False would be optional but might mollify pyright etc.

    @z.cached
    @property
    def _z(self) -> int:
        return ...
```
But maybe it's possible to type an Attribute.lazy_property in a way that type checkers would swallow it too?
What are your thoughts on picking up cached_property directly? It's the most invisible to the typical user, and should play reasonably with type checkers etc. I guess it might cause surprise if someone is relying on specifics of the cached_property on the class level, but someone doing that should be able to work out what's going on.
I also realised this could actually be done outside of attrs, as a pre-step with an additional decorator:
```python
import functools
from typing import TypeVar

import attrs

T = TypeVar('T', bound=type)

def convert_cached_property_for_attrs_slots(cls: T) -> T:
    cached_properties = {
        name: cached_property.func
        for name, cached_property in cls.__dict__.items()
        if isinstance(cached_property, functools.cached_property)
    }
    original_getattr = cls.__dict__.get('__getattr__')

    def __getattr__(instance, item):
        if item in cached_properties:
            result = cached_properties[item](instance)
            object.__setattr__(instance, item, result)
            return result
        elif original_getattr is not None:
            return original_getattr(instance, item)
        else:
            raise AttributeError(item)

    for name in cached_properties:
        setattr(cls, name, attrs.field(init=False, hash=False, eq=False, repr=False, order=False))
    setattr(cls, '__getattr__', __getattr__)
    return cls
```
Applied before attrs:
```python
@attrs.frozen
@convert_cached_property_for_attrs_slots
class C:
    x: float = attrs.field()
    y: float = attrs.field()

    @cached_property
    def z(self) -> float:
        calls.append(self.__class__.__name__)
        return self.x + self.y

calls = []
c = C(10, 15)
assert c.x == 10
assert c.y == 15
assert c.z == 25
assert c.z == 25
assert c.z == 25
assert len(calls) == 1
assert calls == ['C']
```
I don't mean this as a long term solution, but it allows some exploration of the approach before touching attrs itself, and maybe unblocks others stumbling on this thread.
(This version is also obviously not thread safe, and it would make sense to use an RLock in the same way as functools.cached_property)
One quick fix that I'm happy with is allowing attrs.define to keep existing `__slots__`: right now, if slots=True, attrs overwrites the class's `__slots__` instead of updating it. If it were to update, I could specify `__slots__ = ("__dict__",)` and cached properties would work again (I don't mind the performance penalty of having a `__dict__`).
It seems that my proposed decorator only works when all the existing attributes are explicitly marked as = attrs.field(). I think because attrs will only infer when all attributes have type hints.
Adding in a:
```python
cls.__annotations__[name] = inspect.signature(
    cached_properties[name]
).return_annotation
```
seems to resolve this, at least so long as the cached_property has a return annotation. I guess one could do:
```python
annotation = inspect.signature(cached_properties[name]).return_annotation
if annotation is not inspect.Signature.empty:
    cls.__annotations__[name] = annotation
```
for robustness.
Btw, as properties/cached properties don't support slots well either, why not simply require the use of a cached-property decorator that uses an alternative cache attribute name?
It is maybe naive, but for some unknown reason, inheriting from a plain class fixes this behavior:
```python
#!/usr/bin/env python3
from functools import cached_property

from attr import define

class Works:
    pass

@define(frozen=True)
class Whatever(Works):
    @cached_property
    def it_works(self) -> int:
        return 4

t = Whatever()
print(t.it_works)
print(t.__dict__)
print(t.__slots__)
```
```
4
{'it_works': 4}
()
```
@AdrienPensart well you have a __dict__ there...
slots only works as intended when all base classes use it as well, else it's moot and shouldn't be used to begin with...
Ran into this issue again, and passing @attrs.define(slots=False) fixes things but attrs really needs a nice way to support this case and preserve slots. I was expecting something along the lines of the following:
```python
@attrs.define
class Foo:
    x: int

    @attrs.lazy_field
    @cached_property
    def double_x(self) -> int:
        return self.x * 2

foo = Foo(x=5)
print(foo.double_x)  # 10
```
If we add another decorator, couldn't it be @attrs.cached_property that handles all edge cases correctly instead of requiring two decorators, one of which we don't control? Like serious question, would it be a lot work to reimplement or at least wrap?
Depends on whether one wants the cached property to be thread safe
However, it's still going to require a rename of the field so as to avoid having to deal with shadowed or hidden slot descriptors.
I found some time to put together a proof of concept of an approach to this: https://github.com/python-attrs/attrs/pull/1200
This works similarly to the decorator I posted above, in that it picks up use of the functools cached_property implementation, and converts it and the class to work in a similar way.
It would be good to get some feedback on whether this seems like an acceptable approach at a high level. The advantage of basing on the functools cached property is that it will just work for most people, and it potentially doesn't require any API changes. An argument against is that it makes the conversion hidden, which could be surprising, but pragmatically, that doesn't seem like it would be a common problem.
There's also an issue with locking, because the current approach is per-class, which is unlikely to be acceptable in most cases. It would be good to get some feedback on what the best approach would be here. The options I see are:
- Class based locking (current approach), inefficient for many objects across many threads.
- No locking: Simplest, lowest overhead, potential for duplicated work (and potential races).
- Object based locking: add a lock to each object during init -- extra overhead at construction, permanent extra memory consumption per object.
- Some kind of key based lock on the class, keyed on instances: Still some overhead, still some class based locking (but with much more limited duration), no standard implementation so probably needs extra code in the repo.
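To make the object-based option concrete, a rough sketch could look like this (all names are invented; the lock and cache dict are created per instance in `__init__`, which is where the extra construction overhead and memory cost come from):

```python
# Sketch of per-object locking for a slotted cached-property variant.
# `locked_cached_property`, `_cache`, and `_cache_lock` are assumptions.
import threading

class locked_cached_property:
    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Per-object lock: threads racing on the *same* instance serialize,
        # but different instances never contend with each other.
        with instance._cache_lock:
            cache = instance._cache
            if self.name not in cache:
                cache[self.name] = self.func(instance)
            return cache[self.name]

class Expensive:
    __slots__ = ("_cache", "_cache_lock", "calls")

    def __init__(self):
        self._cache = {}
        self._cache_lock = threading.RLock()  # permanent per-instance cost
        self.calls = 0

    @locked_cached_property
    def value(self):
        self.calls += 1
        return 42

e = Expensive()
assert e.value == 42
assert e.value == 42
assert e.calls == 1  # computed once, even under the lock
```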
Does attrs deal with locking in any other area? If so, this strategy should conform to that pattern. However, I think it does not - thus I would expect this to also not handle locking either, leaving room to grow out a full plan across all of attrs in the future consistently. Note that in Python 3.12+ the locking has been removed from cached_property, and this should not try to bring it back.
In use cases that I think are most common, you'd want per-object locks, never per-class locks. And in those cases I'd expect to declare the entire attrs class that way, not just the cached_property but any custom setters as well.
> Does attrs deal with locking in any other area? If so, this strategy should conform to that pattern. However, I think it does not - thus I would expect this to also not handle locking either, leaving room to grow out a full plan across all of attrs in the future consistently.
It doesn't, as far as I could find; my concern was in matching cached_property as closely as possible, to minimise potential surprise.
> Note that in Python 3.12+ the locking has been removed from cached_property, and this should not try to bring it back.
Ah, I hadn't realised that. I think that makes a fairly good argument in favour of leaving the locking out (along with the point about inconsistency with attrs).