slow performance of Concrete/Annotable base classes due to object.__setattr__
The `Concrete` and `Annotable` classes in `grounds.py`, which I expect were written fairly early in ibis's development, appear to provide relatively slow performance, and I'd like to ask what other approaches could be considered.
Benchmarking & further details are below, but in summary:
- Using `object.__setattr__` is ~14-15x slower than direct attribute access and ~4x slower than using `setattr`.
- `Concrete`/`Annotable` are bases for almost every ibis operation, since they are bases for e.g. `Value` and `Node`.
- Due to the way ibis works (constructing and then rewriting expression trees), many objects need to be created for each expression (for some complex expressions we have built, this can be 10-100K objects).
- (My actual question 😄) What alternative approaches would the team be open to using to reduce this bottleneck? Some options that come to mind (I'm sure there are others I've not thought of):
  - (a) Use a "make instance immutable after `__init__`" type approach in `Immutable`, so that `setattr` could be used in `__init__` instead of `object.__setattr__`, which would give a 4x improvement (more memory though: one extra attribute per object). This probably needs the least rework of existing code (it would affect `Immutable`, `Annotable`, `Concrete`, possibly `Slotted`).
    - (implementation detail: change the `__setattr__` method on instances after init to stop "standard" `setattr` access)
  - (b) Use Python `dataclasses` to provide possibly-frozen annotable objects with defaults. This would need benchmarking to verify whether it is actually faster than the current approach, and the type checking done by `Annotable` would still need to be incorporated. It's not clear to what extent dataclasses could be a drop-in replacement for `Concrete`/`Annotable`/`Immutable`, so it might or might not need a wider refactor.
  - (c) Use the `koerce` package developed by @kszucs - see #10078.
  - (d) Use another similar package, e.g. pydantic. This would need benchmarking, evaluation of feature parity, etc.
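Option (a) could look roughly like the sketch below (hypothetical `Freezable`/`_freeze` names, not ibis code). One caveat I'd flag up front: swapping `__setattr__` on the *instance* after init does not actually work, because Python looks up dunder methods on the type, so the guard has to live on the class and check a per-instance flag. Whether this beats `object.__setattr__` in practice would need benchmarking, since every set now goes through a Python-level `__setattr__`:

```python
# Hypothetical sketch of option (a). Freezable/_freeze are invented names.
class Freezable:
    _frozen = False  # class-level default, shadowed per instance by _freeze()

    def __setattr__(self, name, value):
        # dunders are looked up on the type, so the guard must live here
        if self._frozen:
            raise AttributeError(f"can't set attribute {name!r}")
        object.__setattr__(self, name, value)

    def _freeze(self):
        # bypass our own guard to set the flag
        object.__setattr__(self, "_frozen", True)


class Point(Freezable):
    def __init__(self, x, y):
        self.x = x  # plain attribute syntax; guard is still off
        self.y = y
        self._freeze()  # the topmost __init__ must do this last
```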
I would be happy to work on putting together a PR for options a/b above if the team considers them viable. I would like to consult the opinions of people currently contributing/maintaining, e.g. @cpcloud, @NickCrews, @kszucs, @deepyaman, if you have time? For clarity, I worked on a couple of other recent PRs under the account name "hottwaj", which I've now renamed to this one, which resembles my actual name :) https://github.com/ibis-project/ibis/issues?q=is%3Apr+author%3AJonAnCla
[details] I came across this while digging into performance when building relatively large queries (hundreds of columns, thousands of operations), which can take 5-10s to construct on my laptop. We run some relatively complex ETL operations on "wide but short" tables, i.e. "small to medium" data (typically 100-1000 MB), and currently the ibis expression construction time is 10-50% of overall execution time. Not a deal breaker, but I would prefer it to be <5% if possible :)
A key bottleneck is this line of code, which uses `object.__setattr__` to set each attribute of subclasses of `Concrete`:
https://github.com/ibis-project/ibis/blob/main/ibis/common/grounds.py#L212
This setup is needed because `Concrete` is a subclass of `Immutable`, which blocks "normal" `setattr` usage in order to make instances "immutable" and hashable (they aren't actually immutable, just "more difficult to mutate").
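For reference, the pattern in question boils down to something like this (a simplified sketch, not the actual `grounds.py` code):

```python
# Simplified sketch of the current Immutable/Concrete pattern (not ibis's code)
class Immutable:
    __slots__ = ()

    def __setattr__(self, name, value):
        raise AttributeError("can't set attribute")


class Concrete(Immutable):
    __slots__ = ("a", "b")

    def __init__(self, a, b):
        # plain `self.a = a` would hit the blocking __setattr__ above,
        # so every attribute has to be set via the slower object.__setattr__
        object.__setattr__(self, "a", a)
        object.__setattr__(self, "b", b)
```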
Unfortunately, using `object.__setattr__` to set attribute values is ~15x slower than direct attribute access (and ~4x slower than `setattr`). See the outputs and the code snippet they're derived from below.
Timings (my laptop, Python 3.12, Ubuntu 24.04):

```text
Direct setattr:     10.2 ns per call
Using setattr:      37.2 ns per call (3.65x)
object.__setattr__: 150.0 ns per call (14.72x vs direct, 4.04x vs setattr)
```
Timing code snippet:

```python
import timeit

class Foo:
    pass

n = 1000000
reps = 5  # timeit.repeat's default number of repetitions

direct_setattr = sum(timeit.repeat('foo.a = 1',
    setup='foo = Foo()', globals={'Foo': Foo}, number=n)) / reps
using_setattr = sum(timeit.repeat('setattr(foo, "a", 1)',
    setup='foo = Foo()', globals={'Foo': Foo}, number=n)) / reps
object_setattr = sum(timeit.repeat('object.__setattr__(foo, "a", 1)',
    setup='foo = Foo()', globals={'object': object, 'Foo': Foo}, number=n)) / reps

print(f"Direct setattr: {direct_setattr/n*1e9:.1f} ns per call")
print(f"Using setattr: {using_setattr/n*1e9:.1f} ns per call ({using_setattr/direct_setattr:.2f}x)")
print(f"Object.__setattr__: {object_setattr/n*1e9:.1f} ns per call "
      f"({object_setattr/direct_setattr:.2f}x vs direct, {object_setattr/using_setattr:.2f}x vs setattr)")
```
A bit frustrating, but some further investigation shows that it's hard to make significant improvements, as there is also overhead in customising the init process, which is significant for objects with few attributes.
I tried the following variants:
```python
import dataclasses
from typing import Any


def _prevent_settattr(self, name: str, value: Any) -> None:
    raise AttributeError("can't set attribute")


class DirectAttrAccess:
    def __init__(self, a: int, b: int) -> None:
        self.a = a
        self.b = b


class Setattr:
    def __init__(self, a: int, b: int) -> None:
        setattr(self, 'a', a)
        setattr(self, 'b', b)


class PostInitImmutable:
    def __init__(self, a: int, b: int) -> None:
        self.a = a
        self.b = b
        # NB: dunder methods are looked up on the type, not the instance,
        # so this assignment does not actually block later attribute setting
        self.__setattr__ = _prevent_settattr


class PostInitImmutableSetattr:
    def __init__(self, a: int, b: int) -> None:
        setattr(self, 'a', a)
        setattr(self, 'b', b)
        self.__setattr__ = _prevent_settattr


class ImmutableDisabledSetattr:
    def __init__(self, a: int, b: int) -> None:
        object.__setattr__(self, 'a', a)
        object.__setattr__(self, 'b', b)

    __setattr__ = _prevent_settattr


@dataclasses.dataclass(frozen=True)
class FrozenDataclass:
    a: int
    b: int


@dataclasses.dataclass
class MutableDataclass:
    a: int
    b: int
```
Timing results:

```text
DirectAttrAccess:         268.9 ns per call
Setattr:                  418.2 ns per call (1.6x vs direct)
PostInitImmutable:        339.0 ns per call (1.3x vs direct, 0.8x vs setattr)
PostInitImmutableSetattr: 543.2 ns per call (2.0x vs direct, 1.3x vs setattr)
ImmutableDisabledSetattr: 860.0 ns per call (3.2x vs direct, 2.1x vs setattr)
FrozenDataclass:          906.4 ns per call (3.4x vs direct, 2.2x vs setattr)
MutableDataclass:         354.6 ns per call (1.3x vs direct, 0.8x vs setattr)
```
Timing code:

```python
import timeit

n = 1000000
reps = 5

def bench(cls):
    return sum(timeit.repeat('cls(1, 2)', globals={'cls': cls}, number=n)) / reps

direct_attr_access = bench(DirectAttrAccess)
setattr_class = bench(Setattr)

print(f"DirectAttrAccess: {direct_attr_access/n*1e9:.1f} ns per call")
print(f"Setattr: {setattr_class/n*1e9:.1f} ns per call ({setattr_class/direct_attr_access:.2f}x)")
for cls in (PostInitImmutable, PostInitImmutableSetattr,
            ImmutableDisabledSetattr, FrozenDataclass, MutableDataclass):
    timing = bench(cls)
    print(f"{cls.__name__}: {timing/n*1e9:.1f} ns per call "
          f"({timing/direct_attr_access:.2f}x vs direct, {timing/setattr_class:.2f}x vs setattr)")
```
In summary, improvements could be made by:

- switching to `PostInitImmutableSetattr` (~0.3x faster for a 2-attribute class)
  - but controlling when "freezing" occurs might become a bit fragile: using `metaclass.__call__` or `cls.__new__` to freeze after `__init__` led to much worse performance (I didn't include those variants in the timings above), so freezing must be done at the end of `__init__` to see any improvement. This makes it difficult to set up freezing where `super().__init__` is used, as the "topmost init function" has to call the freezing function when finished.
- ditching immutability completely:
  - using `MutableDataclass`: 0.9x faster
  - using `Setattr`: 0.5x faster
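A minimal illustration of that `super().__init__` fragility, using a hypothetical freeze-at-end-of-`__init__` mixin (all names invented): if a base class freezes at the end of its own `__init__`, a subclass can no longer set its extra attributes afterwards.

```python
# Hypothetical freeze-after-init mixin; names are invented for illustration.
class Freezable:
    _frozen = False

    def __setattr__(self, name, value):
        if self._frozen:
            raise AttributeError("can't set attribute")
        object.__setattr__(self, name, value)

    def _freeze(self):
        object.__setattr__(self, "_frozen", True)


class Base(Freezable):
    def __init__(self, a):
        self.a = a
        self._freeze()  # wrong place: only the topmost __init__ should freeze


class Child(Base):
    def __init__(self, a, b):
        super().__init__(a)  # the instance is frozen here already...
        self.b = b           # ...so this raises AttributeError
```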
I also tried the above on Python 3.14 (since the speed of things like building objects and setting attributes changes across versions; the timings above were for 3.12) and found:

- `PostInitImmutableSetattr` is 0.5x faster
- `MutableDataclass` is 1.5x faster
- `Setattr` is 0.8x faster
I expect that ditching immutability is not particularly desirable... I think that users who mess around with internal objects should probably not be surprised if things break or (more likely) do unexpected stuff, but I guess the current setup also helps provide internal checks that stuff within the library itself is correct...
Maybe another idea could be to make immutability optional so that users could disable it at their own risk? :)
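One hypothetical shape for such an opt-out (the `IBIS_IMMUTABLE` flag and `_set` helper are invented names, just a sketch): pick the behaviour once at import time, so that when immutability is disabled the per-attribute fast path is a plain `setattr` with no extra branching.

```python
# Hypothetical sketch: IBIS_IMMUTABLE and _set are invented names.
import os

# decided once at import time, e.g. from an environment variable
IBIS_IMMUTABLE = os.environ.get("IBIS_IMMUTABLE", "1") == "1"


class Immutable:
    if IBIS_IMMUTABLE:
        def __setattr__(self, name, value):
            raise AttributeError("can't set attribute")
    # else: no __setattr__ override at all, so attribute setting is normal


# constructors bind the matching setter once, up front
_set = object.__setattr__ if IBIS_IMMUTABLE else setattr


class Op(Immutable):
    def __init__(self, a):
        _set(self, "a", a)
```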
Can you investigate the "shape" of the object tree that is created, to get a sense of whether there are a few guilty parties causing ~90% of the problem? E.g. you say you are using wide tables: is that causing the branching factor of the tree to be high, so that even if the tree isn't very deep, it still has a lot of nodes? If so, then perhaps we could do some spot optimizations, such as lazily creating the column objects, or perhaps we could skip that work altogether? Or perhaps there is one particular ops class that we could specialize/optimize?
In general I can help support and review, but I don't have the motivation to really dive into the guts of this. I think I would say that any solution must keep immutability by default. If it isn't too ugly, maybe make it configurable, but I wouldn't be hopeful. I would also say that a 0.9x improvement isn't worth completely overhauling our implementation.
We can't get rid of immutability as a contract, as most or all of the internals rely on immutability for hashing purposes.
However, we might be able to change the operating principle of using object.__setattr__ to a more performant option.
This would be analogous to creating a class like

```python
class MyTuple(tuple):
    def __init__(self, x):
        self.x = x

    def __hash__(self):
        return hash((*self, self.x))
```
where users (and maybe internals) would have to be very careful not to mutate any attributes.
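To illustrate why the care is needed: mutating an attribute after construction silently changes the hash, so the object gets lost in any set or dict that already contains it.

```python
# Demonstrating the hazard of the MyTuple-style approach above.
class MyTuple(tuple):
    def __init__(self, x):
        self.x = x

    def __hash__(self):
        return hash((*self, self.x))


t = MyTuple((1, 2))
seen = {t}
assert t in seen      # fine so far
t.x = 99              # nothing stops this mutation...
assert t not in seen  # ...and now the set can no longer find t
```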
Thanks both, I'll definitely do a bit more investigation
@NickCrews you mentioned whether there are some specific places that could be targeted. A couple of key places I have a suspicion about are:

- `ops.Field`: for wide tables, N of these objects (N = number of columns) need to be created for each table-level operation (each mutate/select/join). Creating these very simple objects is comparatively expensive because they inherit from `Immutable`/`Concrete` as mentioned above. A possible candidate for specialisation.
- `rewrite_project_input`: this is called at the end of every `.select` or `.mutate` operation and involves selectively replacing some objects in the expression tree. For "simple" columns, though, it never does anything, so skipping it when it can be cheaply detected as not needed would be helpful.
I'll look into `rewrite_project_input` separately, and in the meantime come back on this if I can see a not-too-messy path forward.
Sorry, "a suspicion about" is the wrong way to put it. Both of the above are places where I see a bottleneck when profiling expression building.
Thanks @JonAnCla for looking into it, it is an interesting topic!
Just a quick note that `object.__setattr__` actually does an attribute lookup for the method each time, so preferably we should bind it to a name and use that directly, which gives a little speedup:
```python
__object_setattr__ = object.__setattr__


class ImmutableDisabledSetattr:
    def __init__(self, a: int, b: int) -> None:
        object.__setattr__(self, 'a', a)
        object.__setattr__(self, 'b', b)

    __setattr__ = _prevent_settattr


class ImmutableDisabledObjectSetattr:
    def __init__(self, a: int, b: int) -> None:
        __object_setattr__(self, 'a', a)
        __object_setattr__(self, 'b', b)

    __setattr__ = _prevent_settattr
```

```text
In [5]: %timeit ImmutableDisabledSetattr(1, 2)
177 ns ± 0.82 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [6]: %timeit ImmutableDisabledObjectSetattr(1, 2)
154 ns ± 0.876 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
```
I actually use this in koerce https://github.com/kszucs/koerce/blob/fa3f8dcfc56b798acf676a7dba310521e052439a/koerce/annots.py#L570-L573 where the Cython-transpiled code makes the difference more pronounced.
I worked on koerce to address several of the challenges you mention above, while keeping the existing + extended + fixed behaviour of ibis's core. I think it would be more reasonable to add object-modelling optimizations to koerce itself, since it provides additional benefits. Regarding pydantic: koerce is actually twice as fast https://github.com/kszucs/koerce?tab=readme-ov-file#performance (at least at the time of writing the readme).
We also do a lot of redundant traversal and object replacement during IR manipulation, which we should probably rework to be more similar to other IR rewrite systems, e.g. MLIR. This is complementary to the features available in koerce.
Thanks @kszucs. I distilled the following snippet out of koerce, covering just the "immutable object" type code, for comparison with the other examples above so I could benchmark it:
```python
%%cython
from typing import Any

import cython
from cython.cimports.cpython.object import PyObject_GenericSetAttr as __setattr__


def new_fast(cls: type, **kwargs: dict[str, Any]):
    this = cls.__new__(cls)
    for name, value in kwargs.items():
        __setattr__(this, name, value)
    return this


class Immutable:
    def __setattr__(self, name: str, value: Any) -> None:
        raise AttributeError("can't set attribute")
```
and for timing:

```python
%%timeit
new_fast(Immutable, a=1, b=2)
```
I got 290 ns per object, which is a touch faster than what I got for `MutableDataclass` and comparable with `DirectAttrAccess` (a standard class with `self.x = x` in `__init__`), i.e. the fastest in the table above. Unlike those two, this koerce/cython approach also has immutability.
Maybe, if an install-time switch to use this could be made without a lot of upheaval of the internals, something like this could be considered for inclusion in ibis (i.e. the option to have an "accelerated" build or a pure-Python build)? Though that might be quite a big maybe :) The rest of koerce seems like it would be a good performance improvement, but such a big change to these internals seems like it would be a big ask for maintainers at the moment.
Having looked at all this, what msgspec can do is pretty impressive. Snippet follows:
```python
import msgspec


class MsgspecDataclass(msgspec.Struct, frozen=True):
    a: int
    b: int
```

and the timing:

```python
%%timeit
MsgspecDataclass(1, 2)
```
This timed at 90 ns per object on the same hardware, about 3x faster than the fastest other approach to building objects so far, and it's immutable. So using msgspec for an immutable object replacement would be ~5x faster than the current approach taken in ibis.
I've not used msgspec before though, so I don't know its pitfalls - maybe someone else has thoughts. I could investigate in time if others consider it a viable option.
Thanks!
@jcrist is the author of msgspec and a fellow maintainer of ibis, so we can directly ask him :)