polars icon indicating copy to clipboard operation
polars copied to clipboard

Add __slots__ to pl.Expr and pl.LazyFrame

Open Object905 opened this issue 1 year ago • 7 comments
trafficstars

Description

Profiled count of created pl.Expr and pl.LazyFrame in my use case (graphQL api on top of polars). Found that one of the heaviest queries creates pl.Expr around 3k times and pl.LazyFrame around 1k times.

Tried to "monkey patch" adding slots on them directly in .venv and found that it seems to be working ok.

Will follow up with PR later.

Object905 avatar Dec 22 '23 11:12 Object905

What will that do?

ritchie46 avatar Dec 22 '23 13:12 ritchie46

__slots__ improves the performance of class attribute access, sometimes by quite a bit. I tried playing with this in dataframe/series a while back and ran into issues with namespace registration, since using __slots__ means you can't dynamically add attributes to a class. To get around this, I had to create a separate namespace dict and override __getattr__ to detect namespace retrieval, and in doing so removed the performance benefit of using __slots__.

mcrumiller avatar Dec 22 '23 16:12 mcrumiller

Well, then registration of namespace would be better handled by using descriptors.

Like this. Will experiment on this once I get to PR.

class Namespace:
    def __get__(self, obj, objtype=None):
        return "FooNamespace"


class Foo:
    __slots__ = ("_bar",)


Foo.namespace = Namespace()

instance = Foo()
print(instance.namespace)  # prints FooNamespace
print(Foo.namespace) # prints FooNamespace too

Object905 avatar Dec 22 '23 17:12 Object905

Having __slots__ only means that you can't add attributes to instances, not class itself . Class object would still have __dict__.

Object905 avatar Dec 22 '23 17:12 Object905

NameSpace class that wraps plugins is already is a descriptor. And it seems like it's registered on classes in _create_namespace, and that should not interact with __dict__ of instances in any way. So, given that it goes to Expr.__dict__ I don't see why you would need to define __getattr__ on Expr.

@mcrumiller Am I missing something here?

Object905 avatar Dec 22 '23 18:12 Object905

@Object905 if it works, go for it! I may have done something incorrect, and this was a while ago so I don't recall the details. Are you talking about adding _expr and _ldf to the __slots__?

mcrumiller avatar Dec 22 '23 18:12 mcrumiller

Yes. Builtin namespaces, like str, list, etc also good candidates.

Object905 avatar Dec 22 '23 18:12 Object905

@Object905 what do you recommend for adding attributes to instances, given that after this change one can't use setattr?

knl avatar Mar 01 '24 19:03 knl

@knl I was just about to make a feature request for an empty attr dictionary.

mcrumiller avatar Mar 01 '24 19:03 mcrumiller