polars perf(python): add __slots__ to pl.Expr, pl.LazyFrame, built in namespaces, various internal structures and some dtypes.

Closes https://github.com/pola-rs/polars/issues/13198

In this PR are "safe" additions of __slots__. Mostly for classes that are not usually subclassed by users - pl.Expr, its subclasses and build in namespaces. polars.datatypes.classes also really good candidates, since they are instantiated now, but not included in this PR (yet?).

Adding __slots__ for LazyFrame (and DataFrame/Series) turned out to be a breaking change because of subclassing. Using ldf.__class__ = SubClassedLazyFrame, like in test_preservation_of_subclasses is an error because of different object layouts. And even if subclass were to define __slots__ - they can only be empty, without additional attributes on instance to keep this assigment valid. That can't be solved without specific conversion mechanism to subclasses to properly create a new instance (like pl.LazyFrame.as_subclass(SubClassedLazyFrame, *args, **kwargs)) where _ldf attribute itself would be passed/assigned to new instance of subclass. Also using ldf.__class__ = SubClassedLazyFrame a dirty hack anyway and if polars were to properly support subclassing of frames, I think something similar should be in place - then __slots__ on them can be added safely.

Deeper description about __slots__ and benefits\downsides can be found here. Also in issue were not mentioned memory usage benefits aside from slightly faster attribute access. (sqlalchemy claimed 46% less memory usage for it's internal structures). Even though in latest python versions this gap should be smaller, due to more compact dict.

Dec 24 '23 09:12 Object905

Thanks for the PR. Could you provide some data on the performance impact of this change? I think I'm in favor of this change, but it would be nice to have some numbers to support it.

About subclassing: we don't support this. That test is from a while ago and we no longer guarantee preserving the class on calling a method.

Dec 26 '23 11:12 stinodego

It's hard to measure exactly, because of variety of possible use cases. But speedup is small, which is anticipated.

In my production use case constructing pl.Expr and pl.LazyFrame with all the glue takes ~20% of request runtime, while collecting them only about 10% because dataframes are pretty small (100-2000 rows). And given that this change is sadly close to noise.

Anyway here is some synthetic bench.

upstream

In [5]: %timeit -n 5000 pl.when(pl.col("foo") == 1).then("bar").when(pl.col("foo") == 2).then(pl.col("bar") / 2).when(pl.col("bar") == p
   ...: l.col("foo")).then("baz").otherwise(pl.col("bar").list.eval(pl.element().str.starts_with("ni")))
34.4 µs ± 480 ns per loop (mean ± std. dev. of 7 runs, 5,000 loops each)

In [2]: %timeit -n 10000 -r 100 pl.col("foo").alias("bar")
1.41 µs ± 43.8 ns per loop (mean ± std. dev. of 100 runs, 10,000 loops each)

In [3]: sys.getsizeof(pl.col.foo)
Out[3]: 48

this PR

In [7]: %timeit -n 5000 pl.when(pl.col("foo") == 1).then("bar").when(pl.col("foo") == 2).then(pl.col("bar") / 2).when(pl.col("bar") == p
   ...: l.col("foo")).then("baz").otherwise(pl.col("bar").list.eval(pl.element().str.starts_with("ni")))
33.5 µs ± 505 ns per loop (mean ± std. dev. of 7 runs, 5,000 loops each)

In [3]: %timeit -n 10000 -r 100 pl.col("foo").alias("bar")
1.33 µs ± 41.3 ns per loop (mean ± std. dev. of 100 runs, 10,000 loops each)

In [3]: sys.getsizeof(pl.col.foo)
Out[3]: 40

Also docs are failing to build because of this, because sphinx tries to get __dict__ on namespaces to check for classmethod/staticmethod. If it's acceptable, I will add workaround for docs and __slots__ to other wrappers (LazyFrame/DataFrame/Series) later, otherwise might as well close this PR.

Dec 26 '23 18:12 Object905

Ready to be merged or closed :neutral_face:

Jan 05 '24 13:01 Object905

Ready to be merged or closed 😐

If you can rebase against the latest code I'll see about reviewing this 👌

Jan 23 '24 21:01 alexander-beedie

@alexander-beedie Sorry for disappearance. Rebased main.

Feb 14 '24 16:02 Object905

Does this prevent monkey patching?

I use pl.Expr.some_func = lambda x: x #some cool thing from time to time and it seems __slots__ would prevent that.

Also, what about this syntax pl.col.a instead of pl.col('a')?

Feb 14 '24 20:02 deanm0000

@deanm0000 pl.Expr.some_func = ... modifies class __dict__, which is still in place. But this would prevent

instance = pl.col("a")
instance.some_func = ...

Also suggesting to take a look at pipe method on expression for this, it also plays well with typing.

pl.col.a works, because its implemented with getattr.

Feb 15 '24 05:02 Object905

@alexander-beedie Sorry for disappearance. Rebased main.

Thx; will look at the weekend

Feb 16 '24 09:02 alexander-beedie

Codecov Report

Attention: Patch coverage is 82.79570% with 16 lines in your changes are missing coverage. Please review.

Project coverage is 80.78%. Comparing base (0e63403) to head (4ed1b8b). Report is 15 commits behind head on main.

:exclamation: Current head 4ed1b8b differs from pull request most recent head 7da83bd. Consider uploading reports for the commit 7da83bd to get more accurate results

Files	Patch %	Lines
py-polars/polars/expr/array.py	66.66%	0 Missing and 1 partial :warning:
py-polars/polars/expr/binary.py	66.66%	0 Missing and 1 partial :warning:
py-polars/polars/expr/categorical.py	66.66%	0 Missing and 1 partial :warning:
py-polars/polars/expr/datetime.py	66.66%	0 Missing and 1 partial :warning:
py-polars/polars/expr/list.py	66.66%	0 Missing and 1 partial :warning:
py-polars/polars/expr/meta.py	66.66%	0 Missing and 1 partial :warning:
py-polars/polars/expr/name.py	66.66%	0 Missing and 1 partial :warning:
py-polars/polars/expr/string.py	66.66%	0 Missing and 1 partial :warning:
py-polars/polars/expr/struct.py	66.66%	0 Missing and 1 partial :warning:
py-polars/polars/series/array.py	66.66%	0 Missing and 1 partial :warning:
... and 6 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #13236      +/-   ##
==========================================
- Coverage   80.92%   80.78%   -0.15%     
==========================================
  Files        1328     1326       -2     
  Lines      173446   173123     -323     
  Branches     2455     2455              
==========================================
- Hits       140369   139865     -504     
- Misses      32605    32771     +166     
- Partials      472      487      +15

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Feb 17 '24 08:02 codecov[bot]

@stinodego May I ask what the decision criteria for inclusion finally was? I'm really just curious because I considered something similar in scikit-learn and measurement showed no performance improvement at all, quite similar to https://github.com/pola-rs/polars/pull/13236#issuecomment-1869714451 did neither.

Feb 27 '24 20:02 lorentzenchr

@stinodego May I ask what the decision criteria for inclusion finally was? I'm really just curious because I considered something similar in scikit-learn and measurement showed no performance improvement at all, quite similar to #13236 (comment) did neither.

I like that this clearly documents/enforces what the instance variables are. That is the most important thing for me.

I found out later that this changes the behavior of getattr and I'm not sure I'm happy with that:

class A:
    __slots__ = ("x",)

print(getattr(A, "x", None))
# <member 'x' of 'A' objects>       <-- without slots this would print None

But besides that, I don't see any big downsides so might as well give this a try. There may be some users out there that hit a specific case where the potential performance benefits actually help them.

Feb 27 '24 21:02 stinodego

@stinodego Thanks for the explanation.

Feb 28 '24 07:02 lorentzenchr