polars
polars copied to clipboard
perf(python): add __slots__ to pl.Expr, pl.LazyFrame, built in namespaces, various internal structures and some dtypes.
Closes https://github.com/pola-rs/polars/issues/13198
In this PR are "safe" additions of __slots__. Mostly for classes that are not usually subclassed by users - pl.Expr, its subclasses and build in namespaces.
polars.datatypes.classes also really good candidates, since they are instantiated now, but not included in this PR (yet?).
Adding __slots__ for LazyFrame (and DataFrame/Series) turned out to be a breaking change because of subclassing.
Using ldf.__class__ = SubClassedLazyFrame, like in test_preservation_of_subclasses is an error because of different object layouts. And even if subclass were to define __slots__ - they can only be empty, without additional attributes on instance to keep this assigment valid.
That can't be solved without specific conversion mechanism to subclasses to properly create a new instance (like pl.LazyFrame.as_subclass(SubClassedLazyFrame, *args, **kwargs)) where _ldf attribute itself would be passed/assigned to new instance of subclass.
Also using ldf.__class__ = SubClassedLazyFrame a dirty hack anyway and if polars were to properly support subclassing of frames, I think something similar should be in place - then __slots__ on them can be added safely.
Deeper description about __slots__ and benefits\downsides can be found here.
Also in issue were not mentioned memory usage benefits aside from slightly faster attribute access.
(sqlalchemy claimed 46% less memory usage for it's internal structures). Even though in latest python versions this gap should be smaller, due to more compact dict.
Thanks for the PR. Could you provide some data on the performance impact of this change? I think I'm in favor of this change, but it would be nice to have some numbers to support it.
About subclassing: we don't support this. That test is from a while ago and we no longer guarantee preserving the class on calling a method.
It's hard to measure exactly, because of variety of possible use cases. But speedup is small, which is anticipated.
In my production use case constructing pl.Expr and pl.LazyFrame with all the glue takes ~20% of request runtime, while collecting them only about 10% because dataframes are pretty small (100-2000 rows). And given that this change is sadly close to noise.
Anyway here is some synthetic bench.
upstream
In [5]: %timeit -n 5000 pl.when(pl.col("foo") == 1).then("bar").when(pl.col("foo") == 2).then(pl.col("bar") / 2).when(pl.col("bar") == p
...: l.col("foo")).then("baz").otherwise(pl.col("bar").list.eval(pl.element().str.starts_with("ni")))
34.4 µs ± 480 ns per loop (mean ± std. dev. of 7 runs, 5,000 loops each)
In [2]: %timeit -n 10000 -r 100 pl.col("foo").alias("bar")
1.41 µs ± 43.8 ns per loop (mean ± std. dev. of 100 runs, 10,000 loops each)
In [3]: sys.getsizeof(pl.col.foo)
Out[3]: 48
this PR
In [7]: %timeit -n 5000 pl.when(pl.col("foo") == 1).then("bar").when(pl.col("foo") == 2).then(pl.col("bar") / 2).when(pl.col("bar") == p
...: l.col("foo")).then("baz").otherwise(pl.col("bar").list.eval(pl.element().str.starts_with("ni")))
33.5 µs ± 505 ns per loop (mean ± std. dev. of 7 runs, 5,000 loops each)
In [3]: %timeit -n 10000 -r 100 pl.col("foo").alias("bar")
1.33 µs ± 41.3 ns per loop (mean ± std. dev. of 100 runs, 10,000 loops each)
In [3]: sys.getsizeof(pl.col.foo)
Out[3]: 40
Also docs are failing to build because of this, because sphinx tries to get __dict__ on namespaces to check for classmethod/staticmethod.
If it's acceptable, I will add workaround for docs and __slots__ to other wrappers (LazyFrame/DataFrame/Series) later, otherwise might as well close this PR.
Ready to be merged or closed :neutral_face:
Ready to be merged or closed 😐
If you can rebase against the latest code I'll see about reviewing this 👌
@alexander-beedie Sorry for disappearance. Rebased main.
Does this prevent monkey patching?
I use pl.Expr.some_func = lambda x: x #some cool thing from time to time and it seems __slots__ would prevent that.
Also, what about this syntax pl.col.a instead of pl.col('a')?
@deanm0000
pl.Expr.some_func = ... modifies class __dict__, which is still in place. But this would prevent
instance = pl.col("a")
instance.some_func = ...
Also suggesting to take a look at pipe method on expression for this, it also plays well with typing.
pl.col.a works, because its implemented with getattr.
@alexander-beedie Sorry for disappearance. Rebased main.
Thx; will look at the weekend
Codecov Report
Attention: Patch coverage is 82.79570% with 16 lines in your changes are missing coverage. Please review.
Project coverage is 80.78%. Comparing base (
0e63403) to head (4ed1b8b). Report is 15 commits behind head on main.
:exclamation: Current head 4ed1b8b differs from pull request most recent head 7da83bd. Consider uploading reports for the commit 7da83bd to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## main #13236 +/- ##
==========================================
- Coverage 80.92% 80.78% -0.15%
==========================================
Files 1328 1326 -2
Lines 173446 173123 -323
Branches 2455 2455
==========================================
- Hits 140369 139865 -504
- Misses 32605 32771 +166
- Partials 472 487 +15
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@stinodego May I ask what the decision criteria for inclusion finally was? I'm really just curious because I considered something similar in scikit-learn and measurement showed no performance improvement at all, quite similar to https://github.com/pola-rs/polars/pull/13236#issuecomment-1869714451 did neither.
@stinodego May I ask what the decision criteria for inclusion finally was? I'm really just curious because I considered something similar in scikit-learn and measurement showed no performance improvement at all, quite similar to #13236 (comment) did neither.
I like that this clearly documents/enforces what the instance variables are. That is the most important thing for me.
I found out later that this changes the behavior of getattr and I'm not sure I'm happy with that:
class A:
__slots__ = ("x",)
print(getattr(A, "x", None))
# <member 'x' of 'A' objects> <-- without slots this would print None
But besides that, I don't see any big downsides so might as well give this a try. There may be some users out there that hit a specific case where the potential performance benefits actually help them.
@stinodego Thanks for the explanation.