traitlets
traitlets copied to clipboard
Performance: decrease runtime overhead for constructing HasTraits (up to 20x faster)
This makes creating ipywidgets about 2x faster (ignoring comm construction).
Related to, but superseded https://github.com/ipython/traitlets/pull/639
The ipywidget tests fail, since it adds _comm_default
at runtime, https://github.com/jupyter-widgets/ipywidgets/pull/3590 fixes that and is also a better method anyway. The question is, is this a breaking change? I don't think there is any mention of adding _trait_default
at runtime, and it causes a significant runtime cost.
I think it is fair to make this change, assuming https://github.com/jupyter-widgets/ipywidgets/pull/3590 gets merged and released and we go green.
I have another commit that gives a good performance win in a situation where a lot of default are 'static' (like a fixed default value for string and ints). This happens a lot for ipywidgets.
Should I push to this branch? or should I wait to get this in first (I could open a new PR and later rebase against main)
I'd say push here
I've been benchmarking this with pytest-benchmark using this script:
from traitlets import HasTraits, Int, Unicode
class SingleTrait(HasTraits):
foo = Int()
class ManyTrait(HasTraits):
foo0 = Unicode("foo0").tag(sync=True)
foo1 = Unicode("foo1").tag(sync=True)
foo2 = Unicode("foo2").tag(sync=True)
foo3 = Unicode("foo3").tag(sync=True)
foo4 = Unicode("foo4").tag(sync=True)
foo5 = Unicode("foo5").tag(sync=True)
foo6 = Unicode("foo6").tag(sync=True)
foo7 = Unicode("foo7").tag(sync=True)
foo8 = Unicode("foo8").tag(sync=True)
foo9 = Unicode("foo9").tag(sync=True)
def create(N, cls):
for i in range(N):
cls()
def create_and_access(N, cls):
keys = cls.class_trait_names(sync=True)
for i in range(N):
obj = cls()
for key in keys:
getattr(obj, key)
def test_single_trait(benchmark):
benchmark(create, cls=SingleTrait, N=1_000)
def test_many_trait(benchmark):
benchmark(create, cls=ManyTrait, N=1_000)
def test_single_trait_and_access(benchmark):
benchmark(create_and_access, cls=SingleTrait, N=1_000)
def test_many_trait_and_access(benchmark):
benchmark(create_and_access, cls=ManyTrait, N=1_000)
Results
Main a35dd0b686d9cbdfc65ce14e4889ea39980d156d
-------------------------------------------------------------------------------------- benchmark: 4 tests --------------------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait 12.4555 (1.0) 12.9838 (1.0) 12.5927 (1.0) 0.1114 (1.62) 12.5714 (1.0) 0.1253 (2.96) 14;2 79.4108 (1.0) 71 1
test_single_trait_and_access 13.0052 (1.04) 13.5203 (1.04) 13.1818 (1.05) 0.1235 (1.80) 13.1553 (1.05) 0.1789 (4.23) 21;0 75.8624 (0.96) 75 1
test_many_trait 14.0976 (1.13) 14.4672 (1.11) 14.3162 (1.14) 0.0686 (1.0) 14.3159 (1.14) 0.0423 (1.0) 19;16 69.8510 (0.88) 69 1
test_many_trait_and_access 57.5647 (4.62) 59.3905 (4.57) 58.3111 (4.63) 0.5214 (7.60) 58.1785 (4.63) 0.8110 (19.18) 5;0 17.1494 (0.22) 18 1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
before the last commit 166cef4ffca4896528b5079839dfb64ef67d4b1b
--------------------------------------------------------------------------------------- benchmark: 4 tests --------------------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait 2.2666 (1.0) 2.5554 (1.0) 2.3552 (1.0) 0.0274 (1.63) 2.3493 (1.0) 0.0152 (1.06) 50;47 424.5879 (1.0) 400 1
test_single_trait_and_access 2.4376 (1.08) 2.6091 (1.02) 2.5242 (1.07) 0.0169 (1.0) 2.5228 (1.07) 0.0151 (1.06) 54;24 396.1647 (0.93) 394 1
test_many_trait 2.6998 (1.19) 2.9632 (1.16) 2.8028 (1.19) 0.0299 (1.77) 2.7969 (1.19) 0.0143 (1.0) 47;50 356.7877 (0.84) 367 1
test_many_trait_and_access 27.8987 (12.31) 29.0280 (11.36) 28.0583 (11.91) 0.2163 (12.83) 27.9914 (11.91) 0.0954 (6.67) 4;4 35.6400 (0.08) 36 1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ae8c6cde983a689602f747932f55fa3780a4a1fe
------------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait 2.4483 (1.0) 2.7051 (1.0) 2.5550 (1.0) 0.0328 (1.13) 2.5482 (1.0) 0.0225 (1.44) 59;49 391.3851 (1.0) 370 1
test_single_trait_and_access 2.6786 (1.09) 2.9397 (1.09) 2.7569 (1.08) 0.0308 (1.06) 2.7498 (1.08) 0.0218 (1.39) 52;41 362.7242 (0.93) 362 1
test_many_trait 3.3650 (1.37) 3.7456 (1.38) 3.4842 (1.36) 0.0291 (1.0) 3.4812 (1.37) 0.0156 (1.0) 22;22 287.0108 (0.73) 278 1
test_many_trait_and_access 5.3269 (2.18) 5.9227 (2.19) 5.5025 (2.15) 0.0671 (2.31) 5.4879 (2.15) 0.0230 (1.47) 13;19 181.7341 (0.46) 170 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Looking at the median, we see a >10x improvement for HasTrait
with many 'static'+immutable data, which is quite common in ipywidgets (see e.g. the Layout widget https://github.com/jupyter-widgets/ipywidgets/blob/master/python/ipywidgets/ipywidgets/widgets/widget_layout.py )
where the last commit really helps with a factor of ~5x
I noticed that ae8c6cde983a689602f747932f55fa3780a4a1fe slowed down the other cases because we now did some unneeded checking of presence in a dict, so I fixed that.
Now ae8c6cde983a689602f747932f55fa3780a4a1fe only gives a speedup compared to 166cef4ffca4896528b5079839dfb64ef67d4b1b:
------------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait 2.2957 (1.0) 2.5338 (1.0) 2.3469 (1.0) 0.0437 (1.0) 2.3307 (1.0) 0.0493 (1.0) 83;20 426.0973 (1.0) 403 1
test_single_trait_and_access 2.3051 (1.00) 6.7410 (2.66) 2.4294 (1.04) 0.2315 (5.30) 2.3947 (1.03) 0.1165 (2.36) 7;7 411.6245 (0.97) 425 1
test_many_trait 2.7801 (1.21) 3.3918 (1.34) 2.8540 (1.22) 0.0860 (1.97) 2.8196 (1.21) 0.0813 (1.65) 40;23 350.3821 (0.82) 348 1
test_many_trait_and_access 4.8385 (2.11) 5.3911 (2.13) 5.0037 (2.13) 0.1150 (2.63) 4.9796 (2.14) 0.1399 (2.84) 53;6 199.8502 (0.47) 187 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nice work!
Found another performance issue: We are calling Descriptor.instance_init
for each HasTraits we create, for each descriptor, while actually many of them don't do anything.
I've refactored it such that only at class construction we collect all instance_init
methods that do something. We by default assume that we "do something", by adding instance_init
to a list of functions to be called.
However, many of the TypeTraits opt out of this by overriding subclass_init
to not add it's own instance_init
to the list of functions.
This makes it almost perfectly backwards compatible, except...
When someone inherits from say Int
, its instance_init
will not be called. We could make it fully compatible by checking if type(self) == Int
(and similar for the other traits). Let me know, I'm happy to add this in.
When creating a 1000 HasTraits with 30 traits that do not need an instance_init
, we avoid 30000 unneeded function calls.
Performance benchmark:
Before
------------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait 2.2703 (1.0) 2.5766 (1.0) 2.3246 (1.0) 0.0510 (1.50) 2.3051 (1.0) 0.0656 (2.79) 65;13 430.1731 (1.0) 407 1
test_single_trait_and_access 2.2841 (1.01) 2.7480 (1.07) 2.3585 (1.01) 0.0789 (2.31) 2.3257 (1.01) 0.0752 (3.20) 62;36 423.9969 (0.99) 430 1
test_many_trait 2.8105 (1.24) 3.0434 (1.18) 2.8452 (1.22) 0.0341 (1.0) 2.8343 (1.23) 0.0235 (1.0) 36;30 351.4728 (0.82) 287 1
test_many_trait_and_access 4.6244 (2.04) 4.9061 (1.90) 4.6736 (2.01) 0.0457 (1.34) 4.6592 (2.02) 0.0415 (1.77) 24;15 213.9672 (0.50) 213 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
After
------------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait 2.2464 (1.0) 2.4565 (1.01) 2.2825 (1.00) 0.0252 (1.0) 2.2769 (1.00) 0.0230 (1.32) 58;22 438.1106 (1.00) 422 1
test_many_trait 2.2525 (1.00) 2.4332 (1.00) 2.2818 (1.0) 0.0257 (1.02) 2.2746 (1.0) 0.0223 (1.28) 55;32 438.2443 (1.0) 436 1
test_single_trait_and_access 2.2594 (1.01) 2.4330 (1.0) 2.2858 (1.00) 0.0254 (1.01) 2.2781 (1.00) 0.0174 (1.0) 42;36 437.4761 (1.00) 427 1
test_many_trait_and_access 4.1723 (1.86) 4.4918 (1.85) 4.2298 (1.85) 0.0608 (2.42) 4.2072 (1.85) 0.0509 (2.92) 33;26 236.4174 (0.54) 230 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This saves another ~10% for HasTraits with 10 traits. I've also tested this in a more extreme case with 100 traits, which gives a 25% performance increase. This shows this approach scales much better.
Looking at creating a 100 trait object in viztracer, we see:
before
data:image/s3,"s3://crabby-images/9a325/9a32553b7d60e77590f22ae627c1c86cc71de57c" alt="image"
setup_instance
takes most of the time.
after
data:image/s3,"s3://crabby-images/4f188/4f1886905b5721f39134283dcee443f984b2b1f6" alt="image"
So instead of setup_instance
taking most of the time creating an object, we only spend time doing a dict copy with the trait values (the only overhead that we cannot avoid). Now the last bottle neck seems to be the HasTraits constructor (more specifically the context manager)
After also working on the HasTrait constructor, where we remove a lot of overhead from the hold_trait_notifications
context manager.
------------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait 1.3711 (1.0) 1.5112 (1.0) 1.3890 (1.0) 0.0176 (1.03) 1.3829 (1.0) 0.0121 (1.0) 77;69 719.9187 (1.0) 682 1
test_single_trait_and_access 1.3869 (1.01) 1.5423 (1.02) 1.4141 (1.02) 0.0171 (1.0) 1.4105 (1.02) 0.0171 (1.42) 144;30 707.1417 (0.98) 699 1
test_many_trait 1.4594 (1.06) 1.6393 (1.08) 1.4794 (1.07) 0.0194 (1.13) 1.4736 (1.07) 0.0154 (1.27) 79;50 675.9557 (0.94) 662 1
test_many_trait_and_access 3.2931 (2.40) 3.5277 (2.33) 3.3287 (2.40) 0.0398 (2.32) 3.3156 (2.40) 0.0267 (2.21) 36;35 300.4134 (0.42) 293 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
We see another ~25% performance increase.
I think I don't see any overhead at this moment anymore at constructing HasTraits that are obvious and don't break things, I think this is as fast as I can push it.
Possible future optimization might be in removing magic methods, because that would save us in checking with hasattr at runtime. Also, the Bunch
construction gives some overhead, but I don't see how we can get rid of that.
Ha, I love seeing the PR title update. Thanks again for digging in to this!
Per Maarten's ideas, I further dug into performance here. Some quick observations:
Overhead from the Bunch class
First, it's worth noting that the Bunch
class adds some serious overhead to the dict class, if you use the bunch.attribute
syntax to access values. With this simple test case:
def test_bunch_named_attribute_access(benchmark):
def do_named_access():
b = Bunch(x=5, y=10)
for _ in range(1_000_000):
new = b.x
benchmark(do_named_access)
def test_bunch_dict_access(benchmark):
def do_key_access():
b = Bunch(x=5, y=10)
for _ in range(1_000_000):
new = b['x']
benchmark(do_key_access)
This has the following results:
------------------------------------------------------------------------------------------- benchmark: 2 tests ------------------------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_bunch_dict_access 31.1114 (1.0) 31.3130 (1.0) 31.2269 (1.0) 0.0496 (1.0) 31.2313 (1.0) 0.0681 (1.0) 9;0 32.0236 (1.0) 32 1
test_bunch_named_attribute_access 233.6903 (7.51) 234.9689 (7.50) 234.2604 (7.50) 0.5379 (10.84) 234.3409 (7.50) 0.9007 (13.23) 2;0 4.2688 (0.13) 5 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
So, using the bunch.x
is about 7.5x slower than using bunch['x']
.
Overhead comes from the get_attr function in bunch
This overhead is clearly from the get_attr function, which is overwritten to call the .getitem function on the underlying dict.
You can get some minor performance improvements by setting __getitem__ = dict.get
, (you go to 5x slower), but even then there's a lot of performance left on the table with bunch using the named attribute access.
The other problem here is that user code that has event observers likely expects the AttributeErrors to be turned to KeyErrors, as currently happens in the getitem function. This performance PR shouldn't change the external interface users mess about with -- but explored below are ways to help minimize the impact of bunch performance on tartlets internally.
Using the bunch.attr
syntax adds significant overhead to event processing
The Bunch
is primarily used internally within traitlets as the return value for the observe functions when watching change events.
Consider the following example where we're watching many traits:
from typing import Any, Dict
from traitlets import HasTraits, Integer, observe, Unicode, All
class ExampleHasTraits(HasTraits):
foo: int = Integer(2, help="a number").tag(config=True) # type:ignore
@observe('num')
def _num_changed(self, change):
print("{name} changed from {old} to {new}".format(**change))
class ManyTrait(HasTraits):
foo0 = Unicode("foo0").tag(sync=True)
foo1 = Unicode("foo1").tag(sync=True)
foo2 = Unicode("foo2").tag(sync=True)
foo3 = Unicode("foo3").tag(sync=True)
foo4 = Unicode("foo4").tag(sync=True)
foo5 = Unicode("foo5").tag(sync=True)
foo6 = Unicode("foo6").tag(sync=True)
foo7 = Unicode("foo7").tag(sync=True)
foo8 = Unicode("foo8").tag(sync=True)
foo9 = Unicode("foo9").tag(sync=True)
foo10 = Unicode("foo10").tag(sync=True)
foo11 = Unicode("foo11").tag(sync=True)
foo12 = Unicode("foo12").tag(sync=True)
foo13 = Unicode("foo13").tag(sync=True)
foo14 = Unicode("foo14").tag(sync=True)
foo15 = Unicode("foo15").tag(sync=True)
foo16 = Unicode("foo16").tag(sync=True)
foo17 = Unicode("foo17").tag(sync=True)
foo18 = Unicode("foo18").tag(sync=True)
foo19 = Unicode("foo19").tag(sync=True)
foo20 = Unicode("foo20").tag(sync=True)
@observe(All)
def _num_changed(self, change):
var = change['old']
def create_and_set(N, cls):
keys = cls.class_trait_names(sync=True)
for i in range(N):
obj = cls()
for key in keys:
setattr(obj, key, "123")
import cProfile
# First, we run with 1k
with cProfile.Profile() as pr:
create_and_set(N=10_000, cls=ManyTrait)
import cProfile, pstats
from pstats import SortKey
sortby = SortKey.CUMULATIVE
ps = pstats.Stats(pr).sort_stats(sortby)
ps.print_stats()
If we simply remove any usage of named access on the traitlets within the library, we get the following performance change:
With named access:
6630113 function calls in 1.272 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
210000 0.452 0.000 0.750 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1509(_notify_observers)
210000 0.098 0.000 1.157 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:702(set)
630000 0.083 0.000 0.109 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/utils/bunch.py:13(__getattr__)
210000 0.079 0.000 0.199 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:731(_validate)
210000 0.075 0.000 0.859 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1494(_notify_trait)
630000 0.070 0.000 0.070 0.000 {built-in method builtins.hasattr}
210000 0.049 0.000 0.076 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:740(_cross_validate)
850000 0.037 0.000 0.037 0.000 {built-in method builtins.isinstance}
210000 0.037 0.000 1.193 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:720(__set__)
840021 0.036 0.000 0.036 0.000 {method 'get' of 'dict' objects}
210000 0.034 0.000 0.784 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1505(notify_change)
210000 0.032 0.000 0.032 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1227(__get__)
210000 0.029 0.000 0.060 0.000 {built-in method builtins.getattr}
210000 0.027 0.000 1.221 0.000 {built-in method builtins.setattr}
630000 0.026 0.000 0.026 0.000 {method '__getitem__' of 'dict' objects}
210000 0.021 0.000 0.029 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:2517(validate)
420000 0.020 0.000 0.020 0.000 {method 'extend' of 'list' objects}
1 0.020 0.020 1.272 1.272 /Users/nathanrush/jupyter-widgets-conference/traitlets/test.py:40(create_and_set)
210000 0.017 0.000 0.017 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/test.py:35(_num_changed)
10000 0.006 0.000 0.023 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1305(setup_instance)
10000 0.005 0.000 0.029 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1262(__new__)
10000 0.003 0.000 0.009 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1618(observe)
10000 0.003 0.000 0.015 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1277(setup_instance)
10000 0.003 0.000 0.004 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1556(_add_notifiers)
10000 0.003 0.000 0.012 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1238(instance_init)
10000 0.003 0.000 0.003 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1319(__init__)
10000 0.002 0.000 0.003 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:237(parse_notifier_name)
10001 0.001 0.000 0.001 0.000 {method 'copy' of 'dict' objects}
10000 0.001 0.000 0.001 0.000 {built-in method __new__ of type object at 0x103b37288}
10000 0.001 0.000 0.001 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1743(class_traits)
1 0.000 0.000 0.000 0.000 /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/cProfile.py:117(__exit__)
21 0.000 0.000 0.000 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:263(__init__)
1 0.000 0.000 0.000 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1734(class_trait_names)
21 0.000 0.000 0.000 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:266(__call__)
22 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}
21 0.000 0.000 0.000 0.000 {built-in method builtins.callable}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {built-in method builtins.len}
With no named access:
5370113 function calls in 0.976 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
210000 0.294 0.000 0.475 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1509(_notify_observers)
210000 0.094 0.000 0.866 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:702(set)
210000 0.076 0.000 0.194 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:731(_validate)
210000 0.071 0.000 0.578 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1494(_notify_trait)
630000 0.068 0.000 0.068 0.000 {built-in method builtins.hasattr}
210000 0.049 0.000 0.076 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:740(_cross_validate)
850000 0.035 0.000 0.035 0.000 {built-in method builtins.isinstance}
210000 0.034 0.000 0.900 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:720(__set__)
840021 0.034 0.000 0.034 0.000 {method 'get' of 'dict' objects}
210000 0.031 0.000 0.507 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1505(notify_change)
210000 0.030 0.000 0.030 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1227(__get__)
210000 0.029 0.000 0.059 0.000 {built-in method builtins.getattr}
210000 0.026 0.000 0.926 0.000 {built-in method builtins.setattr}
210000 0.021 0.000 0.027 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:2517(validate)
420000 0.019 0.000 0.019 0.000 {method 'extend' of 'list' objects}
1 0.019 0.019 0.976 0.976 /Users/nathanrush/jupyter-widgets-conference/traitlets/test.py:40(create_and_set)
210000 0.016 0.000 0.016 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/test.py:35(_num_changed)
10000 0.006 0.000 0.022 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1305(setup_instance)
10000 0.005 0.000 0.028 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1262(__new__)
10000 0.003 0.000 0.009 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1618(observe)
10000 0.003 0.000 0.003 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1556(_add_notifiers)
10000 0.003 0.000 0.015 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1277(setup_instance)
10000 0.003 0.000 0.012 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1238(instance_init)
10000 0.002 0.000 0.002 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1319(__init__)
10000 0.002 0.000 0.003 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:237(parse_notifier_name)
10001 0.001 0.000 0.001 0.000 {method 'copy' of 'dict' objects}
10000 0.001 0.000 0.001 0.000 {built-in method __new__ of type object at 0x105a1f288}
10000 0.001 0.000 0.001 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1743(class_traits)
1 0.000 0.000 0.000 0.000 /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/cProfile.py:117(__exit__)
21 0.000 0.000 0.000 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:263(__init__)
1 0.000 0.000 0.000 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1734(class_trait_names)
21 0.000 0.000 0.000 0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:266(__call__)
21 0.000 0.000 0.000 0.000 {built-in method builtins.callable}
22 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {built-in method builtins.len}
If you actually line profile the benchmark observers function, you see that about 25% of the time in this function is just doing bunch accesses:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1509 @profile
1510 def _notify_observers(self, event):
1511 """Notify observers of any event"""
1512 210000 67280.0 0.3 3.8 if not isinstance(event, Bunch):
1513 # cast to bunch if given a dict
1514 event = Bunch(event)
1515 210000 256428.0 1.2 14.3 name, type = event.name, event.type
1516
1517 210000 62783.0 0.3 3.5 callables = []
1518 210000 72407.0 0.3 4.0 if name in self._trait_notifiers:
1519 callables.extend(self._trait_notifiers.get(name, {}).get(type, []))
1520 callables.extend(self._trait_notifiers.get(name, {}).get(All, []))
1521 210000 71647.0 0.3 4.0 if All in self._trait_notifiers: # type:ignore[comparison-overlap]
1522 420000 133094.0 0.3 7.4 callables.extend(
1523 210000 92931.0 0.4 5.2 self._trait_notifiers.get(All, {}).get(type, []) # type:ignore[call-overload]
1524 )
1525 420000 130307.0 0.3 7.3 callables.extend(
1526 210000 91059.0 0.4 5.1 self._trait_notifiers.get(All, {}).get(All, []) # type:ignore[call-overload]
1527 )
1528
1529 # Now static ones
1530 210000 80596.0 0.4 4.5 magic_name = "_%s_changed" % name
1531 210000 185720.0 0.9 10.4 if event.type == "change" and hasattr(self, magic_name):
1532 class_value = getattr(self.__class__, magic_name)
1533 if not isinstance(class_value, ObserveHandler):
1534 _deprecated_method(
1535 class_value,
1536 self.__class__,
1537 magic_name,
1538 "use @observe and @unobserve instead.",
1539 )
1540 cb = getattr(self, magic_name)
1541 # Only append the magic method if it was not manually registered
1542 if cb not in callables:
1543 callables.append(_callback_wrapper(cb))
1544
1545 # Call them all now
1546 # Traits catches and logs errors here. I allow them to raise
1547 420000 131027.0 0.3 7.3 for c in callables:
1548 # Bound methods have an additional 'self' argument.
1549
1550 210000 76219.0 0.4 4.3 if isinstance(c, _CallbackWrapper):
1551 c = c.__call__
1552 210000 78564.0 0.4 4.4 elif isinstance(c, EventHandler) and c.name is not None:
1553 210000 152623.0 0.7 8.5 c = getattr(self, c.name)
1554
1555 210000 107464.0 0.5 6.0 c(event)
If we update to not used named access, line 1515 goes from 14% of runtime to 5% of runtime, and 1531 goes from 10% of runtime to 6.6% of runtime, dropping the runtime of this function by 15% with no functionality changes anywhere to the lib.
Also note that per line profiling also tells us that bunch creation is a pretty minimal process overall, with <3% of runtime being spent here. Optimizing these bunch creations doesn't help anywhere near as much as just using them better.
I'll open a PR today against this PR that makes these changes (about 6 characters being changed total), as for traitlets that have their values set frequently, this gives a 20% performance benefit for effectively no change.
Based on the above logs, there aren't any more super obvious optimization targets when it comes to set events + change observers. Are there any other common workflows with traitlets that would be worth profiling to see if there are easy wins?
Not sure if this was already used, but it might be worth running perflint to see if there are any other micro-improvements that can be made (balancing for readability of course).
Good news: after the ipywidgets 8.0.3 release, those tests pass with this branch. Bad news: apparent CI rot, I think we're picking up the latest mypy which is more strict. I think that the last thing needs to be fixed in main.
#809 fixes the typing
#809 is merged, but there's now a conflict.
@blink1073 do you understand the jupyter_server issue?
Server 2.0.1 has a fix for that, I kicked the build to see if we pick it up.
Nope, it must be taking a while to propagate.
🥳 @naterush do you want to open your PR? To follow up
We are using sphinx 4.5.0 with myst-nb 0.17.1 and are now getting AttributeError: __provides__
at traitlets.py:1009 after upgrading from traitlets 5.6 to 5.7. It looks like it may be related to this PR based on this comment?
Thanks for letting us know, how can I trigger this error? So I can test it locally
Thanks for letting us know, how can I trigger this error? So I can test it locally