[Bug]: `PoseEstimationSeries` creation is significantly increasing sleap-io test runtimes
What happened?
Since the pynwb update from version 2.7.0 to 2.8.0, sleap-io tests have been experiencing significantly longer runtimes, which prevent our GitHub workflows from completing. For example, on Windows (Python 3.12), three of our tests that create new .nwb files take ~23s with pynwb==2.7.0 and ~159s with pynwb==2.8.1, as shown in these two runs: 2.7.0, 2.8.1.
After profiling and debugging runs, we found that the slowdown occurs when we create new PoseEstimationSeries objects using a kwargs dictionary on lines 460-461 of sleap-io/sleap_io/io/nwb.py. I have reproduced the issue and printed profiling results in the following colab notebooks.
Steps to Reproduce
See colab notebooks above
Traceback
No response
Operating System
Windows
Python Executable
Conda
Python Version
3.11
Package Versions
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
- [X] Have you checked the Contributing document?
- [X] Have you ensured this bug was not already reported?
Thanks for the bug report @keyaloding . I am not sure why that is happening. I will run some tests on my end and get back to you next week.
seems primarily related to docval being very inefficient. sorting by cumulative time, it seems like the main difference is that hdmf.utils.func_call (specifically the is_method branch on line 666) is called 704 times in both, but has 768072 recursive calls in 2.7 and 3569184 recursive calls in 2.8.1 (both run with hdmf==3.14.4). all the other profiling info is the same, just scaled by ~5.
that's not all that helpful since docval makes it look like every call is func_call, but if you look for the other things that have their call numbers scaled by that much it might help - eg. manager.register_container_type is also called ~5x more, but it took me awhile puzzing at the comparison to see what i think is going on.
this change: https://github.com/NeurodataWithoutBorders/pynwb/compare/2.7.0..2.8.2#diff-e1599233197e637d0598c751d848a922cba3458d6e206efda4c10c4704fd0432R50-R51
now uses pynwb's get_type_map function, which deepcopy's pynwb's type map, which is substantially larger than hdmf's. the only place i see that get used is here: https://github.com/hdmf-dev/hdmf/blob/874db31fb171577bfc0c962708786f9774ea6b1b/src/hdmf/container.py#L91
which constructs a setter method when docval requests one. The problem is that _get_type_map gets called every time that a property is set, and it makes a very expensive deepcopy call (to hdmf's TypeMap.__deepcopy__).
that __deepcopy__ calls TypeMap.merge, which explains the additional register_container_type et al extra calls.
i've found similar things in pynwb in the past where deepcopy is used to protect against mutation in a context where it isn't needed, eg in those setter calls there's no path to mutate the type map, so it should just return without copying.
Good find, @sneakers-the-rat ! We'll adjust this on the pynwb side.