Speeds up the Path and derived Parameters
Closes #889
- Avoiding to use Parameterized
resolve_pathfunction as there is big, big overhead compared to using simple function! - Caching the actual resolve to not do this over and over and over again in Panel apps.
If I run pytest tests/testpathparam.py, I see.
main: 57 passed in 0.21s
this: 57 passed in 0.19s
So this is not where you see the benefit. That is because you are only testing new instances and new paths. But that is not how Path is being used in practice. For example in Panel the pn.config is rather static and holds paths to css etc. which is being requested every time a page is loaded.
Performance tests
My setup
Python 3.10 on Linux.
Profiling script
You can use the script below.
import pytest
from pyinstrument import Profiler
import param
from pathlib import Path
N = 100000
TESTS_ROOT = Path.cwd()
@pytest.fixture(autouse=True)
def auto_profile(request):
PROFILE_ROOT = (TESTS_ROOT / ".profiles")
# Turn profiling on
profiler = Profiler()
profiler.start()
yield # Run test
profiler.stop()
PROFILE_ROOT.mkdir(exist_ok=True)
results_file = PROFILE_ROOT / f"{request.node.name}.html"
profiler.write_html(results_file)
class Custom(param.Parameterized):
path = param.Path("script.py")
def test_class():
for i in range(N):
Custom.path
custom = Custom()
def test_existing_instance():
for i in range(N):
custom.path
def test_new_instance():
for i in range(N):
Custom().path
try:
from param.parameters import _resolve_path
def test_no_instance():
for i in range(N):
_resolve_path("script.py")
except:
pass
Results
N=100000
| branch | test_class | test_existing_instance | test_new_instance | test_no_instance |
|---|---|---|---|---|
| main | 11.5s | 11.5s | 15.8s | |
| this - pathlib | 0.7s | 0.7s | 4.9s | 0.3s |
| this - os | 0.6s | 0.7s | 4.9s | 0.3s |
It makes you think about the cost of using Parameterized classes and ParameterizedFunctions! You can compare this to FastApi and Pydantic. Pydantic 2.0 was a reimplementation in Rust to make applications like FastApi fast.
Additional Speed up if using faster attribute lookup.
If the implementation of Path would not use costly self._resolve_path and self.search_paths attribute look ups then it would again be much faster.
The below was run on this branch and shows this.
The implementation would look like
Its a matter of whether you want an 8% speed up over code readability for humans and static analysis tools in your editor.
Additional speed up by not looking up cwd.
Path._resolve look up the cwd on each run because the default value of search_paths is None.
At least in Panel we should probably provide the search_paths once on instantiation of pn.config to avoid having this cost over and over again.
Maybe we should even change the default behaviour of Path.search_paths from None to the cwd when instantiated?
There are broken tests so I assume this is still work in progress?
A test suite is not a benchmark so I'm not surprised we don't see performance penalties or regressions from the test suite itself. We now leverage asv to run some benchmark, feel free to contribute to it https://github.com/holoviz/param/tree/main/benchmarks.
I hoped to get some feedback first. For example its seems Param does not utilize caching today and the focus is minimum of code. This is a change in favor of speed.
Thus the first step is really to spark the discussion with this PR. And I would like to know if you would like to accept this kind of change before I spend time finalizing.
The tests are breaking on macOS and Windows. To me it looks like the path strings are resolved in different But equivalent ways by os and pathlib.
Having thought about it, I assumed pathlib was better as its "modern" Python. But now I'm not sure. At least I should profile that too.
Ok. Using pathlib is slow. The image below slows the new implementation without caching on left and old implementation without caching on right.
I will revert to using os methods and keep the cache.
I need to spend more time to review this (not sure I'll have time this week!). My feedback from working on Param 2.0:
- I'm not surprised there are performance improvements to gain, performance hasn't been a major focus, even if with the addition of the asv benchmark we've been able to catch some regressions and mitigate them.
- don't trust the test suite! It has many gaps, I can't count how many times I broke Panel or HoloViews without breaking the test suite.
- Param is so old that many of its bugs have turned into features, changing it requires great care. It's often worth adding tests before making any change, to capture the previous behavior and see how it's affected.
This piece of code breaks with the suggested changes, the last line used to raise an error and it no longer does. It shows that the difficulty with resources like file/directory paths is that they can be removed without Param knowing about it, which is why I think it always resolves. Maybe the way forward is to overload the newly added check_exists attribute (https://github.com/holoviz/param/pull/800) with a new 'once' mode (better name welcome) that only checks whether the resource exists when the Parameter is instantiated.
import os
import param
import pathlib
fp = pathlib.Path('exists.txt')
fp.write_text('')
class P(param.Parameterized):
path = param.Path('exists.txt')
p = P()
print(p.path)
fp.unlink()
print(p.path)
Thanks for the feedback. I plan to give this a second iteration when I have gotten panel-chemistry working for Panel 1.x. First thing would be adding the use case you describe to the tests.
I quickly played around with another implementation (compared to main, not to this branch), (ab)using check_exists.
diff --git a/param/parameters.py b/param/parameters.py
index ed11b53b..8852b79b 100644
--- a/param/parameters.py
+++ b/param/parameters.py
@@ -2706,9 +2706,11 @@ class Path(Parameter):
----------
search_paths : list, default=[os.getcwd()]
List of paths to search the path from
- check_exists: boolean, default=True
- If True (default) the path must exist on instantiation and set,
- otherwise the path can optionally exist.
+ check_exists: boolean or 'once', default=True
+ If True (default) the path must exist on instantiation and whenever
+ get/set is called, if 'once' the path must only exist on instantiation
+ and on get the first resolved path is returned, if False the path can
+ optionally exist.
"""
__slots__ = ['search_paths', 'check_exists']
@@ -2733,8 +2735,8 @@ class Path(Parameter):
search_paths = []
self.search_paths = search_paths
- if check_exists is not Undefined and not isinstance(check_exists, bool):
- raise ValueError("'check_exists' attribute value must be a boolean")
+ if check_exists is not Undefined and check_exists not in (True, False, 'once'):
+ raise ValueError("'check_exists' attribute value must be a boolean or 'once'")
self.check_exists = check_exists
super().__init__(default,**params)
self._validate(self.default)
@@ -2749,16 +2751,23 @@ class Path(Parameter):
else:
if not isinstance(val, (str, pathlib.Path)):
raise ValueError(f'{_validate_error_prefix(self)} only take str or pathlib.Path types')
+ path = None
try:
- self._resolve(val)
:+ path = self._resolve(val)
except OSError as e:
- if self.check_exists:
+ if self.check_exists is True:
raise OSError(e.args[0]) from None
+ if path is not None and self.check_exists == 'once':
+ # Abusing check_exists to store the resolved path on parameter
+ # instantiation.
+ self.check_exists = ('once', path)
def __get__(self, obj, objtype):
"""
Return an absolute, normalized path (see resolve_path).
"""
+ if isinstance(self.check_exists, tuple):
+ return self.check_exists[1]
raw_path = super().__get__(obj,objtype)
if raw_path is None:
path = None
With these benchmarks:
class ParameterPathResolveSuite:
def setup(self):
import pathlib
p = pathlib.Path('dummy')
p.write_text('dummy')
class P(param.Parameterized):
x0 = param.Path(str(p))
self.P = P
self.p = P()
def time_resolve_new(self):
p = self.P()
p.x0
def time_resolve_same(self):
self.p.x0
class ParameterPathResolveOnceSuite:
def setup(self):
import pathlib
p = pathlib.Path('dummy')
p.write_text('dummy')
class P(param.Parameterized):
x0 = param.Path(str(p), check_exists='once')
self.P = P
self.p = P()
def time_resolve_new(self):
p = self.P()
p.x0
def time_resolve_same(self):
self.p.x0
Then I started to think about how this might be related to readonly and constant, to Parameter attribute inheritance and to __set__. To finally decide this could wait for January 🙃