conda-build
conda-build copied to clipboard
Use pickle's loads/dumps as a faster deepcopy
Description
Attempting to use pickle
instead of deepcopy
for a faster copy.
Split from:
- https://github.com/conda/conda-build/pull/5225
- https://github.com/conda/conda-build/pull/5248
- https://github.com/conda/conda-build/pull/5253
Checklist - did you ...
- [ ] ~Add a file to the
news
directory (using the template) for the next release's release notes?~ - [ ] ~Add / update necessary tests?~
- [ ] ~Add / update outdated documentation?~
CodSpeed Performance Report
Merging #5281 will improve performances by 66.17%
Comparing kenodegard:pickle-vs-deepcopy
(e5318de) with main
(fbbc298)
Summary
⚡ 1
improvements
✅ 2
untouched benchmarks
Benchmarks breakdown
Benchmark | main |
kenodegard:pickle-vs-deepcopy |
Change | |
---|---|---|---|---|
⚡ | test_pin_subpackage_benchmark |
12.2 s | 7.4 s | +66.17% |
pre-commit.ci autofix
@kenodegard @dholth Are we worried about errors where something cannot be pickled or shall we merge this one?
The test failures appear to be unrelated warnings appearing w/ python 3.12:
DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
I don't follow it totally but it appears to happen in extractions of packages.
Are we worried about errors where something cannot be pickled
Since this is currently only applied to variant
objects and I think variants are necessarily dicts of lists, strings and numbers, I don't think this is likely to come up.
DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
I don't follow it totally but it appears to happen in extractions of packages.
Yep, this is the tarfile
usage in conda-package-streaming
. Added https://github.com/conda/conda-package-streaming/issues/87 to track. For now we can add a filterwarnings setting to pyproject.toml
, I'd say. I think @zklaus was looking into it.
filterwarnings setting to pyproject.toml
do you have an example of this or a link to the docs?
I am not following.
pre-commit.ci autofix
Thanks for this! I've been testing it out, and with this and #5384 rerender time is reduced by ~90%, from over 30 minutes to about 4 on some recipes with a large variant matrix (petsc4py).
FWIW, I spent a little time looking into making config.variant
itself an immutable dict via deepfreeze
since that's already computed for the hash anyway and would eliminate the need to copy variants altogether. But there are a few places where the variant is modified in place that make it a little bit tricky. Still doable and perhaps right thing to do, but not so simple. I think it also comes at an increased cost of computing list_of_dicts_to_dict_of_lists
because there would end up being a lot of converting tuples to lists and back.
I also noticed in https://github.com/conda-forge/conda-smithy/pull/1967 that deepfreeze is quite a lot more expensive than using HashableDict (deepfreeze takes ~10x the time of hash(HashableDict(dict))