conda-build icon indicating copy to clipboard operation
conda-build copied to clipboard

Use pickle's loads/dumps as a faster deepcopy

Open kenodegard opened this issue 10 months ago • 1 comments

Description

Attempting to use pickle instead of deepcopy for a faster copy.

Split from:

  • https://github.com/conda/conda-build/pull/5225
  • https://github.com/conda/conda-build/pull/5248
  • https://github.com/conda/conda-build/pull/5253

Checklist - did you ...

  • [ ] ~Add a file to the news directory (using the template) for the next release's release notes?~
  • [ ] ~Add / update necessary tests?~
  • [ ] ~Add / update outdated documentation?~

kenodegard avatar Apr 12 '24 17:04 kenodegard

CodSpeed Performance Report

Merging #5281 will improve performances by 66.17%

Comparing kenodegard:pickle-vs-deepcopy (e5318de) with main (fbbc298)

Summary

⚡ 1 improvements ✅ 2 untouched benchmarks

Benchmarks breakdown

Benchmark main kenodegard:pickle-vs-deepcopy Change
test_pin_subpackage_benchmark 12.2 s 7.4 s +66.17%

codspeed-hq[bot] avatar Apr 12 '24 17:04 codspeed-hq[bot]

pre-commit.ci autofix

beckermr avatar Jun 21 '24 09:06 beckermr

@kenodegard @dholth Are we worried about errors where something cannot be pickled or shall we merge this one?

The test failures appear to be unrelated warnings appearing w/ python 3.12:

DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.

I don't follow it totally but it appears to happen in extractions of packages.

beckermr avatar Jun 21 '24 10:06 beckermr

Are we worried about errors where something cannot be pickled

Since this is currently only applied to variant objects and I think variants are necessarily dicts of lists, strings and numbers, I don't think this is likely to come up.

minrk avatar Jun 21 '24 21:06 minrk

DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.

I don't follow it totally but it appears to happen in extractions of packages.

Yep, this is the tarfile usage in conda-package-streaming. Added https://github.com/conda/conda-package-streaming/issues/87 to track. For now we can add a filterwarnings setting to pyproject.toml, I'd say. I think @zklaus was looking into it.

jaimergp avatar Jun 24 '24 15:06 jaimergp

filterwarnings setting to pyproject.toml

do you have an example of this or a link to the docs?

I am not following.

beckermr avatar Jun 24 '24 15:06 beckermr

pre-commit.ci autofix

beckermr avatar Jun 24 '24 16:06 beckermr

Thanks for this! I've been testing it out, and with this and #5384 rerender time is reduced by ~90%, from over 30 minutes to about 4 on some recipes with a large variant matrix (petsc4py).

FWIW, I spent a little time looking into making config.variant itself an immutable dict via deepfreeze since that's already computed for the hash anyway and would eliminate the need to copy variants altogether. But there are a few places where the variant is modified in place that make it a little bit tricky. Still doable and perhaps right thing to do, but not so simple. I think it also comes at an increased cost of computing list_of_dicts_to_dict_of_lists because there would end up being a lot of converting tuples to lists and back.

I also noticed in https://github.com/conda-forge/conda-smithy/pull/1967 that deepfreeze is quite a lot more expensive than using HashableDict (deepfreeze takes ~10x the time of hash(HashableDict(dict))

minrk avatar Jun 28 '24 09:06 minrk