
PERF: get_requirement() raise cache from 2048 to 8192 elements

morotti opened this issue 1 year ago

Hello,

To summarize, the cache was too small :D This gives roughly 2% faster pip install on a dry run, and less in an actual run.

Should we have a NEWS item for this? It's very internal with no relevance to end users. EDIT: I added the news item since the CI was red.
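For context, the change boils down to raising the maxsize of an lru_cache. A minimal sketch of the pattern (not pip's exact code; it uses packaging's Requirement directly rather than pip's vendored copy, and the decorator value is the only thing this issue proposes to change):

    import functools

    from packaging.requirements import Requirement


    @functools.lru_cache(maxsize=2048)  # the PR raises this limit
    def get_requirement(req_string: str) -> Requirement:
        # Parsing requirement strings is expensive and the same strings recur
        # constantly during resolution, so cache the parsed objects.
        return Requirement(req_string)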

You can debug the cache with print(f"parsing {req_string} {get_requirement.cache_info()}")

pip install --dry-run packageswith300dependencies
Run with the default cache size:
parsing python-jose[cryptography] CacheInfo(hits=51941, misses=13580, maxsize=2048, currsize=2048)

Run with a larger cache size:
parsing cryptography>=3.4.0; extra == "crypto" CacheInfo(hits=58758, misses=6947, maxsize=10000, currsize=6692)
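For reference, the hit rate implied by those two CacheInfo lines can be worked out directly from the hits and misses quoted above:

    def hit_rate(hits: int, misses: int) -> float:
        # Fraction of get_requirement() calls answered from the LRU cache.
        return hits / (hits + misses)

    print(f"maxsize=2048:  {hit_rate(51941, 13580):.1%}")  # ~79.3%
    print(f"maxsize=10000: {hit_rate(58758, 6947):.1%}")   # ~89.4%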

before: [screenshot 2024-07-23 114336]

after: [screenshot 2024-07-23 114343]

Regards.

morotti avatar Jul 23 '24 11:07 morotti

FWIW, this, I think, was clear when it was changed to 2048: https://github.com/pypa/pip/pull/12663#issuecomment-2087723027

Originally, when packaging 22.0+ was going to be vendored, the plan was to drop the cache altogether. But as I showed in that PR, there are real-world, highly esoteric backtracking scenarios where it makes a noticeable difference.

I don't have an opinion on whether this is a good change or not, but I do think it would be worth running a memory profiler, like memray, to see if it changes the peak memory usage (though due to https://github.com/pypa/pip/issues/12834 it will be a small fraction of the total).

notatallshaw avatar Jul 23 '24 13:07 notatallshaw

Ping on this. Any interest in getting this merged? I'd like to get the few percent of performance :D

morotti avatar Mar 20 '25 10:03 morotti

Do you have a reproducible example that shows this performance improvement?

notatallshaw avatar Mar 20 '25 12:03 notatallshaw

I'm testing on internal projects with hundreds of dependencies. I don't have a venv I can share, unfortunately. You can try a pip install with as many dependencies as you can think of. It should hit the limit easily. I'd suggest jupyter as a starting point.

To debug the cache status, you can adjust this function in constructors.py, then run pip install.

    def _parse_req_string(req_as_string: str) -> Requirement:
        try:
            r = get_requirement(req_as_string)
            print(get_requirement.cache_info())  # debug: dump LRU cache statistics
            return r
        # ... the rest of the function (the original except clause) is unchanged

On a common project we use internally (263 dependencies), the installation concludes with CacheInfo(hits=9129, misses=5942, maxsize=8192, currsize=5756) when maxsize is 8192, versus CacheInfo(hits=4311, misses=10760, maxsize=2048, currsize=2048) when maxsize is 2048. That is a lot of cache misses.

The profiler shows 5% of the time spent in get_requirement, down to 2% when raising the cache size.


Testing on a larger project I could find (435 dependencies), we hit around 7500 entries. I think I need to raise the cache a bit more, to 10000 :D CacheInfo(hits=43614, misses=7809, maxsize=8192, currsize=7532)

morotti avatar Mar 20 '25 13:03 morotti

Alright, a run on the largest env I could find, 697 dependencies: CacheInfo(hits=126981, misses=9902, maxsize=99999, currsize=9539)

morotti avatar Mar 20 '25 13:03 morotti

I've resent the PR: adjusted to a 10000-entry max cache to fit large venvs, updated the news items, and updated to today's main branch.

morotti avatar Mar 20 '25 13:03 morotti

The question of how this affects memory usage has still not been addressed.

Also, how does this affect performance/memory use of small installs? For instance, does the large cache slow down installation of a single requirement (something like pip install --no-deps requests)? Smaller installs are way more common than larger ones, and I'd prefer it if we didn't penalise the common case for the sake of the uncommon one. In practice, I'd be surprised if it makes a difference, but then again, I don't see any overall runtime figures for big runs either.
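One rough way to sanity-check that concern is a standalone micro-benchmark along these lines (a sketch, not pip's code: it wraps packaging's Requirement in LRU caches of both sizes and times a handful of parses; the package names here are only illustrative):

    import functools
    import timeit

    from packaging.requirements import Requirement


    def make_cached_parser(maxsize: int):
        # Mimic pip's cached requirement parser with a configurable cache size.
        @functools.lru_cache(maxsize=maxsize)
        def get_requirement(req_string: str) -> Requirement:
            return Requirement(req_string)
        return get_requirement


    # A tiny install only parses a few distinct strings; lru_cache allocates
    # entries lazily, so a larger maxsize should cost nothing here.
    reqs = ["requests", "charset-normalizer<4,>=2", "idna<4,>=2.5", "urllib3<3,>=1.21.1"]
    for label, parser in [("maxsize=2048", make_cached_parser(2048)),
                          ("maxsize=10000", make_cached_parser(10000))]:
        elapsed = timeit.timeit(lambda: [parser(r) for r in reqs], number=100_000)
        print(label, f"{elapsed:.3f}s")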

I'm not against this change, and I know micro-optimisations like this can add up, but I'd caution against spending too much time on small improvements like this without having a better sense of where the real bottlenecks are.

pfmoore avatar Mar 20 '25 13:03 pfmoore

Hello @pfmoore, this has no effect on memory usage.

Measuring memory usage on a few runs, the memory varies around 80-90 MB regardless of the cache size; the cache size makes no meaningful difference, and there is no difference when running a single install. Measured with print(psutil.Process().memory_info()).

    # CacheInfo(hits=9129, misses=5942, maxsize=10000, currsize=5756)
    # pmem(rss=83542016, vms=112848896, shared=10485760, text=2306048, lib=0, data=86695936, dirty=0)

    # CacheInfo(hits=9129, misses=5942, maxsize=10000, currsize=5756)
    # pmem(rss=90177536, vms=113909760, shared=10485760, text=2306048, lib=0, data=87756800, dirty=0)

    # CacheInfo(hits=4311, misses=10760, maxsize=2048, currsize=2048)
    # pmem(rss=86573056, vms=107511808, shared=11534336, text=2306048, lib=0, data=81358848, dirty=0)

    # CacheInfo(hits=4311, misses=10760, maxsize=2048, currsize=2048)
    # pmem(rss=85520384, vms=107515904, shared=12582912, text=2306048, lib=0, data=81362944, dirty=0)
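For anyone who wants to reproduce this outside of pip, a rough sketch of the same kind of measurement (assumes psutil and packaging are installed; the requirement strings are synthetic):

    import functools

    import psutil
    from packaging.requirements import Requirement


    @functools.lru_cache(maxsize=10000)
    def get_requirement(req_string: str) -> Requirement:
        return Requirement(req_string)


    proc = psutil.Process()
    before = proc.memory_info().rss

    # Warm the cache with 10000 distinct (synthetic) requirement strings.
    for i in range(10_000):
        get_requirement(f"package-{i}>=1.0; extra == 'extra{i % 7}'")

    after = proc.memory_info().rss
    print(get_requirement.cache_info())
    print(f"RSS grew by ~{(after - before) / 1024 / 1024:.1f} MiB")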

> I'm not against this change, and I know micro-optimisations like this can add up, but I'd caution against spending too much time on small improvements like this without having a better sense of where the real bottlenecks are.

The bottleneck is in parsing the requirement strings; it's pretty clear. I've made a dump of all the requirement strings: there is a tremendous number of strings with extras. Nothing in particular stands out; there are "tests" and "docs", and hundreds of packages like common databases and tools that you would expect to appear as extras.

morotti avatar Mar 20 '25 14:03 morotti

I agree that the memory difference is small enough that it's not really measurable within the margin of error.

Here's a snapshot on main branch for a large resolution just before it completes:

[memray snapshot: main branch]

And here's a snapshot on this branch:

[memray snapshot: this branch]

This is sample profiling so don't read too deeply into some of the differences. But in broad strokes, when connecting to a JSON Simple API the memory is dominated by raw_decode in json.decoder, and the object relevant to this PR is __init__ in pip._internal.req.req_install and there's only a small memory difference.

notatallshaw avatar Mar 27 '25 23:03 notatallshaw

If there is little or no appreciable increase in memory usage due to this change, I'm fine with it. It seems a bit silly to need such a big cache for simply parsing requirement strings, but I'd imagine that reworking pip/packaging to avoid the parsing would be hard/impossible.

ichard26 avatar Mar 28 '25 23:03 ichard26

I agree with @ichard26 here - @notatallshaw you added this to the 25.1 milestone, do you plan on merging it? If not, can you remove the milestone and it can go in when it's ready - there's no real need to tie it to a particular release.

pfmoore avatar Apr 01 '25 10:04 pfmoore

Thanks for the ping, I had been planning to merge.

notatallshaw avatar Apr 01 '25 12:04 notatallshaw