Switch default pip install compile option to `False`
What's the problem this feature will solve?
While investigating install performance, I found that compiling Python files sometimes takes up a significant share of the install time.
Describe the solution you'd like
Switch the default pip install compile option from True to False. Users who still need compilation can enable it with the --compile CLI option, config, or environment variables, while users who don't need it will save time, CPU, and disk space.
Alternative Solutions
My initial idea was to use the multiprocessing capability of compile_dir, but on running benchmarks I found the time saving to be very platform specific. On Linux you only need to be compiling ~15 average Python files to see a statistically significant improvement in time, but on Windows you need ~200 average Python files to see the same.
Even then, while it would save time it would not save CPU or disk space.
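For anyone who wants to try this on their own platform, here is a minimal sketch of that kind of comparison using the standard library's `compileall.compile_dir` and its `workers` parameter. The helper names (`make_sample_tree`, `timed_compile`) and the generated file contents are made up for illustration and are not the benchmark that produced the numbers above.

```python
import compileall
import tempfile
import time
from pathlib import Path


def make_sample_tree(root, count):
    """Write `count` small placeholder modules into `root` (hypothetical sample data)."""
    for i in range(count):
        (root / f"mod_{i}.py").write_text(f"VALUE = {i}\n")


def timed_compile(root, workers):
    """Byte-compile every .py file under `root` and return (success, seconds).

    workers=1 compiles serially; workers=0 lets compileall pick os.cpu_count().
    force=True recompiles even when an up-to-date .pyc already exists.
    """
    start = time.perf_counter()
    ok = compileall.compile_dir(str(root), quiet=1, force=True, workers=workers)
    return bool(ok), time.perf_counter() - start


# The guard matters: with the spawn/forkserver start methods, worker
# processes re-import the main module.
if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_sample_tree(root, 200)
        for workers in (1, 0):
            ok, elapsed = timed_compile(root, workers)
            print(f"workers={workers}: ok={ok}, {elapsed:.3f}s")
```

A single run like this is noisy, so it would need repeating many times, on realistic files, before drawing conclusions about where the break-even point sits.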
Additional context
This is already the default behaviour of both uv and poetry (see --compile).
But there may be some history to why pip makes this the default behaviour that I am not aware of.
Code of Conduct
- [X] I agree to follow the PSF Code of Conduct.
pip compiles by default because not compiling results in permanently slow programs when ordinary users don't have write access to the installation directory.
As opt-in tools, poetry and uv have a bit more leeway to make assumptions that they're being used by developers for development (i.e. installing to folders that are read/write at both installation and execution time), and require the use of opt-in CLI options when they're instead being used for deployments to directories that are read-only at runtime.
pip compiles by default because not compiling results in permanently slow programs when ordinary users don't have write access to the installation directory.
I understand this reasoning pre-Python 3.8, but can't users now set the path of where they want their pyc cache to be: https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPYCACHEPREFIX ?
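To illustrate what that setting does, here is a small sketch using `sys.pycache_prefix`, the runtime counterpart of `PYTHONPYCACHEPREFIX` (Python 3.8+), together with `importlib.util.cache_from_source`. The helper name `cache_path_for` and the example source path are hypothetical.

```python
import importlib.util
import sys
import tempfile


def cache_path_for(source, prefix):
    """Return where the .pyc for `source` would live under the given prefix.

    prefix=None means the default __pycache__ directory next to the source.
    The previous sys.pycache_prefix value is restored afterwards.
    """
    previous = sys.pycache_prefix
    sys.pycache_prefix = prefix
    try:
        return importlib.util.cache_from_source(source)
    finally:
        sys.pycache_prefix = previous


if __name__ == "__main__":
    source = "/opt/app/site-packages/pkg/mod.py"  # hypothetical read-only install
    print("default:   ", cache_path_for(source, None))
    with tempfile.TemporaryDirectory() as writable:
        print("redirected:", cache_path_for(source, writable))
```

With a prefix set, the cache path mirrors the source tree under the prefix directory, so a user without write access to the install location could still get cached bytecode, provided they know to set it.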
can't users now set the path of where they want their pyc cache to be
I suspect the number of users who do this is vanishingly small. The key question here is whether we want to optimise the experience for inexperienced users or for experts. And I think it's fair to say that because pip is bundled with CPython, all of the inexperienced Python users will be using pip. You pretty much have to be an experienced user in the first place to be using uv or poetry. So we have significantly more responsibility for supporting inexperienced users than other tools do.
Maybe defaulting to --no-compile when installing into a virtual environment is reasonable, on the basis that (a) it's likely that a virtualenv will be writeable by the user running code in it, and (b) using a virtual environment is itself an indication that the user is somewhat less of a newcomer.
This is a fair point about experienced vs inexperienced. Though, at least anecdotally, I don't think even most experienced users realise that pip compiles Python files by default.
Going back to the user experience topic (https://github.com/pypa/pip/issues/12712), I think it would make sense to make it clear that pip is "installing and compiling" when compile is True, so that experienced users reading the pip output have a chance to realise they might not want that. I'll make a post over there.
Maybe defaulting to --no-compile when installing into a virtual environment is reasonable
I would personally be against this as:
- It makes too many assumptions about a user's workflow
- "What counts as a virtual environment?" is a hard question. There's a spec one can follow, but as uv quickly found out when they attempted to follow it, the real-world ecosystem is more difficult, e.g. should a conda environment count as a virtual environment?
- It makes debugging and support more difficult, as it makes behaviour more environment state dependent
I’d say it feels like showing a message (or even a progress bar) for the compilation phase would be a good idea. If the user notices it’s slow, they would try to look for (and be able to find) the way to disable it.
Great, I think a consensus has been reached here, and I don't want to keep this issue open for the sake of it.
If someone else asks about compiling by default they can be pointed back here, or they can explain what has changed since and why it should be reconsidered.
Reopening issue based on feedback from https://github.com/pypa/pip/pull/13247#issuecomment-2688376396
As a user of pip, this feels like a loss to me. The effect of this proposal will just be a shifting of costs, so that pip install gets faster but then the first import (or possibly every import, in the worst case) of an installed module gets slower by an equal amount. The only time that changing this default would seem to benefit the user is if they're installing a large number of modules that they never import.
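One rough way to see which side of this trade-off an environment sits on is to list the source files that have no cached bytecode yet, since those are the ones the interpreter would have to compile on first import. The sketch below assumes the default cache layout reported by `importlib.util.cache_from_source`; the function name `uncompiled_sources` is made up, and it only checks that a `.pyc` exists, not that it is up to date.

```python
import importlib.util
import os
from pathlib import Path


def uncompiled_sources(root):
    """Return the .py files under `root` that have no cached .pyc file."""
    missing = []
    for source in sorted(Path(root).rglob("*.py")):
        cached = importlib.util.cache_from_source(str(source))
        if not os.path.exists(cached):
            missing.append(source)
    return missing


if __name__ == "__main__":
    import sysconfig

    # "purelib" is the site-packages directory of the running interpreter.
    site_packages = sysconfig.get_paths()["purelib"]
    pending = uncompiled_sources(site_packages)
    print(f"{len(pending)} source files would be compiled on first import")
```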
The only time that changing this default would seem to benefit the user is if they're installing a large number of modules that they never import.
I'm not particularly for or against this proposal, but I do think this is a common scenario for two reasons:
- Popular packages often contain many modules, and a user of a package will not use all of its functionality
- Packages with a large number of transitive dependencies see this effect scale super-linearly: the package you're using may depend on only a small part of one of its dependencies, and that transitive dependency may in turn have dependencies for functionality you never use. You may therefore be installing transitive dependencies where you run 0% of the code
+1 to godlygeek, especially since pip has the opportunity to net save time with PRs like #13247 (since the Python interpreter will compile serially)