typeshed
Testing third-party stubs in isolated environments
At the moment, whenever there is a change to typeshed, we test all stdlib and third-party packages in the same testing environment, using the --custom-typeshed-dir option so that all packages can see each other. There are several practical problems with that:
- We can't test the requires fields for correctness; all packages are always available.
- We can't have packages that depend on external packages, see for example #5768, #5618, #5847, and the discussion in #5769 (all related to cryptography). (Unless we just installed all non-types dependencies for all packages into our venv.)
- It doesn't scale if we increase the number of third-party packages in typeshed.
- Ideas like per-package configuration are not possible at the moment, see for example #1526.
My suggestion: Only run the tests for the third-party stubs that have actually changed, and run each test in a separate environment. This environment only has the requirements from METADATA.toml installed. This fixes the problems above and also means:
- The stubs will not be able to use features from stdlib that are not yet in the latest type checker distributions.
- Changes that depend on changes in another third-party stub need to wait until that change has been released. (Releases happen every three hours, so this shouldn't be much of a problem.)
- Running the tests for all stubs will be slower, since each package needs its own venv installation.
A possible downside: if the tests of a third-party stub break for some reason, e.g. because a new version of the corresponding non-stub package is released, the problem will go unnoticed until someone makes a PR for that specific stub package. I remember a time when most pull requests showed red CI because of some Pillow issue, but I can't find it in the PR history now.
It would be useful to run full tests at a regular interval, say once per day or week.
Note that the third-party stubtest code already does this venv creation. We could steal that code, or maybe use some fancy GitHub Actions caching to cache venvs across workflows. Switching to testing only the changed third-party stubs is probably a better idea than testing everything and sharding, which is the approach test_stubtest currently takes.
Like you say, I'd want to think through how we manage different stdlib and types-* versions; to that end, here are some previous issues we've had along those lines: https://github.com/python/typeshed/issues/4815 https://github.com/python/typeshed/issues/5786 https://github.com/python/typeshed/issues/5751
This seems to be done now.
No, we still need to use separate venvs for each distribution, so that the dependencies don't interfere with each other.
I would be interested in working on this mainly to make it possible to have non-types dependencies.
Would these be reasonable steps, in order?
- Adjust each check to only run for changed folders. Start with mypy, then continue with pyright, then pytype.
- Create a new venv for each folder checked. Again mypy -> pyright -> pytype.
- Add support for reading METADATA.toml and installing the dependencies for a given folder.
- Add support for per-package mypy.ini/pyrightconfig.json/etc. files.
Any major tasks I'm missing?
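The first step (only checking changed folders) can be sketched as a function that maps changed file paths, e.g. the output of git diff --name-only, to distribution names. This assumes typeshed's layout of stdlib/ for standard-library stubs and stubs/<distribution>/ for third-party stubs; the function name is hypothetical.

```python
from pathlib import PurePosixPath


def changed_distributions(changed_files: list[str]) -> tuple[bool, set[str]]:
    """Map changed repo paths to (stdlib_changed, names of changed stub distributions).

    Assumes stdlib stubs live under `stdlib/` and third-party stubs under
    `stubs/<distribution>/`; any other path (docs, CI config) is ignored here.
    """
    stdlib_changed = False
    dists: set[str] = set()
    for name in changed_files:
        parts = PurePosixPath(name).parts
        if parts and parts[0] == "stdlib":
            stdlib_changed = True
        elif len(parts) >= 3 and parts[0] == "stubs":
            # stubs/<distribution>/<file...>: the second component is the distribution
            dists.add(parts[1])
    return stdlib_changed, dists
```

Since third-party stubs import from the stdlib stubs, a conservative policy would be to test every distribution whenever stdlib_changed is True (or when the test scripts themselves change), and only the returned set otherwise.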
That sounds reasonable. Some thoughts:
- It might make sense to split each test up into running on the stdlib vs running on individual third-party distributions (like we do with stubtest).
- Here's some code for venv creation: https://github.com/python/typeshed/blob/823592e100392747ce9d89b56ead80a1b720d4c3/tests/stubtest_third_party.py#L39
- We'd want to make sure that other stubs don't interfere, e.g. we should not be picking up cryptography-stubs if someone declares a cryptography dep. The various type checkers probably each handle this differently.
- We talked about only having a small, vetted list of deps. If this is the case, I personally wouldn't hate getting started on https://github.com/python/typeshed/issues/5768 with a global venv. This might be controversial though :-)
- It might be nice to use different keys in pyproject.toml for typeshed deps vs other deps (e.g., this could make future automation easier, or avoid mistakes where we assume that all "types-*" packages are trusted). This might need a change in https://github.com/typeshed-internal/stub_uploader
- There's an @tests folder you can reuse for per-distribution config files. We even store some extra requirements.txt files in there.
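Distinguishing typeshed stub deps from other deps, and checking the latter against a small vetted list, could look like the sketch below. It assumes the current single requires list and treats any name starting with "types-" as a typeshed package; the allowlist and both function names are hypothetical. Name parsing is deliberately simplified (lowercase, underscores to hyphens); the packaging library's Requirement class does full PEP 508 parsing.

```python
import re

# Leading project name of a requirement string, e.g. "cryptography" in "cryptography>=35".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def _name(req: str) -> str:
    """Simplified, normalized project name of a requirement string."""
    match = _NAME.match(req)
    if match is None:
        raise ValueError(f"unparseable requirement: {req!r}")
    return match.group(1).lower().replace("_", "-")


def split_requires(requires: list[str]) -> tuple[list[str], list[str]]:
    """Split requirements into (typeshed stub deps, external non-types deps)."""
    typeshed: list[str] = []
    external: list[str] = []
    for req in requires:
        (typeshed if _name(req).startswith("types-") else external).append(req)
    return typeshed, external


def unvetted(external: list[str], allowed: set[str]) -> list[str]:
    """Return external requirements whose names are not on the (hypothetical) allowlist."""
    return [req for req in external if _name(req) not in allowed]
```

Keeping the two kinds of dependency separate means a reviewer (or a script like stub_uploader) never has to assume that a dependency is trusted just because of its name prefix.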
@hmc-cs-mdrissi For now you could try to skip step 1 and we can see how that affects CI runtimes.
As of #9408, third-party stubs with non-types dependencies are now tested with mypy in isolated venvs. The venvs are set up concurrently using a thread pool, so we still test all typeshed stubs packages in every run, but the script remains performant.
(Stubs packages with no non-types dependencies are not tested in a separate venv, but they are tested with the --no-site-packages flag, so they are still run in an isolated environment.)
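The concurrent setup described above can be illustrated with concurrent.futures.ThreadPoolExecutor. This is a sketch, not the code from #9408: make_env is a hypothetical callback that would create a venv and pip-install the given requirements, and only non-types requirements are passed to it, on the assumption that types-* dependencies are resolved from the typeshed checkout itself.

```python
from __future__ import annotations

from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor


def setup_envs(
    requires_by_dist: dict[str, list[str]],
    make_env: Callable[[str, list[str]], str],
    max_workers: int = 8,
) -> dict[str, str | None]:
    """Create environments concurrently for distributions with non-types deps.

    Distributions whose requirements are all `types-*` packages map to None:
    they get no venv and would be checked with --no-site-packages instead.
    """
    external_by_dist = {
        dist: [r for r in reqs if not r.lower().startswith("types-")]
        for dist, reqs in requires_by_dist.items()
    }
    results: dict[str, str | None] = {dist: None for dist in requires_by_dist}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            dist: pool.submit(make_env, dist, external)
            for dist, external in external_by_dist.items()
            if external
        }
        for dist, future in futures.items():
            results[dist] = future.result()  # re-raises any error from make_env
    return results
```

Threads are a reasonable fit here despite the GIL, because venv creation and pip installs spend most of their time in subprocesses, disk I/O and network waits rather than in Python bytecode.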
Closing as completed! 🥳