typeshed Testing third-party stubs in isolated environments

At the moment, whenever there is a change to typeshed, we test all stdlib and third-party packages in the same testing environment, using the --custom-typeshed-dir flag so that all packages can see each other. There are several practical problems with that:

We can't test the requires fields for correctness; all packages are always available.
We can't have packages that depend on external packages; see for example #5768, #5618, #5847, and the discussion in #5769 (all related to cryptography). (The only workaround would be to install all non-types dependencies for all packages into our venv.)
It doesn't scale as the number of third-party packages in typeshed grows.
Ideas like per-package configuration are not possible at the moment; see for example #1526.

My suggestion: Only run the tests for the third-party stubs that have actually changed, and run each test in a separate environment. This environment only has the requirements from METADATA.toml installed. This fixes the problems above and also means:

The stubs will not be able to use features from stdlib that are not yet in the latest type checker distributions.
Changes that depend on changes in another third-party stub need to wait until that change has been released. (Happens every three hours, so it shouldn't be too much of a problem.)
Running the tests for all stubs will be slower, since each package needs its own venv installation.
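
To make "only the stubs that have actually changed" concrete, here is a minimal sketch of how the changed distributions could be derived from a list of changed file paths. It assumes a `stubs/<distribution>/...` directory layout, and the function name `changed_distributions` is hypothetical; this is an illustration, not the code typeshed's CI uses.

```python
from pathlib import PurePosixPath


def changed_distributions(changed_files):
    """Return the sorted names of third-party distributions touched by a change.

    Assumes (hypothetically) that third-party stubs live under
    stubs/<distribution>/..., and that paths use forward slashes,
    as in the output of `git diff --name-only`.
    """
    distributions = set()
    for name in changed_files:
        parts = PurePosixPath(name).parts
        # Only paths of the form stubs/<distribution>/<something> count;
        # stdlib changes and top-level files are ignored here.
        if len(parts) >= 3 and parts[0] == "stubs":
            distributions.add(parts[1])
    return sorted(distributions)


files = [
    "stubs/requests/requests/api.pyi",
    "stdlib/os/__init__.pyi",
    "stubs/six/METADATA.toml",
]
print(changed_distributions(files))  # ['requests', 'six']
```

A stdlib change would probably still need to trigger a wider run, since every stub depends on the stdlib; the sketch deliberately leaves that policy question open.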

Aug 23 '21 14:08 srittau

A possible downside: If the tests of a third-party stub break for some reason, e.g. because a new version of the corresponding non-stub package is released, the problem will remain unnoticed until someone makes a PR for that specific stub package. I remember most pull requests showing red CI because of some Pillow thing, but can't find it from the PR history now.

Aug 23 '21 17:08 Akuli

It would be useful to run full tests at a regular interval, say once per day or week.

Aug 23 '21 18:08 srittau

Note that the stubtest third-party code already does this venv creation. We could steal that code and maybe use some fancy GitHub Actions caching to cache venvs across workflows. Switching to testing only changed third-party stubs is probably a better idea than just doing everything and sharding, which is the approach test_stubtest currently takes.

Like you say, I'd want to think through how we manage different stdlib and types-* versions; to that end, here are some previous issues we've had along those lines: https://github.com/python/typeshed/issues/4815 https://github.com/python/typeshed/issues/5786 https://github.com/python/typeshed/issues/5751

Aug 23 '21 21:08 hauntsaninja

This seems to be done now.

Dec 05 '21 20:12 Akuli

No, we still need to use separate venvs for each distribution, so that the dependencies don't interfere with each other.

Dec 06 '21 08:12 srittau

I would be interested in working on this mainly to make it possible to have non-types dependencies.

Would reasonable steps, in order, be:

1. Adjust each check to run only for changed folders. Start with mypy, then continue to pyright, then pytype.
2. Create a new venv for each folder checked. Again, mypy -> pyright -> pytype.
3. Add support for reading METADATA.toml and installing dependencies for a given folder.
4. Add support for per-package mypy.ini/pyrightconfig.json/etc. files.

Any major tasks I'm missing?
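
For the METADATA.toml step, a rough sketch of the dependency handling could look like the following. It assumes the `requires` field has already been parsed into a list of requirement strings (the TOML parsing itself is omitted), and it treats anything starting with `types-` as a typeshed stub dependency. The function name `split_requirements` is hypothetical.

```python
def split_requirements(requires):
    """Split requirement strings into (types-* deps, non-types deps).

    `requires` is assumed to be the already-parsed list from a
    distribution's METADATA.toml. The "types-" prefix check is a
    simplification and does not validate the requirement syntax.
    """
    types_deps = []
    external_deps = []
    for requirement in requires:
        if requirement.strip().startswith("types-"):
            types_deps.append(requirement)
        else:
            external_deps.append(requirement)
    return types_deps, external_deps


types_deps, external_deps = split_requirements(["types-six", "cryptography>=3.4"])
print(types_deps)     # ['types-six']
print(external_deps)  # ['cryptography>=3.4']
```

Only the second list would need to be installed into the per-folder venv; the first can be resolved from the typeshed checkout itself.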

Feb 20 '22 08:02 hmc-cs-mdrissi

That sounds reasonable. Some thoughts:

It might make sense to split each test up into running on the stdlib vs running on individual third party distributions (like we do with stubtest)
Here's some code for venv creation: https://github.com/python/typeshed/blob/823592e100392747ce9d89b56ead80a1b720d4c3/tests/stubtest_third_party.py#L39
We'd want to make sure that other stubs don't interfere, e.g. we should not be picking up cryptography-stubs if someone declares a cryptography dep. The various type checkers probably each handle this differently
We talked about only having a small, vetted list of deps. If this is the case, I personally wouldn't hate getting started on https://github.com/python/typeshed/issues/5768 with a global venv. This might be controversial though :-)
It might be nice to use different keys in pyproject.toml for typeshed deps vs other deps (e.g., could make automation in the future easier / avoid mistakes where we assume that "types-*" packages are trusted). Might need a change in https://github.com/typeshed-internal/stub_uploader
There's an @tests folder you can reuse for per-distribution config files. We even store some extra requirements.txt's in there

Feb 20 '22 08:02 hauntsaninja

@hmc-cs-mdrissi For now you could try to skip step 1 and we can see how that affects CI runtimes.

Feb 21 '22 16:02 srittau

As of #9408, third-party stubs with non-types dependencies are now tested with mypy in isolated venvs. The venvs are set up concurrently using a thread pool, meaning we still test all typeshed stubs packages in every run, but the script remains performant.

(Stubs packages with no non-types dependencies are not tested in a separate venv, but they are tested with the --no-site-packages flag, so they still run in an isolated environment.)
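
For readers curious what concurrent venv setup can look like, here is a minimal sketch using only the standard library. It is not the code from #9408; `make_venv`, `setup_venvs`, and the `venv-<distribution>` directory naming are hypothetical, and installing the non-types dependencies into each venv is left out.

```python
import venv
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def make_venv(root, distribution):
    """Create an empty venv for one distribution and return its path.

    with_pip=False keeps creation fast; a real script would also need
    to install the distribution's non-types dependencies (not shown).
    """
    venv_dir = Path(root) / f"venv-{distribution}"
    venv.create(venv_dir, with_pip=False)
    return venv_dir


def setup_venvs(root, distributions, factory=make_venv, max_workers=4):
    """Set up one venv per distribution concurrently.

    Returns a dict mapping distribution name to whatever `factory`
    returns (the venv path, for the default factory). `factory` is
    injectable so the scheduling can be exercised without touching disk.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {dist: pool.submit(factory, root, dist) for dist in distributions}
        # result() re-raises any exception raised inside a worker thread.
        return {dist: future.result() for dist, future in futures.items()}
```

Calling `setup_venvs(tempfile.mkdtemp(), ["requests", "six"])` would create two venvs under a temporary directory. Threads are a reasonable fit here because venv creation is dominated by filesystem and subprocess work rather than Python-level computation.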

Closing as completed! 🥳

Jan 08 '23 13:01 AlexWaygood
