
Testing third-party stubs in isolated environments

Open srittau opened this issue 2 years ago • 8 comments

At the moment, whenever there is a change to typeshed, we test all stdlib and third-party packages in the same testing environment, using the --custom-typeshed-dir option so that all packages can see each other. There are several practical problems with that:

  • We can't test the requires fields for correctness; all packages are always available.
  • We can't have packages that depend on external packages, see for example #5768, #5618, #5847, and the discussion in #5769 (all related to cryptography). (Except if we'd just install all non-types dependencies for all packages into our venv.)
  • It doesn't scale if we increase the number of third-party packages in typeshed.
  • Ideas like per-package configuration are not possible at the moment, see for example #1526.

My suggestion: Only run the tests for the third-party stubs that have actually changed, and run each test in a separate environment. This environment only has the requirements from METADATA.toml installed. This fixes the problems above and also means:

  • The stubs will not be able to use features from stdlib that are not yet in the latest type checker distributions.
  • Changes that depend on changes in another third-party stub need to wait until that change has been released. (Happens every three hours, so it shouldn't be too much of a problem.)
  • Running the tests for all stubs will be slower, since each package needs its own venv installation.

srittau avatar Aug 23 '21 14:08 srittau

A possible downside: If the tests of a third-party stub break for some reason, e.g. because a new version of the corresponding non-stub package is released, the problem will remain unnoticed until someone makes a PR for that specific stub package. I remember most pull requests showing red CI because of some Pillow thing, but can't find it from the PR history now.

Akuli avatar Aug 23 '21 17:08 Akuli

It would be useful to run full tests at a regular interval, say once per day or week.

srittau avatar Aug 23 '21 18:08 srittau

Note the stubtest third party code already does this venv creation. We could steal that code / maybe use some fancy GitHub Actions CI caching to cache venvs across workflows. Switching to testing only changed third party stubs is probably a better idea than just doing everything and sharding, which is the approach test_stubtest currently takes.
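
A rough sketch of how "testing only changed third party stubs" could pick its targets, assuming the repository layout `stubs/<distribution>/...` and a list of changed repo-relative paths. The function name `changed_distributions` is hypothetical.

```python
from pathlib import PurePosixPath


def changed_distributions(changed_files: list[str]) -> set[str]:
    """Map changed repo-relative paths to the third-party distributions they touch."""
    distributions = set()
    for name in changed_files:
        parts = PurePosixPath(name).parts
        # Only paths of the form stubs/<distribution>/... count.
        if len(parts) >= 3 and parts[0] == "stubs":
            distributions.add(parts[1])
    return distributions
```

The input list could come from something like `git diff --name-only` against the base branch. Changes outside `stubs/` (for example to stdlib or to the test scripts) are ignored here, so a real workflow would presumably still need a rule for when to fall back to a full run.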

Like you say, I'd want to think through how we manage different stdlib and types-* versions; to that end, here are some previous issues we've had along those lines: https://github.com/python/typeshed/issues/4815 https://github.com/python/typeshed/issues/5786 https://github.com/python/typeshed/issues/5751

hauntsaninja avatar Aug 23 '21 21:08 hauntsaninja

This seems to be done now.

Akuli avatar Dec 05 '21 20:12 Akuli

No, we still need to use separate venvs for each distribution, so that the dependencies don't interfere with each other.

srittau avatar Dec 06 '21 08:12 srittau

I would be interested in working on this mainly to make it possible to have non-types dependencies.

Would these be reasonable steps, in order?

  1. Adjust each check to only run for changed folders. Start with mypy then continue to pyright then pytype.
  2. Create a new venv for each folder checked. Again mypy -> pyright -> pytype
  3. Add support for reading METADATA.toml and installing the dependencies for a given folder.
  4. Add support for per package mypy.ini/pyrightconfig.json/etc files.

Any major tasks I'm missing?

hmc-cs-mdrissi avatar Feb 20 '22 08:02 hmc-cs-mdrissi

That sounds reasonable. Some thoughts:

  • It might make sense to split each test up into running on the stdlib vs running on individual third party distributions (like we do with stubtest)
  • Here's some code for venv creation: https://github.com/python/typeshed/blob/823592e100392747ce9d89b56ead80a1b720d4c3/tests/stubtest_third_party.py#L39
  • We'd want to make sure that other stubs don't interfere, e.g. we should not be picking up cryptography-stubs if someone declares a cryptography dep. The various type checkers probably each handle this differently
  • We talked about only having a small, vetted list of deps. If this is the case, I personally wouldn't hate getting started on https://github.com/python/typeshed/issues/5768 with a global venv. This might be controversial though :-)
  • It might be nice to use different keys in pyproject.toml for typeshed deps vs other deps (e.g., could make automation in the future easier / avoid mistakes where we assume that "types-*" packages are trusted). Might need a change in https://github.com/typeshed-internal/stub_uploader
  • There's an @tests folder you can reuse for per-distribution config files. We even store some extra requirements.txt's in there
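
To illustrate the point about telling typeshed deps apart from other deps: a sketch that splits a single `requires` list by the `types-` name prefix. This prefix check is exactly the kind of heuristic that separate keys would make unnecessary, and the name `split_requirements` is hypothetical.

```python
import re

# Leading distribution name of a requirement string such as "types-foo>=1.0".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def split_requirements(requires: list[str]) -> tuple[list[str], list[str]]:
    """Split requirements into (typeshed stub packages, external packages)."""
    typeshed_deps: list[str] = []
    external_deps: list[str] = []
    for requirement in requires:
        match = _NAME.match(requirement)
        if match is None:
            raise ValueError(f"cannot parse requirement: {requirement!r}")
        name = match.group(1).lower()
        # Assumption: anything named types-* is a typeshed stub package.
        target = typeshed_deps if name.startswith("types-") else external_deps
        target.append(requirement)
    return typeshed_deps, external_deps
```

With distinct keys in the metadata file, the split would be declared by the stub author instead of inferred from the package name, which avoids trusting any package that merely happens to be called `types-*`.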

hauntsaninja avatar Feb 20 '22 08:02 hauntsaninja

@hmc-cs-mdrissi For now you could try to skip step 1 and we can see how that affects CI runtimes.

srittau avatar Feb 21 '22 16:02 srittau

As of #9408, third-party stubs with non-types dependencies are now tested with mypy in isolated venvs. The venvs are set up concurrently using a thread pool, meaning we still test all typeshed stubs packages in every run, but the script remains performant.
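
A minimal sketch of concurrent venv setup with a thread pool, under the assumption that each distribution gets its own directory below a common root. The names `create_venv` and `setup_venvs`, the `.venv-<distribution>` naming, and the worker count are illustrative and are not taken from #9408.

```python
import venv
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def create_venv(venv_dir: Path) -> Path:
    """Create one venv with pip available and return its directory."""
    venv.create(venv_dir, with_pip=True, clear=True)
    return venv_dir


def setup_venvs(
    distributions: list[str],
    root: Path,
    setup_one: Callable[[Path], Path] = create_venv,
    max_workers: int = 8,
) -> dict[str, Path]:
    """Set up one venv per distribution concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            dist: pool.submit(setup_one, root / f".venv-{dist}")
            for dist in distributions
        }
        # .result() re-raises any exception from the worker thread.
        return {dist: future.result() for dist, future in futures.items()}
```

Threads are a plausible fit here because creating a venv is mostly file and subprocess work rather than Python-level computation. Passing `setup_one` as a parameter also lets the orchestration be exercised without creating real environments.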

(Stubs packages with no non-types dependencies are not tested in a separate venv, but they are tested with the --no-site-packages flag, so they still run in an isolated environment.)
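
The two cases can be sketched as a single command builder. `--custom-typeshed-dir`, `--no-site-packages`, and `--python-executable` are existing mypy options; the function name `mypy_command` and the choice to pass the distribution directory as the target are assumptions made for illustration, not how the actual test script invokes mypy.

```python
from __future__ import annotations

from pathlib import Path


def mypy_command(
    distribution: str,
    typeshed_dir: Path,
    venv_python: Path | None = None,
) -> list[str]:
    """Build a mypy command line for one distribution under stubs/."""
    command = ["mypy", "--custom-typeshed-dir", str(typeshed_dir)]
    if venv_python is None:
        # No external dependencies: ignore every installed package.
        command.append("--no-site-packages")
    else:
        # Resolve imports against the isolated venv's site-packages.
        command += ["--python-executable", str(venv_python)]
    command.append(str(typeshed_dir / "stubs" / distribution))
    return command
```

Either branch keeps packages installed in the developer's own environment out of the run: the first by ignoring site-packages entirely, the second by pointing mypy at an interpreter whose site-packages holds only the declared dependencies.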

Closing as completed! 🥳

AlexWaygood avatar Jan 08 '23 13:01 AlexWaygood