Allow for automatic use of pytest-xdist via `Process.concurrency_available`
With the Process.concurrency_available facility, a process can hint at how much concurrency Pants should attempt to allocate for it. This would allow for safe automatic usage of pytest-xdist (behind a flag, and probably optionally disabled via a per-test Field), by passing pytest -n {pants_concurrency}.
This relates to #14886 as well: assuming that you had enough tests to saturate your cores initially, you might end up with a large single test file still running after the others had finished. That issue discusses dynamically scaling up for a straggler process.
I would be interested in this as well. Parallelizing by test rather than test file currently gives us much better performance. Yes, we could do the manual effort of splitting out long running files, but it'd be less work to have the option of setting pants to work like this.
+1, also interested.
While it is a best practice to separate modularize tests such that any test file in itself is not too large, different pieces of code with varying degrees of complexity will have tests that vary in performance.
It feels that the Pants tests execution layer should be robust enough to handle this without requiring the developer to modify the way they write tests.
I think that to do this, you would:
- Add a boolean option to the
pytestsubsystem to control whether it is enabled, similar to this one. It will likely need to be disabled by default, since concurrency can change test behavior. - Add
pytest-xdistto thedefault_extra_requirementsofpytest. - When the flag is enabled, pass an (initially hardcoded... i.e. to
4)VenvPexProcess(.., concurrency_available=4)value for the pytest process, and add an-n {concurrency_available}argument forpytest(similar to whatblackdoes here). - Once that's working, add a
@_rule_helperin the same file that will use a heuristic to choose how many threads to use for a particular file (i.e., what to pass toconcurrency_available). This could probably be as simple as a regex that matchesdef testand a few other common test-method-name patterns.- This will need to take only the test sources, which will be similar to https://github.com/pantsbuild/pants/blob/fca8babc1543444067abec9efe753d72a1ee27d2/src/python/pants/backend/python/goals/pytest_runner.py#L202-L204, but for only the
TestSetupRequesttarget which the method is running for. - It will then need to use
await Get(DigestContents, Digest, source_files.snapshot.digest)to load the content of the test file. - Finally, match a regex/etc against the content of the source files in the
DigestContents.
- This will need to take only the test sources, which will be similar to https://github.com/pantsbuild/pants/blob/fca8babc1543444067abec9efe753d72a1ee27d2/src/python/pants/backend/python/goals/pytest_runner.py#L202-L204, but for only the
- Add an
IntFieldto thepython_teststarget (similar to this field), which will allow for overriding the concurrency detected for a particular test.
To test this, you'll probably want to add a test here, which will really just be a "coverage" test: you won't be able to test that concurrency is actually used, but you can test that nothing fails with the option enabled.