pants
pants copied to clipboard
Allow for automatic use of pytest-xdist via `Process.concurrency_available`
With the Process.concurrency_available
facility, a process can hint at how much concurrency Pants should attempt to allocate for it. This would allow for safe automatic usage of pytest-xdist
(behind a flag, and probably optionally disabled via a per-test Field
), by passing pytest -n {pants_concurrency}
.
This relates to #14886 as well: assuming that you had enough tests to saturate your cores initially, you might end up with a large single test file still running after the others had finished. That issue discusses dynamically scaling up for a straggler process.
I would be interested in this as well. Parallelizing by test rather than test file currently gives us much better performance. Yes, we could do the manual effort of splitting out long running files, but it'd be less work to have the option of setting pants to work like this.
+1, also interested.
While it is a best practice to separate modularize tests such that any test file in itself is not too large, different pieces of code with varying degrees of complexity will have tests that vary in performance.
It feels that the Pants tests execution layer should be robust enough to handle this without requiring the developer to modify the way they write tests.
I think that to do this, you would:
- Add a boolean option to the
pytest
subsystem to control whether it is enabled, similar to this one. It will likely need to be disabled by default, since concurrency can change test behavior. - Add
pytest-xdist
to thedefault_extra_requirements
ofpytest
. - When the flag is enabled, pass an (initially hardcoded... i.e. to
4
)VenvPexProcess(.., concurrency_available=4)
value for the pytest process, and add an-n {concurrency_available}
argument forpytest
(similar to whatblack
does here). - Once that's working, add a
@_rule_helper
in the same file that will use a heuristic to choose how many threads to use for a particular file (i.e., what to pass toconcurrency_available
). This could probably be as simple as a regex that matchesdef test
and a few other common test-method-name patterns.- This will need to take only the test sources, which will be similar to https://github.com/pantsbuild/pants/blob/fca8babc1543444067abec9efe753d72a1ee27d2/src/python/pants/backend/python/goals/pytest_runner.py#L202-L204, but for only the
TestSetupRequest
target which the method is running for. - It will then need to use
await Get(DigestContents, Digest, source_files.snapshot.digest)
to load the content of the test file. - Finally, match a regex/etc against the content of the source files in the
DigestContents
.
- This will need to take only the test sources, which will be similar to https://github.com/pantsbuild/pants/blob/fca8babc1543444067abec9efe753d72a1ee27d2/src/python/pants/backend/python/goals/pytest_runner.py#L202-L204, but for only the
- Add an
IntField
to thepython_tests
target (similar to this field), which will allow for overriding the concurrency detected for a particular test.
To test this, you'll probably want to add a test here, which will really just be a "coverage" test: you won't be able to test that concurrency is actually used, but you can test that nothing fails with the option enabled.