pants Allow for automatic use of pytest-xdist via `Process.concurrency

With the Process.concurrency_available facility, a process can hint at how much concurrency Pants should attempt to allocate for it. This would allow for safe automatic usage of pytest-xdist (behind a flag, and probably optionally disabled via a per-test Field), by passing pytest -n {pants_concurrency}.

Apr 06 '22 18:04 stuhood

This relates to #14886 as well: assuming that you had enough tests to saturate your cores initially, you might end up with a large single test file still running after the others had finished. That issue discusses dynamically scaling up for a straggler process.

Jul 20 '22 21:07 stuhood

I would be interested in this as well. Parallelizing by test rather than test file currently gives us much better performance. Yes, we could do the manual effort of splitting out long running files, but it'd be less work to have the option of setting pants to work like this.

Jul 21 '22 11:07 stacey-gammon

+1, also interested.

While it is a best practice to separate modularize tests such that any test file in itself is not too large, different pieces of code with varying degrees of complexity will have tests that vary in performance.

It feels that the Pants tests execution layer should be robust enough to handle this without requiring the developer to modify the way they write tests.

Jul 25 '22 16:07 hughhan1

I think that to do this, you would:

Add a boolean option to the pytest subsystem to control whether it is enabled, similar to this one. It will likely need to be disabled by default, since concurrency can change test behavior.
Add pytest-xdist to the default_extra_requirements of pytest.
When the flag is enabled, pass an (initially hardcoded... i.e. to 4) VenvPexProcess(.., concurrency_available=4) value for the pytest process, and add an -n {concurrency_available} argument for pytest (similar to what black does here).
Once that's working, add a @_rule_helper in the same file that will use a heuristic to choose how many threads to use for a particular file (i.e., what to pass to concurrency_available). This could probably be as simple as a regex that matches def test and a few other common test-method-name patterns.
- This will need to take only the test sources, which will be similar to https://github.com/pantsbuild/pants/blob/fca8babc1543444067abec9efe753d72a1ee27d2/src/python/pants/backend/python/goals/pytest_runner.py#L202-L204, but for only the TestSetupRequest target which the method is running for.
- It will then need to use await Get(DigestContents, Digest, source_files.snapshot.digest) to load the content of the test file.
- Finally, match a regex/etc against the content of the source files in the DigestContents.
Add an IntField to the python_tests target (similar to this field), which will allow for overriding the concurrency detected for a particular test.

To test this, you'll probably want to add a test here, which will really just be a "coverage" test: you won't be able to test that concurrency is actually used, but you can test that nothing fails with the option enabled.

Jul 26 '22 18:07 stuhood

Allow for automatic use of pytest-xdist via `Process.concurrency_available`