`pants help-all` isn't deterministic
Describe the bug
The output of pants help-all appears to not be deterministic, when run with different pantsd instances (it is deterministic if run with the same pantsd running).
Determinstism is good in general, but this specifically leads to spurious diffs in the documentation sync PRs, which check in the output of pants help-all to the https://github.com/pantsbuild/pantsbuild.org repo (via the sync_docs.yml workflow).
I think this may be (partly) related to the use of sets in the code below (and potentially other places in that file):
https://github.com/pantsbuild/pants/blob/8481389512cc1563ef08077838503769d8e4a4ae/src/python/pants/engine/target.py#L441-L452
These potentially lead to non-deterministic order in class_field_types in that file, which is what is called to fill out a target's field in TargetTypeHelpInfo in help_info_extractor.py.
Reproducer:
cd $(mktemp -d)
cat > pants.toml <<EOF
[GLOBAL]
pants_version = "2.19.0"
EOF
pants --no-pantsd help-all > first.json
pants --no-pantsd help-all > second.json
diff -U3 first.json second.json
Running that gives output like (truncated):
--- first.json 2024-02-27 12:28:23.000000000 +1100
+++ second.json 2024-02-27 12:28:26.000000000 +1100
@@ -13895,8 +13895,12 @@
"used_in_rules": []
},
"pants.core.goals.lint.Batch": {
- "consumed_by_rules": [],
- "dependencies": [],
+ "consumed_by_rules": [
+ "pants.backend.project_info.regex_lint.lint_with_regex_patterns"
+ ],
+ "dependencies": [
+ "pants.core"
+ ],
"dependents": [
"pants.backend.project_info"
],
@@ -13904,15 +13908,11 @@
"is_union": true,
"module": "pants.core.goals.lint",
"name": "Batch",
- "provider": "pants.core",
+ "provider": "pants.backend.project_info",
"returned_by_rules": [],
- "union_members": [
- "Batch"
- ],
- "union_type": null,
- "used_in_rules": [
- "pants.core.goals.lint.lint"
- ]
+ "union_members": [],
+ "union_type": "Batch",
+ "used_in_rules": []
},
"pants.core.goals.lint.Lint": {
"consumed_by_rules": [],
@@ -21164,25 +21164,25 @@
"type_hint": "str | None"
},
{
- "alias": "subprocess_environment_env_vars",
+ "alias": "pex_executable_search_paths",
"default": null,
- "description": "Overrides the default value from the option `[subprocess-environment].env_vars` when this environment target is active.",
+ "description": "Overrides the default value from the option `[pex].executable_search_paths` when this environment target is active.",
"provider": "pants.core",
"required": false,
"type_hint": "Iterable[str] | None"
},
{
- "alias": "python_native_code_cpp_flags",
+ "alias": "subprocess_environment_env_vars",
"default": null,
- "description": "Overrides the default value from the option `[python-native-code].cpp_flags` when this environment target is active.",
+ "description": "Overrides the default value from the option `[subprocess-environment].env_vars` when this environment target is active.",
"provider": "pants.core",
"required": false,
"type_hint": "Iterable[str] | None"
},
{
- "alias": "pex_executable_search_paths",
+ "alias": "system_binaries_system_binary_paths",
"default": null,
- "description": "Overrides the default value from the option `[pex].executable_search_paths` when this environment target is active.",
+ "description": "Overrides the default value from the option `[system-binaries].system_binary_paths` when this environment target is active.",
"provider": "pants.core",
"required": false,
"type_hint": "Iterable[str] | None"
@@ -21196,33 +21196,33 @@
"type_hint": "Iterable[str] | None"
},
{
- "alias": "test_extra_env_vars",
+ "alias": "python_native_code_ld_flags",
"default": null,
- "description": "Overrides the default value from the option `[test].extra_env_vars` when this environment target is active.",
+ "description": "Overrides the default value from the option `[python-native-code].ld_flags` when this environment target is active.",
"provider": "pants.core",
"required": false,
"type_hint": "Iterable[str] | None"
},
{
- "alias": "python_native_code_ld_flags",
+ "alias": "python_bootstrap_search_path",
"default": null,
- "description": "Overrides the default value from the option `[python-native-code].ld_flags` when this environment target is active.",
+ "description": "Overrides the default value from the option `[python-bootstrap].search_path` when this environment target is active.",
"provider": "pants.core",
"required": false,
"type_hint": "Iterable[str] | None"
},
{
- "alias": "python_bootstrap_search_path",
+ "alias": "python_native_code_cpp_flags",
"default": null,
- "description": "Overrides the default value from the option `[python-bootstrap].search_path` when this environment target is active.",
+ "description": "Overrides the default value from the option `[python-native-code].cpp_flags` when this environment target is active.",
"provider": "pants.core",
"required": false,
"type_hint": "Iterable[str] | None"
},
{
- "alias": "system_binaries_system_binary_paths",
+ "alias": "test_extra_env_vars",
"default": null,
- "description": "Overrides the default value from the option `[system-binaries].system_binary_paths` when this environment target is active.",
+ "description": "Overrides the default value from the option `[test].extra_env_vars` when this environment target is active.",
"provider": "pants.core",
"required": false,
"type_hint": "Iterable[str] | None"
Pants version
2.19.0, 2.20.0a0
OS macOS
Additional info
N/A
yep, that returned tuple should be sorted.
This is mostly fixed by the sort, but there are some remaining. All the remaining diffs look to be in "name_to_api_type_info" with types that are provided in multiple places. I can find 2 categories:
-
distinct_union_type_per_subclass, like "pants.core.goals.test.Batch". This is caused bydistinct_union_type_per_subclassnot changing the generated class's__name__, but changing the__qualname__. But the help printer computes (something like) the qualname withf"{api_type.__module__}.{api_type.__name__}", which isn't the same, since we fake the__qualname__but don't change the module. I think we're torn between providing a pointer to a class that exists at a location where it doesn't or vice-versa. I think using__qualname__here makes more sense, giving us all the types with correct info but requiring people to traverse the union. - classes that are provided through 2 different paths, for example
BlackRequest.Batchis provided both by "pants.backend.build_files.fmt.black" and "pants.backend.python.lint.black". I'm not sure that there's a good way around this. We could enforce uniqueness by using fully-qualified identifiers for the keys. Or we could enforce an insertion order so the shadowing is deterministic, although the shadowing seems like a problem.