Triage and improve remote cache hit rate for desktop usage
When a remote cache is configured, desktop usage (even among identical platforms) gets lower cache hit rates than CI usage. This is primarily due to inconsistent PATH/env entries between differently configured boxes.
We should:
- ~add enough information to workunits/metrics to validate that this is the case (i.e. to use a `StreamingWorkunitHandler` to compare cache lookups across different desktop machines)~ (done in #12469)
- Do any sort of PATH/env filtering that we need to improve our hit rate (potentially related to #10526 and #10769)
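As a rough illustration of the comparison described in the first item, the sketch below diffs per-process digests collected on two machines. The record format (a mapping from process description to digest hash), the sample data, and the name `diff_lookups` are assumptions made for illustration; they are not the output of `StreamingWorkunitHandler` or any Pants API.

```python
from typing import Dict, List, Tuple


def diff_lookups(a: Dict[str, str], b: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (matching, diverging) descriptions for processes seen on both machines."""
    shared = sorted(set(a) & set(b))
    matching = [d for d in shared if a[d] == b[d]]
    diverging = [d for d in shared if a[d] != b[d]]
    return matching, diverging


# Hypothetical records: process description -> digest hash, one dict per machine.
machine_a = {"Building requirements.pex": "4dd1", "Run pytest on tests/a.py": "9f3c"}
machine_b = {"Building requirements.pex": "77ab", "Run pytest on tests/a.py": "9f3c"}

matching, diverging = diff_lookups(machine_a, machine_b)
# Fraction of shared processes that could share a cache entry across the two machines.
shareable = len(matching) / (len(matching) + len(diverging))
```

Processes in `diverging` are the ones worth inspecting first, since a differing digest means the two machines can never share a cache entry for that step.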
Some of the differences identified for a series of runs across multiple users:
- PATH/LDFLAGS/CPPFLAGS
  - Leaked through in order to allow for compilation of wheels.
- The PEX `--python-path`
- The python interpreter used
  - Even with an identical PATH string, multiple interpreters might be identified (see #10526 and #10769).
Agreed that measurement is critical, but we know everything downstream of a PEX build will be a cache miss by "design", right?: https://github.com/pantsbuild/pants/blob/61193e1bf5c8d9d3b77519626938906a70d5c098/src/python/pants/backend/python/util_rules/pex.py#L679-L682
> Agreed that measurement is critical, but we know everything downstream of a PEX build will be a cache miss by "design", right?: ...
This ticket is intended to cover cases where multiple desktop machines are using the same platform: I'll clarify that.
Another cause of cache misses that we have theorized is API keys. Some of our tests run against 3rd party systems and authenticate via an API key. With each developer having a unique key, we believe this would effectively nullify any benefits of remote caching.
Given the design of the system, the API key is irrelevant, and ignoring any security concerns, a single key could be used by all developers. Or `hash(api_key)` could safely return a constant.
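The idea of `hash(api_key)` returning a constant can be sketched as follows. This is a minimal illustration under stated assumptions, not how Pants computes `Process` fingerprints: the function `process_fingerprint`, the `ignored_env` parameter, and the `API_KEY` variable name are all hypothetical.

```python
import hashlib
import json
from typing import Dict, FrozenSet, Tuple


def process_fingerprint(
    argv: Tuple[str, ...],
    env: Dict[str, str],
    ignored_env: FrozenSet[str] = frozenset(),
) -> str:
    """Fingerprint argv and env, treating the values of ignored env vars as a constant."""
    # Only the presence of an ignored variable affects the key, not its value.
    normalized = {k: ("<ignored>" if k in ignored_env else v) for k, v in env.items()}
    payload = json.dumps({"argv": list(argv), "env": normalized}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


argv = ("pytest", "tests/test_api.py")
alice_env = {"API_KEY": "alice-key"}
bob_env = {"API_KEY": "bob-key"}

# With the key ignored, both developers compute the same cache key.
alice = process_fingerprint(argv, alice_env, frozenset({"API_KEY"}))
bob = process_fingerprint(argv, bob_env, frozenset({"API_KEY"}))
```

The process would still run with each developer's real key; only the cache key ignores it. That is safe only if the test result does not depend on which key was used.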
I did a little bit of thinking about this before we decided to start #13682, so will braindump some of that for now.
I believe that a path forward to allow for "adjusting" the fingerprint that is included in a `Process` for certain environment variables and absolute file args is essentially to reify them into types which:
- Would have their actual string content applied below/after cache lookups. For example:
  - a reified `PATH` env var containing your `HOME` directory would not include the `HOME` portion in the `Process` digest, but would use the entire value at execution time.
- Describe how to compute a deeper/different fingerprint for the entry which was included for cache lookups. For example:
  - a reified `PATH` entry might be constructed by the `@rule` implementer by listing which (sub)processes they thought that a `Process` would use. Fingerprinting would then collect versions of those processes and apply some rounding to attempt to match.

The hope would be that these types would be composable, such that they didn't add much complexity to constructing a `Process`.
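To make the reification idea concrete, here is a minimal sketch of a type that carries one value for execution and another for fingerprinting. `ReifiedPath`, its methods, the `<HOME>` placeholder, and `rounded_version` are hypothetical names invented for this illustration; nothing here is an existing Pants type, and the `:` separator assumes a POSIX-style `PATH`.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ReifiedPath:
    """Hypothetical reified PATH env var (not a Pants type)."""

    entries: Tuple[str, ...]
    home: str  # assumed to have no trailing slash

    def execution_value(self) -> str:
        # The full, machine-specific value used when the process actually runs.
        return ":".join(self.entries)

    def fingerprint_value(self) -> str:
        # The value used for cache lookups: the HOME prefix becomes a placeholder.
        def normalize(entry: str) -> str:
            if entry == self.home or entry.startswith(self.home + "/"):
                return "<HOME>" + entry[len(self.home):]
            return entry

        return ":".join(normalize(e) for e in self.entries)

    def fingerprint(self) -> str:
        return hashlib.sha256(self.fingerprint_value().encode("utf-8")).hexdigest()


def rounded_version(version: str, parts: int = 2) -> str:
    # "Rounding" here means keeping only the leading components, e.g. major.minor.
    return ".".join(version.split(".")[:parts])


alice = ReifiedPath(("/home/alice/.pyenv/shims", "/usr/bin"), home="/home/alice")
bob = ReifiedPath(("/home/bob/.pyenv/shims", "/usr/bin"), home="/home/bob")
```

Here `alice` and `bob` produce the same fingerprint while keeping distinct execution values. `rounded_version` shows one possible reading of "rounding" for the second bullet: tool versions that differ only in their patch component would be treated as equal.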
I enabled remote caching when upgrading to 2.18 and am seeing similar issues, but even on the same machine, depending on whether our `pants.ci.toml` is enabled.
You can see here that even though the fingerprints match, we get a cache miss when running with only the base config:
# WITH pants.ci.toml
11:03:10.24 [DEBUG] remote cache hit for: "Building 2 requirements for ci/emote-override_py.pex from the locks/cpu.lock resolve: coloredlogs~=15.0, ruamel.yaml~=0.16.0" digest=Digest { hash: Fingerprint<4dd1225aba792cee25d7aa09c810024d91f4cb87385044c287d63e0f78b88d34>, size_bytes: 142 }
# WITHOUT pants.ci.toml
11:49:55.05 [DEBUG] remote cache miss for: "Building 2 requirements for ci/emote-override_py.pex from the locks/cpu.lock resolve: coloredlogs~=15.0, ruamel.yaml~=0.16.0" digest=Digest { hash: Fingerprint<4dd1225aba792cee25d7aa09c810024d91f4cb87385044c287d63e0f78b88d34>, size_bytes: 142 }
This is a `pex_binary` that we build that only contains two files and two dependencies. We only write to the cache from CI. I assume that if I did write locally I'd at least get hits, but I don't want all users writing to the cache, to avoid cache pollution. The only thing that could possibly affect this from our config is that we enable `pyenv` in `pants.ci.toml`. We do override the default resolve, but I've made sure to include that on the command line.
I've not dug more into it yet, but I'll diff the actual process invocations when I have time and see if I can prove that it's the Python interpreter location that matters.
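A sketch of what such a diff could look like, assuming each process invocation has been captured as an argv tuple plus an env dict. The capture format, the sample paths, and the name `diff_invocations` are assumptions for illustration; Pants does not emit data in this exact shape.

```python
from typing import Dict, Optional, Tuple

Invocation = Tuple[Tuple[str, ...], Dict[str, str]]


def diff_invocations(a: Invocation, b: Invocation) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return every argv position or env var whose value differs between two invocations."""
    (argv_a, env_a), (argv_b, env_b) = a, b
    diffs: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    # Compare argv position by position, padding the shorter one with None.
    for i in range(max(len(argv_a), len(argv_b))):
        left = argv_a[i] if i < len(argv_a) else None
        right = argv_b[i] if i < len(argv_b) else None
        if left != right:
            diffs[f"argv[{i}]"] = (left, right)
    # Compare env vars by name; a var missing on one side shows up as None.
    for key in sorted(set(env_a) | set(env_b)):
        if env_a.get(key) != env_b.get(key):
            diffs[f"env[{key}]"] = (env_a.get(key), env_b.get(key))
    return diffs


# Hypothetical captures: same command, different interpreter locations.
with_ci = (("/home/me/.pyenv/versions/3.9.7/bin/python", "pex"), {"PATH": "/usr/bin"})
without_ci = (("/usr/bin/python3.9", "pex"), {"PATH": "/usr/bin"})
diffs = diff_invocations(with_ci, without_ci)
```

If the interpreter location is what matters, it would show up here as a differing `argv[0]` (or a differing env entry) between the two runs.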
For completeness, this is the `pants.ci.toml` we use:
[GLOBAL]
colors = true
print_stacktrace = true
plugins.add = [
"hdrhistogram",
]
backend_packages.add = [
"pants.backend.python.providers.experimental.pyenv",
]
remote_cache_write = true
[stats]
log = true
[test]
use_coverage = true
[coverage-py]
report = ["json"]
global_report = true
[pytest]
args = ["-vv", "--no-header", "--benchmark-disable"]
[python]
default_resolve = "cpu"
[oci]
rootless = false
uid_map = ["0:0:65536"]
gid_map = ["0:0:65536"]
[pyenv-python-provider]
installation_extra_env_vars = [
"PYTHON_CONFIGURE_OPTS=--with-lto=thin",
"PYTHON_CFLAGS=-march=native -mtune=native",
]