pants icon indicating copy to clipboard operation
pants copied to clipboard

`pants run` and `pants package` failing with backtrack due to missing digest

Open engnatha opened this issue 11 months ago • 16 comments

Describe the bug A little hard to capture exactly what was happening. In our CI environment, we were getting a lot of intermittent failures with a similar traceback about a failure to find a process to backtrack for a "missing digest". A repro is elusive here as we couldn't replicate on user accounts.

Here's a sample traceback from when we were had things configured to package a pex binary and then run that binary.

23:36:29.59 [WARN] Making attempt 1 to backtrack and retry `Building 23 requirements for astranis.sidekick.automated_remote_io_test/buildbot_robin.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,>=41.0, hvac<1,>=0.11.2, pexpect<5,>=4.4.0, pika<2,>=1.2.1, protobuf==3.19.5, psutil<6,>=5.8.0, pyjwt<3,>=2, pyserial<4,>=3.4, python-can<4,>=3.1.1, pyudev<1,>=0.24.0, pyyaml<6,>=5.4.1, pyzmq<26,>=25.1, requests<3,>=2.28, setuptools<66,>=65.5.1, tabulate<1,>=0.8.1, typeguard<3,>=2.12.1, types-PyYAML<=6.0.12.6,>=6.0.0, types-click<8,>=7.0.0, typing-extensions<5,>=4.1.1, yubihsm[http]<3,>=2.1`, due to missing digest Digest { hash: Fingerprint<2427f1221b498b758272f1493ffcd7750962e363d32f933b3b80cddcd3d33107>, size_bytes: 14208017 }.
23:36:29.59 [INFO] Dirtying @rule(pants.engine.process.fallible_to_exec_result_or_raise())
23:36:29.59 [INFO] Dirtying @rule(pants.backend.python.util_rules.pex.build_pex(astranis.sidekick.automated_remote_io_test/buildbot_robin.pex))
23:36:29.59 [INFO] Dirtying @rule(pants.backend.python.util_rules.pex.create_pex())
23:36:29.59 [INFO] Dirtying @rule(pants.backend.python.goals.package_pex_binary.built_pacakge_for_pex_from_targets_request(astranis/sidekick/automated_remote_io_test:buildbot_robin))
23:36:29.59 [INFO] Dirtying @rule(pants.core.goals.package.environment_aware_package())
23:36:29.59 [INFO] Dirtying @rule(pants.core.goals.package.package_asset())
23:36:29.59 [INFO] Dirtying Package
23:36:29.59 [INFO] Filesystem changed during run: retrying `@rule(pants.core.goals.package.package_asset())` in 500ms...
23:36:30.09 [INFO] Starting: Building 23 requirements for astranis.sidekick.automated_remote_io_test/buildbot_robin.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,... (393 characters truncated)
23:36:30.09 [INFO] Completed: Building 23 requirements for astranis.sidekick.automated_remote_io_test/buildbot_robin.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,... (393 characters truncated)
23:36:30.11 [INFO] Starting: Building 23 requirements for astranis.sidekick.automated_remote_io_test/buildbot_robin.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,... (393 characters truncated)
23:36:30.16 [INFO] Completed: Building 23 requirements for astranis.sidekick.automated_remote_io_test/buildbot_robin.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,... (393 characters truncated)
23:36:30.18 [WARN] Making attempt 1 to backtrack and retry `Building local_dists.pex`, due to missing digest Digest { hash: Fingerprint<fe8697b96cf36d668c6c41fa27b4f06f1202579e313cac5de6cae1a3040572d4>, size_bytes: 607533 }.
23:36:30.18 [INFO] Dirtying @rule(pants.engine.process.fallible_to_exec_result_or_raise())
23:36:30.18 [INFO] Dirtying @rule(pants.backend.python.util_rules.pex.build_pex(local_dists.pex))
23:36:30.18 [INFO] Dirtying @rule(pants.backend.python.util_rules.pex.create_pex(local_dists.pex))
23:36:30.18 [INFO] Dirtying @rule(pants.backend.python.util_rules.local_dists.build_local_dists())
23:36:30.18 [INFO] Dirtying @rule(pants.backend.python.util_rules.pex_from_targets.create_pex_from_targets())
23:36:30.18 [INFO] Dirtying @rule(pants.backend.python.util_rules.pex.create_pex())
23:36:30.18 [INFO] Dirtying @rule(pants.backend.python.goals.package_pex_binary.built_pacakge_for_pex_from_targets_request(astranis/sidekick/automated_remote_io_test:buildbot_robin))
23:36:30.18 [INFO] Dirtying @rule(pants.core.goals.package.environment_aware_package())
23:36:30.18 [INFO] Dirtying @rule(pants.core.goals.package.package_asset())
23:36:30.18 [INFO] Dirtying Package
23:36:30.18 [INFO] Filesystem changed during run: retrying `Process(Building 23 requirements for astranis.sidekick.automated_remote_io_test/buildbot_robin.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,>=41.0, hvac<1,>=0.11.2, pexpect<5,>=4.4.0, pika<2,>=1.2.1, protobuf==3.19.5, psutil<6,>=5.8.0, pyjwt<3,>=2, pyserial<4,>=3.4, python-can<4,>=3.1.1, pyudev<1,>=0.24.0, pyyaml<6,>=5.4.1, pyzmq<26,>=25.1, requests<3,>=2.28, setuptools<66,>=65.5.1, tabulate<1,>=0.8.1, typeguard<3,>=2.12.1, types-PyYAML<=6.0.12.6,>=6.0.0, types-click<8,>=7.0.0, typing-extensions<5,>=4.1.1, yubihsm[http]<3,>=2.1)` in 500ms...
23:36:30.69 [INFO] Starting: Building 23 requirements for astranis.sidekick.automated_remote_io_test/buildbot_robin.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,... (393 characters truncated)
23:36:30.69 [INFO] Completed: Building 23 requirements for astranis.sidekick.automated_remote_io_test/buildbot_robin.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,... (393 characters truncated)
23:36:30.73 [INFO] Starting: Building 23 requirements for astranis.sidekick.automated_remote_io_test/buildbot_robin.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,... (393 characters truncated)
23:36:30.73 [INFO] Completed: Building 23 requirements for astranis.sidekick.automated_remote_io_test/buildbot_robin.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,... (393 characters truncated)
23:36:30.74 [INFO] Starting: Building local_dists.pex
23:36:30.74 [INFO] Completed: Building local_dists.pex
23:36:30.75 [INFO] Starting: Building local_dists.pex
23:36:30.75 [INFO] Completed: Building local_dists.pex
23:36:30.76 [INFO] Starting: Building local_dists.pex
23:36:30.76 [INFO] Completed: Building local_dists.pex
23:36:30.76 [INFO] Starting: Building local_dists.pex
23:36:30.76 [INFO] Completed: Building local_dists.pex
23:36:30.77 [INFO] Starting: Building local_dists.pex
23:36:30.77 [INFO] Completed: Building local_dists.pex
23:36:30.79 [INFO] Starting: Building local_dists.pex
23:36:30.79 [INFO] Completed: Building local_dists.pex
23:36:30.80 [INFO] Starting: Building local_dists.pex
23:36:30.80 [INFO] Completed: Building local_dists.pex
23:36:30.81 [INFO] Starting: Building local_dists.pex
23:36:30.81 [INFO] Completed: Building local_dists.pex
23:36:30.82 [ERROR] 1 Exception encountered:
Engine traceback:
  in `package` goal
IntrinsicError: Could not identify a process to backtrack to for: Missing digest: Was not present in the local store: Digest { hash: Fingerprint<faad51a6a108fba9d40b2a10e82a2646fccbaf8c3d9be47818f4bffae02d94b8>, size_bytes: 4098329 }

An identical error was raised when we ran things with just a pants run command and avoided package altogether.

pants run astranis/sidekick/automated_remote_io_test/buildbot_robin.py -- --sk-dir /buildbot/astranis-monorepo-sk-integration/build/sk
16:01:19.44 [WARN] Making attempt 1 to backtrack and retry `Building 23 requirements for automated_remote_io_test.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,>=41.0, hvac<1,>=0.11.2, pexpect<5,>=4.4.0, pika<2,>=1.2.1, protobuf==3.19.5, psutil<6,>=5.8.0, pyjwt<3,>=2, pyserial<4,>=3.4, python-can<4,>=3.1.1, pyudev<1,>=0.24.0, pyyaml<6,>=5.4.1, pyzmq<26,>=25.1, requests<3,>=2.28, setuptools<66,>=65.5.1, tabulate<1,>=0.8.1, typeguard<3,>=2.12.1, types-PyYAML<=6.0.12.6,>=6.0.0, types-click<8,>=7.0.0, typing-extensions<5,>=4.1.1, yubihsm[http]<3,>=2.1`, due to missing digest Digest { hash: Fingerprint<839dc98b8dde47328cbdf3989d52c7e70cc352c955e4a343988e26d60e97fa5f>, size_bytes: 1132584 }.
16:01:19.44 [INFO] Dirtying @rule(pants.engine.process.fallible_to_exec_result_or_raise())
16:01:19.44 [INFO] Dirtying @rule(pants.backend.python.util_rules.pex.build_pex(automated_remote_io_test.pex))
16:01:19.44 [INFO] Dirtying @rule(pants.backend.python.util_rules.pex.create_venv_pex())
16:01:19.44 [INFO] Dirtying @rule(pants.backend.python.goals.run_python_source.create_python_source_run_request(astranis/sidekick/automated_remote_io_test/buildbot_robin.py))
16:01:19.44 [INFO] Dirtying @rule(pants.core.goals.run.run())
16:01:19.44 [INFO] Dirtying @rule(pants.core.goals.run.run())
16:01:19.44 [INFO] Dirtying Run
16:01:19.44 [INFO] Dirtying Run
16:01:19.44 [INFO] Filesystem changed during run: retrying `@rule(pants.core.goals.run.run())` in 500ms...
16:01:19.94 [INFO] Starting: Building 23 requirements for automated_remote_io_test.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,>=41.0, hvac<1,>=0.11.2, pexpect<... (360 characters truncated)
16:01:19.94 [INFO] Completed: Building 23 requirements for automated_remote_io_test.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,>=41.0, hvac<1,>=0.11.2, pexpect<... (360 characters truncated)
16:01:19.97 [INFO] Starting: Building 23 requirements for automated_remote_io_test.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,>=41.0, hvac<1,>=0.11.2, pexpect<... (360 characters truncated)
16:01:19.97 [INFO] Completed: Building 23 requirements for automated_remote_io_test.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,>=41.0, hvac<1,>=0.11.2, pexpect<... (360 characters truncated)
16:01:20.01 [INFO] Starting: Building 23 requirements for automated_remote_io_test.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,>=41.0, hvac<1,>=0.11.2, pexpect<... (360 characters truncated)
16:01:20.01 [INFO] Completed: Building 23 requirements for automated_remote_io_test.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,>=41.0, hvac<1,>=0.11.2, pexpect<... (360 characters truncated)
16:01:20.03 [INFO] Starting: Building 23 requirements for automated_remote_io_test.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,>=41.0, hvac<1,>=0.11.2, pexpect<... (360 characters truncated)
16:01:20.03 [INFO] Completed: Building 23 requirements for automated_remote_io_test.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,>=41.0, hvac<1,>=0.11.2, pexpect<... (360 characters truncated)
16:01:20.05 [INFO] Starting: Building 23 requirements for automated_remote_io_test.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,>=41.0, hvac<1,>=0.11.2, pexpect<... (360 characters truncated)
16:01:20.05 [INFO] Completed: Building 23 requirements for automated_remote_io_test.pex from the constraints.txt resolve: aiohttp<4,>=3.8.3, click<9.0,>=8.0, coloredlogs<16,>=15, cryptography<42.0,>=41.0, hvac<1,>=0.11.2, pexpect<... (360 characters truncated)
16:01:20.06 [ERROR] 1 Exception encountered:
Engine traceback:
  in `run` goal
IntrinsicError: Could not identify a process to backtrack to for: Missing digest: Was not present in the local store: Digest { hash: Fingerprint<b88f117ae5cf142f938fdf85baf88ea3862c73a0913ebbec8de24dc07706daff>, size_bytes: 639039 }

Pants version 2.17.0 with the 0.10.0 launcher binary

OS Ubuntu 20.04

Additional info Interestingly, at the recommendation of @huonw , we tried prepending PANTS_LOCAL_STORE_DIR=$(mktemp -d) <cmd> and things worked repeatedly enough when force scheduling builds.

This led us to believe that wiping the cache on that runner would make things happy. Unfortunately, the next build after resetting the cache resulted in the same errors above.

engnatha avatar Sep 01 '23 04:09 engnatha

Not sure if #16667 is related, but we're using fetch for cache_content_behavior. I'd like to avoid uploading the whole config file to avoid looping in security, but let me know if there's any other values that would be interesting to know about.

engnatha avatar Sep 01 '23 04:09 engnatha

@engnatha confirmed on slack (https://pantsbuild.slack.com/archives/C046T6T9U/p1693541874862319?thread_ts=1693439136.489919&cid=C046T6T9U) they use a remote cache (bazel-remote-cache).


I looked over the diff between 2.16.0 and 2.17.0 https://github.com/pantsbuild/pants/compare/release_2.16.0..release_2.17.0 , and particularly the changes to the "core" caching code: git diff --stat release_2.16.0 release_2.17.0 -- src/rust/engine/process_execution/ src/rust/engine/fs/store/ src/rust/engine/sharded_lmdb

From my skim, it looks like the vaguely-interesting changes are:

  1. #18153, and associated work like #18747, #18768, #19113
  2. #18458

As a weak bit of evidence, the digests mentioned in the logs have size 14208017, 607533, 4098329, 1132584, 639039, all of which are larger than the 524288 (512KiB) LARGE_FILE_SIZE_LIMIT limit above which the "FSDB" code in 1 kicks in. I'm going to assign @thejcannon based on that, but feel free to assign onwards if this is incorrect.

Full log of changes to those files
git log --oneline release_2.16.0a0..release_2.17.0 -- src/rust/engine/process_execution/ src/rust/engine/fs/store/ src/rust/engine/sharded_lmdb/

Some of these were likely cherry-picked to 2.16.0. I haven't filtered them out.

8da4c8dbf8 Have Pants create the immutable_inputs base parent(s) (Cherry-pick of #19646) (#19652) 1abb1548ee Don't try and garbage collect tempfiles (Cherry-pick of #19103) (#19113) b8f0ed02b4 Replace Snapshot::_unsafe_create with create_for_testing. (#18995) bee75a98c5 Make DigestSubset symlink-aware (#18963) 07dbcdb9c7 Remove existing entry if any when materialising symlink (#18873) 4e39b72623 Update to tokio 1.28.0 (#18874) dcdd1bcc5c Upgrade tempfile and time crates (#18846) 7e6b5bc988 Use a released version of bollard (#18777) 1aca349611 Get rid of spectral rust crate. (#18774) e87c3c6e9a Flush metadata to disk before renaming immutable store files (#18768) 9616025f1f Clear FSDB materialization cache when removeing a file (#18747) cb01cbefc4 Remove implicit conversion from std::io::Error to StoreError (#18696) 317a09df4a Add timeout to remote store calls, and adjust name of cache timeout. (#18695) 1462bef65a Ensure that sandboxed processes exit before their sandboxes are cleaned up (#18632) c398899968 Only materialize immutable files once per process (#18600) 194862a8c5 Remove safe_create_dir(_all) in favor of stdlib implementation. (#18605) cd832f2797 Improve debug information for docker auth, and expose an environment variable. (#18599) e69beffc6e Set the entire env in __run.sh (#18582) c36a988a96 Materialize "large" files in a new store location and hardlink them in sandboxes (#18153) b23df09279 Add support for Docker registry auth (#18541) 5ce027b03b Bump env_logger from 0.9.0 to 0.10.0 in /src/rust/engine (#17688) 81c2fa3187 Re-enable hardlinking using fclonefileat on macOS. (#18501) 3236b4e2eb Disable hardlinking of immutable inputs (#18487) f8cc00b6dc Finish parallelizing materialization of Process inputs (#18469) 15ecb9e139 Expose environment information on process metadata (#18458) d88e258291 Removing more unwraps from FFI layer (#18370)

huonw avatar Sep 01 '23 04:09 huonw

The FSDB changes might be a red herring, FWIW https://pantsbuild.slack.com/archives/C046T6T9U/p1672247391875929 was in Dec 2022 with a file of 35 bytes. And another in December: https://pantsbuild.slack.com/archives/C046T6T9U/p1671161642895819?thread_ts=1666215970.400249&cid=C046T6T9U

The FSDB changes were merged 5 months ago.

This particular thread from October seems relevant: https://pantsbuild.slack.com/archives/C0D7TNJHL/p1665440874737089?thread_ts=1665440874.737089&cid=C0D7TNJHL and that led to https://github.com/pantsbuild/pants/pull/17204

I'll CC @stuhood and it'll be a waiting game to see who has time first 😈

thejcannon avatar Sep 01 '23 11:09 thejcannon

The fact that this fix went into 2.14, and for 2.15 and 2.16 we don't see it again does hint to me that the FSDB likely either unearthed another bug, or somehow re-introduces the previous one

thejcannon avatar Sep 01 '23 11:09 thejcannon

Does PANTS_LOCAL_STORE_DIR point to the default location in your config? Do you have any temp-file cleaners running on that directory, or anything that might be deleting files?

With cache_content_behavior=fetch (please definitely confirm that that is what you are using!), the only way for this to occur should be for a digest which one codepath claimed existed on disk to later (within the same process) not actually exist. The potential relation to the FSDB changes is that they write different sized blobs to different locations, and so if all codepaths aren't in alignment about where blobs go, one might see it on disk, while another looks in the wrong place and does not.

stuhood avatar Sep 01 '23 18:09 stuhood

We do not override PANTS_LOCAL_STORE_DIR. All of this runs in github actions where we preserve the contents of ~/.pex and ~/.cache/pants.

Confirmed that we have fetch set. No other places I see in our repo. I don't see any environment variables configured to override it either. image

engnatha avatar Sep 05 '23 19:09 engnatha

so if all codepaths aren't in alignment about where blobs go

I have two theories:

  1. there might be codepath that's missing a should_use_fsdb check somewhere (I haven't tried reading the code to check this, yet)
  2. there's a codepath that adds additional conditions to the FSDB size-check one, potentially forcing a too-large digest to end up in LMDB "incorrectly": https://github.com/pantsbuild/pants/blob/68d6979c2f2df01615ac9a0ff7780ddba35abc68/src/rust/engine/fs/store/src/lib.rs#L260

2 is most concrete. However, I believe the check only kicks in for loading a directory from the remote, and while it doesn't seem impossible that Directory could end up larger than 512KiB, it seems kinda implausible? (A single directory seems like it would to have tens of thousands of children, to exceed that?)

pub struct Directory {
     pub files: Vec<FileNode>,
     pub directories: Vec<DirectoryNode>,
     pub symlinks: Vec<SymlinkNode>,
     pub node_properties: Option<NodeProperties>, 
}

(That said, even if 2 is unlikely, it definitely seems like a latent bug, and so should be fixed.)

huonw avatar Sep 09 '23 22:09 huonw

However, I believe the check only kicks in for loading a directory from the remote

Hm, I tried constructing a failing example for this, and couldn't: it seems like Pants will only upload ActionResult, and they only store directory Trees (via OutputDirectory). A Tree incorporates all of its children, and is downloaded but never persisted into the local store by Store::load_tree_from_remote, and there's no further downloading of Directorys... I'm a bit surprised by this, though.

(Structs are references to protobufs in protos::gen::build::bazel::remote::execution::v2)

The example I was working with which creates a 1MB+ "directory" using some really long file names, which I ran with bazel-remote running in the background:

[GLOBAL]
pants_version = "2.17.0"
backend_packages = ["pants.backend.shell"]

remote_store_address = "grpc://localhost:9092"
remote_cache_read = true
remote_cache_write = true

pantsd = false
# bump this to force a fresh local cache:
local_store_dir = "/tmp/19748-backtrack-8"
shell_command(
    name="generate",
    command="""
    mkdir -p foo/bar
    for i in $(seq 1 5000); do
       touch foo/bar/long-12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890-$i
    done
    """,
    tools=["mkdir", "touch", "seq"],
    output_directories=["foo/"],
)

shell_command(
    name="invalidator",
    # bump this to force the archive process to rebuild
    command="echo 5 > file.txt",
    output_files=["file.txt"]
)

archive(name="consume", files=[":generate", ":invalidator"], format="zip")

huonw avatar Sep 10 '23 01:09 huonw

(Turns out directories are never stored in FSDB, so they're probably not a cause of the issue here, i.e. theory 2 in https://github.com/pantsbuild/pants/issues/19748#issuecomment-1712651784 is wrong, and my reproduction attempt above was never going to fail. https://github.com/pantsbuild/pants/pull/19815 makes this more obvious.)

huonw avatar Sep 10 '23 02:09 huonw

Pants' CI has seen (at least) two instances of a similar error recently, but both with small digests (a few KB):

  • https://github.com/pantsbuild/pants/actions/runs/6253984026/job/16980581754?pr=19893#step:9:259 (while building a PEX)
  • https://github.com/pantsbuild/pants/actions/runs/6196613022/job/16824266567?pr=19755#step:8:238 (within the Rust dep inference)
Traceback (most recent call last):
  File "/Users/runner/work/pants/pants/src/python/pants/bin/daemon_pants_runner.py", line 134, in single_daemonized_run
    runner = LocalPantsRunner.create(
  File "/Users/runner/work/pants/pants/src/python/pants/bin/local_pants_runner.py", line 93, in create
    build_config = options_initializer.build_config(options_bootstrapper, env)
  File "/Users/runner/work/pants/pants/src/python/pants/init/options_initializer.py", line 153, in build_config
    return _initialize_build_configuration(self._plugin_resolver, options_bootstrapper, env)
  File "/Users/runner/work/pants/pants/src/python/pants/init/options_initializer.py", line 52, in _initialize_build_configuration
    working_set = plugin_resolver.resolve(options_bootstrapper, env, backends_requirements)
  File "/Users/runner/work/pants/pants/src/python/pants/init/plugin_resolver.py", line 141, in resolve
    for resolved_plugin_location in self._resolve_plugins(options_bootstrapper, env, request):
  File "/Users/runner/work/pants/pants/src/python/pants/init/plugin_resolver.py", line 166, in _resolve_plugins
    session.product_request(ResolvedPluginDistributions, [params])[0],
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/scheduler.py", line 579, in product_request
    return self.execute(request)
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/scheduler.py", line 520, in execute
    self._raise_on_error([t for _, t in throws])
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/scheduler.py", line 504, in _raise_on_error
    raise ExecutionError(
pants.engine.internals.scheduler.ExecutionError: 1 Exception encountered:

Engine traceback:
  in select
    ..
  in pants.init.plugin_resolver.resolve_plugins
    ..

Traceback (most recent call last):
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 638, in native_engine_generator_send
    res = rule.send(arg) if err is None else rule.throw(throw or err)
  File "/Users/runner/work/pants/pants/src/python/pants/init/plugin_resolver.py", line 77, in resolve_plugins
    plugins_pex = await Get(
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 130, in __await__
    result = yield self
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 638, in native_engine_generator_send
    res = rule.send(arg) if err is None else rule.throw(throw or err)
  File "/Users/runner/work/pants/pants/src/python/pants/backend/python/util_rules/pex.py", line 1041, in create_venv_pex
    venv_pex_result = await Get(BuildPexResult, PexRequest, seeded_venv_request)
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 130, in __await__
    result = yield self
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 638, in native_engine_generator_send
    res = rule.send(arg) if err is None else rule.throw(throw or err)
  File "/Users/runner/work/pants/pants/src/python/pants/backend/python/util_rules/pex.py", line 720, in build_pex
    result = await Get(
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 130, in __await__
    result = yield self
native_engine.IntrinsicError: Could not identify a process to backtrack to for: Missing digest: Was not present in the local store: Digest { hash: Fingerprint<b772cbdee71ef0c558056bc50c9cd0a076567cdc883aede5aa156d7ad6f180d4>, size_bytes: 1287 }
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 626, in native_engine_generator_send
    res = rule.send(arg) if err is None else rule.throw(throw or err)
  File "/Users/runner/work/pants/pants/src/python/pants/core/goals/test.py", line 874, in run_tests
    results = await MultiGet(
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 361, in MultiGet
    return await _MultiGet(tuple(__arg0))
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 168, in __await__
    result = yield self.gets
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 626, in native_engine_generator_send
    res = rule.send(arg) if err is None else rule.throw(throw or err)
  File "/Users/runner/work/pants/pants/src/python/pants/backend/python/goals/pytest_runner.py", line 505, in run_python_tests
    setup = await Get(
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 118, in __await__
    result = yield self
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 626, in native_engine_generator_send
    res = rule.send(arg) if err is None else rule.throw(throw or err)
  File "/Users/runner/work/pants/pants/src/python/pants/backend/python/goals/pytest_runner.py", line 256, in setup_pytest_for_target
    transitive_targets, plugin_setups = await MultiGet(
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 512, in MultiGet
    return await _MultiGet((__arg0, __arg1))
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 168, in __await__
    result = yield self.gets
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 626, in native_engine_generator_send
    res = rule.send(arg) if err is None else rule.throw(throw or err)
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/graph.py", line 749, in transitive_targets
    dependency_mapping = await Get(_DependencyMapping, _DependencyMappingRequest(request, True))
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 118, in __await__
    result = yield self
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 626, in native_engine_generator_send
    res = rule.send(arg) if err is None else rule.throw(throw or err)
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/graph.py", line 699, in transitive_dependency_mapping
    direct_dependencies = await MultiGet(  # noqa: PNT30: this is inherently sequential
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 361, in MultiGet
    return await _MultiGet(tuple(__arg0))
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 168, in __await__
    result = yield self.gets
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 626, in native_engine_generator_send
    res = rule.send(arg) if err is None else rule.throw(throw or err)
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/graph.py", line 1326, in resolve_dependencies
    inferred = await MultiGet(
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 361, in MultiGet
    return await _MultiGet(tuple(__arg0))
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 168, in __await__
    result = yield self.gets
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 626, in native_engine_generator_send
    res = rule.send(arg) if err is None else rule.throw(throw or err)
  File "/Users/runner/work/pants/pants/src/python/pants/backend/python/dependency_inference/rules.py", line 460, in infer_python_dependencies_via_source
    parsed_dependencies = await _exec_parse_deps(request.field_set, python_setup)
  File "/Users/runner/work/pants/pants/src/python/pants/backend/python/dependency_inference/rules.py", line 365, in _exec_parse_deps
    resp = await Get(
  File "/Users/runner/work/pants/pants/src/python/pants/engine/internals/selectors.py", line 118, in __await__
    result = yield self
native_engine.IntrinsicError: Could not identify a process to backtrack to for: Missing digest: Was not present in the local store: Digest { hash: Fingerprint<610fcb1e611f8f633f3f7dd85df614162b7ed3c767b06b0b3fd2fa698dcd4a9b>, size_bytes: 7902 }

huonw avatar Sep 20 '23 21:09 huonw

Just reporting that we got this again on 2.18.0

engnatha avatar Nov 15 '23 16:11 engnatha

I was seeing this on 2.18.0, but someone at our company realized that we were seeing a lot of "quarantines" by SentinelOne, which our company pre-installs on macs (not linux). I suspect it was seeing pants forking a lot of subprocesses, and thought it was a virus? Not 100% sure why it picked up on pants as a virus. It was doing this quarantining quietly which made it annoying to debug.

We "fixed" it by adding pants + related files to the exclusion list in SentinelOne about a week ago. I've haven't seen this since the "fix" went in, whereas before it'd happen pretty frequently on pants lint ::

trhodeos avatar Jan 11 '24 23:01 trhodeos

Interesting. We also use sentinelone internally.

engnatha avatar Jan 11 '24 23:01 engnatha

Aha! @engnatha can you report back here if this solves things?

benjyw avatar Jan 12 '24 00:01 benjyw

Hi @engnatha , wondering if you have an update on this? I'm seeing something similar locally now and definitely not related to sentinelone...

benjyw avatar Jun 04 '24 15:06 benjyw

We actually haven't seen this in a while. I don't believe our security team ever got around to enabling allow list exceptions for pants-managed files.

engnatha avatar Jun 06 '24 15:06 engnatha