
Add an intrinsic for capturing Snapshots from absolute paths

Open: stuhood opened this issue 3 years ago • 3 comments

There are a few use cases for "absolute" file watching+capturing: #10769, #10360, #10837, etc. A likely end-user API would be to add PathGlobsAndRoot->(Paths|Digest) intrinsics.
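To make the proposed `PathGlobsAndRoot->(Paths|Digest)` shape a bit more concrete, here is a minimal Python sketch of only the Paths half. It assumes a request type that pairs an absolute root directory with glob patterns. `PathGlobsAndRoot` as written here and `expand_paths` are hypothetical stand-ins, not Pants' actual types or intrinsics. File watching and Digest computation are omitted entirely.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PathGlobsAndRoot:
    """Hypothetical request type (not Pants' real API): glob patterns that are
    anchored at an absolute root directory instead of the buildroot."""

    root: str
    globs: tuple[str, ...]

    def __post_init__(self) -> None:
        # The whole point of the request is an explicit absolute root.
        if not Path(self.root).is_absolute():
            raise ValueError(f"root must be absolute, got {self.root!r}")


def expand_paths(request: PathGlobsAndRoot) -> list[str]:
    """Hypothetical analogue of the Paths half: return matching files as
    sorted paths relative to request.root (directories are skipped)."""
    root = Path(request.root)
    matches: set[str] = set()
    for pattern in request.globs:
        for match in root.glob(pattern):
            if match.is_file():
                matches.add(match.relative_to(root).as_posix())
    return sorted(matches)
```

In this sketch, returning root-relative paths keeps the result shape the same as a buildroot-relative expansion, with the root carried alongside in the request.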


Currently Pants' file watching+capturing intrinsics (PathGlobs->(Paths|Digest)) operate relative to the buildroot, for two reasons:

  1. the vast majority of captured files will/should be located inside the buildroot, and that should be the happy path
  2. we used to use Watchman for file watching, and it did not (easily) support watching files at a more fine-grained level than the buildroot

The second point is now somewhat historical: since we switched to the notify crate, we can watch additional locations more easily. But OSX still places bounds on how many locations you might reasonably watch (see), and so it's possible that:

  1. the API should be constrained to ensure that we don't end up watching too many directories
  2. we should add polling, but only for paths outside the buildroot on OSX
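Option 2 amounts to a per-path policy decision. A minimal sketch, assuming a hypothetical `choose_watch_strategy` helper and a two-valued `WatchStrategy` enum (neither exists in Pants), might look like this:

```python
import sys
from enum import Enum
from pathlib import PurePath


class WatchStrategy(Enum):
    """Hypothetical: how a captured path would be kept up to date."""

    NOTIFY = "notify"  # native file-system event notifications
    POLL = "poll"      # periodic re-stat of the path


def choose_watch_strategy(path: str, buildroot: str, platform: str = sys.platform) -> WatchStrategy:
    """Hypothetical policy for option 2: native notifications for everything
    inside the buildroot, and polling only for paths outside the buildroot
    on macOS ("darwin"), where native watches are the scarcer resource."""
    # Component-wise check, so "/repository" is not treated as inside "/repo".
    inside_buildroot = PurePath(path).is_relative_to(buildroot)
    if platform == "darwin" and not inside_buildroot:
        return WatchStrategy.POLL
    return WatchStrategy.NOTIFY
```

Keeping buildroot paths on native notifications preserves the happy path from point 1 above, while polling keeps external locations from consuming native watches.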

The paths that would be explored via #10769 in particular will generally involve chains of symlinks: PathGlobs expansion is aware of those symlinks (and additionally tries to traverse their parents...), and the result would be watches installed in various places throughout /etc, /usr, and ... etc. It's possible that the total number of watches would be small enough that this would be a non-issue though.
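To see why the watch count can grow, consider what following a single symlink chain touches. The sketch below uses a hypothetical `dirs_to_watch` helper that collects the directory of every link in the chain plus the directory of the final target. It deliberately omits the parent-directory traversal that PathGlobs expansion also performs, so it undercounts relative to the behavior described above.

```python
import os
from pathlib import Path


def dirs_to_watch(path: str, max_hops: int = 40) -> set[str]:
    """Hypothetical helper (not part of Pants): follow the symlink chain that
    starts at `path` and return the directory containing each link, plus the
    directory containing the final target.

    Simplifications: only the last path component is checked for being a
    link, and relative targets are normalized lexically.
    """
    watched: set[str] = set()
    current = Path(path)
    for _ in range(max_hops):
        # Re-pointing a link (or replacing the target) shows up as an event
        # in its containing directory, so that directory needs a watch.
        watched.add(str(current.parent))
        if not current.is_symlink():
            return watched
        target = os.readlink(current)
        # Relative targets resolve against the link's directory; an absolute
        # target replaces the path entirely (os.path.join semantics).
        current = Path(os.path.normpath(os.path.join(current.parent, target)))
    raise RuntimeError(f"too many levels of symbolic links: {path}")
```

Even in this simplified form, a chain such as `/usr/bin/python -> /etc/alternatives/python -> /usr/bin/python3.x` already spans two directories, and each additional hop can add another.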

stuhood, Sep 23 '20 18:09

#10870 describes a method for making mutable caches remote-friendly. In that issue, it currently describes placing a digest_hint file to avoid having to re-snapshot the entire cache each time. However, in #10864, I realized there are use cases (like the MyPy non-append-only cache) which need to re-snapshot the cache dir anyway.

I think that using polling (or the notify crate) for directories outside the buildroot as you've described here is likely a better approach than placing digest_hint files everywhere, and could solve the same problem of making mutable caches remote-friendly. I am aware we already have a method to solve that with platform properties, but it requires support from the remexec backend which doesn't work yet.

In summary, I think that you've described a capability which could be extended to make the existing mutable cache feature remote-friendly without upstream support, and I think it is more generally useful than #10870.

I would also possibly add #10864 to the list of issues this could fix then -- while parenting the MyPy daemon would solve the local execution case, we could also keep track of the MyPy cache directory to make it remote-friendly.

cosmicexplorer, Sep 27 '20 16:09

Both our existing pyenv scraping and the upcoming ASDF integration at #12028 should be using this API, but aren't.

stuhood, Jul 13 '21 16:07

This is likely related to #16800, in service of #13682.

In the context of the work on #13682, the intrinsic described on this ticket would be environment-specific: i.e., when in a __local__ environment, it would execute directly against the filesystem. But when in a docker or remote environment, it would execute inside the image (using whichever implementation was most efficient for that case).

stuhood, Sep 08 '22 20:09