Use sources directly from the build root in `./pants run` and `./pants repl`
As explained in https://github.com/pantsbuild/pants/issues/11109, we run ./pants run and ./pants repl in the build root. However, we use a tmpdir so that files like requirements.pex and codegen files don't get added to the build root.
But running in a tmpdir has some issues:
- It's awkward to access files using filesystem APIs like
with open(), especially for generated files. https://github.com/pantsbuild/pants/issues/11109 - Some scripts assume they are run from the location of the original file, not a chroot.
- For example, Django writes files as siblings to the executed
manage.py, meaning it writes files to the tmpdir, whereas it needs to write them to the normal build root.
- For example, Django writes files as siblings to the executed
This works with this diff:
diff --git a/src/python/pants/core/goals/run.py b/src/python/pants/core/goals/run.py
index 4d9a82313..7e229d987 100644
--- a/src/python/pants/core/goals/run.py
+++ b/src/python/pants/core/goals/run.py
@@ -3,7 +3,6 @@
from abc import ABCMeta
from dataclasses import dataclass
-from pathlib import PurePath
from typing import Iterable, Mapping, Optional, Tuple
from pants.base.build_root import BuildRoot
@@ -21,8 +20,6 @@ from pants.engine.target import (
)
from pants.engine.unions import union
from pants.option.custom_types import shell_str
-from pants.option.global_options import GlobalOptions
-from pants.util.contextutil import temporary_dir
from pants.util.frozendict import FrozenDict
from pants.util.meta import frozen_after_init
@@ -86,7 +83,6 @@ class Run(Goal):
@goal_rule
async def run(
run_subsystem: RunSubsystem,
- global_options: GlobalOptions,
console: Console,
interactive_runner: InteractiveRunner,
workspace: Workspace,
@@ -105,25 +101,23 @@ async def run(
field_set = targets_to_valid_field_sets.field_sets[0]
request = await Get(RunRequest, RunFieldSet, field_set)
- with temporary_dir(root_dir=global_options.options.pants_workdir, cleanup=True) as tmpdir:
- workspace.write_digest(
- request.digest, path_prefix=PurePath(tmpdir).relative_to(build_root.path).as_posix()
- )
-
- args = (arg.format(chroot=tmpdir) for arg in request.args)
- env = {**complete_env, **{k: v.format(chroot=tmpdir) for k, v in request.extra_env.items()}}
- try:
- result = interactive_runner.run(
- InteractiveProcess(
- argv=(*args, *run_subsystem.args),
- env=env,
- run_in_workspace=True,
- )
+ # TODO: somehow track which files were changed from the write so that we can then revert the
+ # changes after `./pants run` is done.
+ workspace.write_digest(request.digest)
+ args = (arg.format(chroot=build_root.path) for arg in request.args)
+ env = {**complete_env, **{k: v.format(chroot=build_root.path)for k, v in request.extra_env.items()}}
+ try:
+ result = interactive_runner.run(
+ InteractiveProcess(
+ argv=(*args, *run_subsystem.args),
+ env=env,
+ run_in_workspace=True,
)
- exit_code = result.exit_code
- except Exception as e:
- console.print_stderr(f"Exception when attempting to run {field_set.address}: {e!r}")
- exit_code = -1
+ )
+ exit_code = result.exit_code
+ except Exception as e:
+ console.print_stderr(f"Exception when attempting to run {field_set.address}: {e!r}")
+ exit_code = -1
return Run(exit_code)
But the main complication is that we need to clean up the temporary files like requirements.pex after the runs are complete. To do this, we need a way to track which files were changed when calling Workspace.write_digest(). Blocked by https://github.com/pantsbuild/pants/issues/12130.
Just pointing out that only item 2 above makes this hard. Without that, you could just set CWD to the build root.
So, to clarify - run and repl both run with the CWD set to the buildroot, as can be easily verified.
However it is true that the PYTHONPATH is modified to point to a copy of the code in a tmpdir, rather than executing directly from the original source.
So only 2 above is a problem. And you can bypass that problem if you control the relevant scripts (Django remains an issue).
I suppose we should ask why we do things the way we do them today. Otherwise it seems like a no-brainer to stop using the tempdir for user code.
But the main complication is that we need to clean up the temporary files like requirements.pex after the runs are complete.
Can we not put temporary files in the tempdir but still run user code from build_root + (PYTHON)PATH manipulation?
Can we not put temporary files in the tempdir but still run user code from build_root + (PYTHON)PATH manipulation?
True...maybe we could? I don't think that will help with codegen files, which we still need to generate into the tmpdir, but at least it would help w/ normal code.
I think we could, we'd want to control where __pycache__ gets written perhaps. Or maybe not.
If we make this change, one difference would be that your entire repo would be available on your PYTHONPATH, rather than just your dependencies. So you might get different behavior than when running a packaged version of your code. So we probably want to support both possibilities, optionally.
For completeness, there's a way around this for first-party code:
- It's easy if you don't have source root. I have a utility code which knows how deep it is in the repo. From that you can calculate the tempdir prefix for the sandbox. Then to get back to the path for a module, strip the tempdir prefix and append the remainder to CWD.
- If you do have a src root, do the same thing, but also add the source root when reconstructing the path.
So you're hardcoding up to 2 "facts", which might be wrong. But the hacks are isolated to one file and everyone else can live in blissful ignorance.
Yeah, the issue though is with scripts you don't control, like Django's makemigrations.
It would be nice if we could use tools like overlayfs here - I'm looking at you MacOS :/.
Related issue: #11109.
Why does this run in a tmp dir under the build root at all? This is confusing because it's hard to leak the tmpdir for inspection. ``--no-process-cleanup` doesn't seem to work for this special tempdir.
Why does this run in a tmp dir under the build root at all?
See the issue I linked above for one reason: codegen. #11109.
Why does this run in a tmp dir under the build root at all?
See the issue I linked above for one reason: codegen. #11109.
I meant as opposed to an ephemeral tempdir like normal rule processes run in. the CWD can still be set to the build root
Why does this run in a tmp dir under the build root at all?
See the issue I linked above for one reason: codegen. #11109.
I meant as opposed to an ephemeral tempdir like normal rule processes run in. the CWD can still be set to the build root
It probably should... InteractiveProcess and Process don't share as much code as they could. That relates to #13852 and #14386 (which @kaos pushed on in #15340). If both were able to use prepare_workdir under the hood, a few fixes would fall out.
Yeah, that's an unfortunate anomaly. And --no-process-cleanup should work, it's a bug if not. Looking now.
Adding this ticket and https://github.com/pantsbuild/pants/issues/11109 to onboarding - it's a thorn when trying to get Pants to run scripts you already had fwict.
I'll add too that placing the tempdir under the build root leads to unexpected side effects, like having tools find repo root configs when traversing the path upwards.
This is a huge pain point for the integration tests in this repo, the repo configs are being loaded for integration tests which pretend to be sandboxed :angry:
To be clear, the tmpdir for general sandboxed processes is not under the build root. But we have a separate tmpdir for the run and repl sandboxes, and that is deliberate, although I forget exactly why.
This turned out to be easy to implement, so PR coming soon.
https://github.com/pantsbuild/pants/pull/15689 implements for run, not for repl which can be dealt with later.
This issue has been open for over one year without activity and is not labeled as a bug. It has been labeled as stale to invite any further updates. If you can confirm whether the issue is still applicable to the latest version of Pants, appears in other contexts, or its priority has changed, please let us know. Please feel free to close this issue if it is no longer relevant.