pants
pants copied to clipboard
'./pants fmt ::' with black leaves orphaned multiprocessing processes when retried due to filesystem changes
Describe the bug
We use pants on a moderate Python mono-repo. It's large enough that ./pants fmt ::
takes a bit of time to run black. If we touch a file during that time (even without making changes, just updating the mtime), black retries (yay), but seems to leave a handful of multiprocessing background workers around forever, orphaned and never exiting. These processes build up and can exhaust machine resource limits.
(This is exacerbated by #16727, since that issue means that a repo may have black running in the background constantly, and thus makes it incredibly likely that some of those invocations will be interrupted.)
Reproducer: https://gist.github.com/huonw/4718b6d146634a66fe9c5e906d76e501
git clone [email protected]:4718b6d146634a66fe9c5e906d76e501.git
cd 4718b6d146634a66fe9c5e906d76e501
./run-it.sh
That script creates a whole lot of files so that black runs for a while, and then touches a file while the ./pants fmt ::
invocation runs. It runs pgrep -fl pants
to show the pants processes running before/after. I ran pkill -fl pants
first, and the output was something like:
generating a bunch of files so that black runs for a while
18:35:21.74 [INFO] Initializing scheduler...
18:35:21.85 [INFO] Scheduler initialized.
18:35:25.08 [WARN] Completed: Format with Black - black made changes.
files/f001.py
+ black made changes.
all clean, just pantsd
16572 pantsd [/private/tmp/4718b6d146634a66fe9c5e906d76e501] PANTS_BIN_NAME=./pants PANTS_DAEMON_ENTRYPOINT=pants.pantsd.pants_daemon:launch_new_pantsd_instance PYTHONPATH=/Users/huon/.cache/pants/setup/bootstrap-Darwin-arm64/pants.WxzanY/install/bin:/Users/huon/.pyenv/versions/3.9.10/lib/python39.zip:/Users/huon/.pyenv/versions/3.9.10/lib/python3.9:/Users/huon/.pyenv/versions/3.9.10/lib/python3.9/lib-dynload:/Users/huon/.cache/pants/setup/bootstrap-Darwin-arm64/2.14.0rc0_py39/lib/python3.9/site-packages ptr_munge= main_stack=
18:35:28.69 [WARN] Completed: Format with Black - black made changes.
files/f482.py
files/f483.py
...
files/f479.py
files/f480.py
files/f481.py
+ black made changes.
processes hanging around
16572 pantsd [/private/tmp/4718b6d146634a66fe9c5e906d76e501] PANTS_BIN_NAME=./pants PANTS_DAEMON_ENTRYPOINT=pants.pantsd.pants_daemon:launch_new_pantsd_instance PYTHONPATH=/Users/huon/.cache/pants/setup/bootstrap-Darwin-arm64/pants.WxzanY/install/bin:/Users/huon/.pyenv/versions/3.9.10/lib/python39.zip:/Users/huon/.pyenv/versions/3.9.10/lib/python3.9:/Users/huon/.pyenv/versions/3.9.10/lib/python3.9/lib-dynload:/Users/huon/.cache/pants/setup/bootstrap-Darwin-arm64/2.14.0rc0_py39/lib/python3.9/site-packages ptr_munge= main_stack=
17420 /Users/huon/.cache/pants/named_caches/pex_root/venvs/s/ee77ea11/venv/bin/python3.9 -E -s -c from multiprocessing.resource_tracker import main;main(14)
17423 /Users/huon/.cache/pants/named_caches/pex_root/venvs/s/ee77ea11/venv/bin/python3.9 -E -s -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=18, pipe_handle=38) --multiprocessing-fork
17424 /Users/huon/.cache/pants/named_caches/pex_root/venvs/s/ee77ea11/venv/bin/python3.9 -E -s -c from multiprocessing.resource_tracker import main;main(14)
17425 /Users/huon/.cache/pants/named_caches/pex_root/venvs/s/ee77ea11/venv/bin/python3.9 -E -s -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=18, pipe_handle=43) --multiprocessing-fork
17426 /Users/huon/.cache/pants/named_caches/pex_root/venvs/s/ee77ea11/venv/bin/python3.9 -E -s -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=18, pipe_handle=48) --multiprocessing-fork
17427 /Users/huon/.cache/pants/named_caches/pex_root/venvs/s/ee77ea11/venv/bin/python3.9 -E -s -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=18, pipe_handle=53) --multiprocessing-fork
17428 /Users/huon/.cache/pants/named_caches/pex_root/venvs/s/ee77ea11/venv/bin/python3.9 -E -s -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=18, pipe_handle=38) --multiprocessing-fork
17429 /Users/huon/.cache/pants/named_caches/pex_root/venvs/s/ee77ea11/venv/bin/python3.9 -E -s -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=18, pipe_handle=43) --multiprocessing-fork
give them a bit of time to finish...
16572 pantsd [/private/tmp/4718b6d146634a66fe9c5e906d76e501] PANTS_BIN_NAME=./pants PANTS_DAEMON_ENTRYPOINT=pants.pantsd.pants_daemon:launch_new_pantsd_instance PYTHONPATH=/Users/huon/.cache/pants/setup/bootstrap-Darwin-arm64/pants.WxzanY/install/bin:/Users/huon/.pyenv/versions/3.9.10/lib/python39.zip:/Users/huon/.pyenv/versions/3.9.10/lib/python3.9:/Users/huon/.pyenv/versions/3.9.10/lib/python3.9/lib-dynload:/Users/huon/.cache/pants/setup/bootstrap-Darwin-arm64/2.14.0rc0_py39/lib/python3.9/site-packages ptr_munge= main_stack=
17420 /Users/huon/.cache/pants/named_caches/pex_root/venvs/s/ee77ea11/venv/bin/python3.9 -E -s -c from multiprocessing.resource_tracker import main;main(14)
17423 /Users/huon/.cache/pants/named_caches/pex_root/venvs/s/ee77ea11/venv/bin/python3.9 -E -s -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=18, pipe_handle=38) --multiprocessing-fork
17424 /Users/huon/.cache/pants/named_caches/pex_root/venvs/s/ee77ea11/venv/bin/python3.9 -E -s -c from multiprocessing.resource_tracker import main;main(14)
17425 /Users/huon/.cache/pants/named_caches/pex_root/venvs/s/ee77ea11/venv/bin/python3.9 -E -s -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=18, pipe_handle=43) --multiprocessing-fork
17426 /Users/huon/.cache/pants/named_caches/pex_root/venvs/s/ee77ea11/venv/bin/python3.9 -E -s -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=18, pipe_handle=48) --multiprocessing-fork
17427 /Users/huon/.cache/pants/named_caches/pex_root/venvs/s/ee77ea11/venv/bin/python3.9 -E -s -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=18, pipe_handle=53) --multiprocessing-fork
17428 /Users/huon/.cache/pants/named_caches/pex_root/venvs/s/ee77ea11/venv/bin/python3.9 -E -s -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=18, pipe_handle=38) --multiprocessing-fork
17429 /Users/huon/.cache/pants/named_caches/pex_root/venvs/s/ee77ea11/venv/bin/python3.9 -E -s -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=18, pipe_handle=43) --multiprocessing-fork
Note how the 3 separate lists of processes (starting with 16572 pantsd ...
) change:
- before running anything: it's just pantsd
- directly after the
./pants fmt ::
with retry: there's a bunch of extra processes associated with a named cache - 5 seconds later: they're still there
If the touch files/f000.py
is commented out, all of the process lists are the same as the first (just pantsd).
Pants version 2.14.0rc0
OS macOS
Additional info
Thanks for pants!
Mm... sorry for the trouble.
This is because our local process sandbox doesn't use graceful shutdown for processes, and instead kills them immediately with SIGKILL.
We do actually have an implementation of graceful shutdown: https://github.com/pantsbuild/pants/blob/db07a12dd55dd5d9bda5518ac91961720582f7c6/src/rust/engine/process_execution/src/children.rs#L12-L22 ... but it is only used for interactive processes currently.
So it sounds like we should use that code with local process sandbox cleanup, too?
So it sounds like we should use that code with local process sandbox cleanup, too?
Yea.
I've got another use case for graceful shutdown of test
targets in particular. I stand up database fixtures using docker containers during test runs (using pytest
). Under normal circumstances, these are cleaned up when the tests exit, but when the tests restart due to file system changes, it can leave lingering docker containers running.
Pytest naturally handles SIGINT
and runs fixture cleanup by default, so sending SIGINT should resolve my issue with out any additional code changes.
An additional useful feature request would allow adjustable time between the SIGINT and SIGKILL signals. Docker containers can take a while to tear down, so configuring the timeout in the BUILD files may be necessary to make this really work for my use case.