Breaking in 0.12: tracees that call wait(2) hang
In our build tracing tool (https://github.com/travitch/build-bom), upgrading to pete 0.12 causes the traced build to hang, whereas 0.11 runs as expected. This is likely due to the changes in #102 that went into 0.12.
The build-bom tool runs a build process and monitors it via pete. When the build process is make, make runs sub-processes and then calls wait(2) to wait for each one to complete before performing the next action. Since upgrading to pete 0.12, this wait call never seems to return. The problem can be demonstrated with the test_blddir test for build_bom (the last test in its test_bom.rs file). Running that test with make's -d debug flag shows that under pete 0.12 the run stops at the first attempted make action:
...
Finished prerequisites of target file 'blddir/obj'.
Must remake target 'blddir/obj'.
make: Entering directory '/tmp/nix-shell.Nztet5/.tmpTcLAfb/blddir_test'
Makefile:16: update target 'blddir/obj' due to: target does not exist
mkdir -p blddir/obj
Putting child 0x474d00 (blddir/obj) PID 1852635 on the chain.
Live child 0x474d00 (blddir/obj) PID 1852635
There is no further output, and the test hangs at this point. By comparison, with pete 0.11 make continues past that point:
...
Finished prerequisites of target file 'blddir/obj'.
Must remake target 'blddir/obj'.
make: Entering directory '/tmp/nix-shell.Nztet5/.tmp3Nd8RL/blddir_test'
Makefile:16: update target 'blddir/obj' due to: target does not exist
mkdir -p blddir/obj
Putting child 0x4742a0 (blddir/obj) PID 1830875 on the chain.
Live child 0x4742a0 (blddir/obj) PID 1830875
Reaping winning child 0x4742a0 PID 1830875
Removing child 0x4742a0 PID 1830875 from chain.
Successfully remade target file 'blddir/obj'.
Considering target file 'headers/target.h'.
... [more output...]
Running the pete 0.12 version under strace shows make spawning the child and then blocking in wait4 on it:
...
mmap(NULL, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7ffff71d9000
rt_sigprocmask(SIG_BLOCK, ~[], [], 8) = 0
clone3({flags=CLONE_VM|CLONE_VFORK, exit_signal=SIGCHLD, stack=0x7ffff71d9000, stack_size=0x9000}, 88) = 1855335
munmap(0x7ffff71d9000, 36864) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
wait4(1855335,
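A tracee that follows the same pattern as make (create a child, then block in wait(2) until it exits) might be enough to reproduce the hang without make or build-bom. The sketch below is a hypothetical reproduction using only the Rust standard library; `spawn_and_wait` is an invented helper, and it has not been confirmed to trigger the hang under pete 0.12. Depending on the platform and libc, std may create the child via posix_spawn (which glibc implements with a CLONE_VM|CLONE_VFORK clone, as in the strace above) or via fork/exec, and `Child::wait` blocks in waitpid(2), which strace typically reports as wait4.

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// Hypothetical minimal tracee workload: spawn a child and block until it
/// exits, the same pattern make follows for each recipe line.
fn spawn_and_wait(program: &str, args: &[&str]) -> io::Result<ExitStatus> {
    // Create the child process (std may use posix_spawn or fork/exec).
    let mut child = Command::new(program).args(args).spawn()?;
    // Block until the child exits; on Linux this is a waitpid(2) call,
    // which strace typically shows as wait4.
    child.wait()
}

fn main() -> io::Result<()> {
    // Run a short-lived child, analogous to the `mkdir -p` recipe in the log.
    let status = spawn_and_wait("sh", &["-c", "exit 0"])?;
    println!("child exited with {status}");
    Ok(())
}
```

If running this binary as the tracee hangs under pete 0.12 but completes under 0.11, that would suggest the problem is independent of make itself.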
Conjecture as to cause:
We suspect that the subprocess run by make is not in the tracee list at https://github.com/ranweiler/pete/pull/102/files#diff-862cffa434b0d152dd6cd08f8eb0e84105690698957786052056808fd65b6667R439, so the loop at https://github.com/ranweiler/pete/pull/102/files#diff-862cffa434b0d152dd6cd08f8eb0e84105690698957786052056808fd65b6667R474 never exits. By contrast, the broader waitpid call in the previous version would have accepted the PID of make's subprocess and continued to the code that lets the make process proceed.
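The conjectured mechanism can be illustrated with a toy model. This is a hypothetical sketch: the `Model` type, its fields, and the `wait_known`/`wait_any` methods are invented for illustration and are not pete's API or the code changed in #102. Pending ptrace stops are modeled as a queue. A wait that only accepts stops from already-registered tracees finds nothing when the only pending stop belongs to make's unregistered child, while a `waitpid(-1, ...)`-style wait collects that stop and starts tracking the new PID.

```rust
use std::collections::{BTreeSet, VecDeque};

type Pid = u32;

/// Hypothetical toy model of a tracer's view (not pete's real types):
/// stop notifications waiting to be collected, plus the registered tracees.
struct Model {
    /// Processes that are stopped and waiting for the tracer to collect them.
    pending: VecDeque<Pid>,
    /// PIDs the tracer has registered as tracees.
    tracees: BTreeSet<Pid>,
}

impl Model {
    /// Narrow wait: only collect a stop from an already-registered tracee.
    /// Returns None if no registered tracee has a pending stop; a real
    /// blocking loop built this way would keep waiting forever.
    fn wait_known(&mut self) -> Option<Pid> {
        let idx = self.pending.iter().position(|p| self.tracees.contains(p))?;
        self.pending.remove(idx)
    }

    /// Broad wait, in the spirit of waitpid(-1, ...): collect a stop from
    /// any child and register its PID if it was previously unknown.
    fn wait_any(&mut self) -> Option<Pid> {
        let pid = self.pending.pop_front()?;
        self.tracees.insert(pid);
        Some(pid)
    }
}

fn main() {
    // make (PID 100) is registered but blocked in wait4, so it has no pending stop.
    // Its new child (PID 101) is stopped but was never registered.
    let mut model = Model {
        pending: VecDeque::from(vec![101]),
        tracees: BTreeSet::from([100]),
    };

    // The narrow wait finds nothing: the child is never resumed,
    // so make never returns from wait4.
    assert_eq!(model.wait_known(), None);

    // The broad wait picks up the child's stop and starts tracking it.
    assert_eq!(model.wait_any(), Some(101));
    println!("broad wait collected the child; tracees = {:?}", model.tracees);
}
```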
Sorry for the delay; I don't seem to be getting emails about new issues on this repo. I'll dig into this over the weekend.