
gcov cannot write coverage data from child processes

Open • kr opened this issue 5 years ago • 4 comments

At the end of a successful test, the test process itself ends normally, but ct kills any remaining child of the test process with SIGKILL. This is a problem if that child is executing code under test, because it means gcov has no chance to write coverage data for that process.

In #12 there's some discussion of how gcov writes coverage data.

kr, Jul 08 '19 21:07

@kr, we can close this, since it was fixed in beanstalkd.

As a slightly related improvement, can I suggest the following patch? https://github.com/kr/ct/pull/19

ysmolski, Jul 10 '19 07:07

This still seems like a gotcha for other projects using ct, so I'd like to keep it open.

kr, Jul 11 '19 01:07

I think the ideal behavior here from ct would be:

  • if the test process finishes the test successfully, it kills all descendants with SIGTERM
  • it waits for them to complete
  • if they don't all complete within some time limit, ct then kills the whole process group with SIGKILL (as it already does)

Some things that make this difficult:

  • We can't do most of this in the test driver process, since we're concerned with children of the test process. The test process itself must be the one to wait.
  • While ct does have the chance to execute logic in the test process after the test itself has returned successfully, it doesn't know what the children are or even how many there are.
  • After sending SIGTERM, the test process isn't doing anything but waiting for its children to exit. Arguably ct could begin another test during this time, but it would take some doing to make that happen, since the driver process currently uses termination of the test process to indicate that the test is done and therefore it's okay to start another.

kr, Jul 11 '19 01:07

My rough plan is to do this mostly in start(), at the end of the child branch. Here's what that code does currently:

```c
...              // setup test process state
t->f();          // run the test
if (fail) {
    ctfailnow(); // send SIGABRT to self, indicating failure to driver
}
exit(0);         // indicate success to driver process
```

I'd like to add some logic immediately before that last line.

  1. Set a timer to call exit(0) after a short timeout. (Maybe 500ms?) This will ensure that the process exits successfully eventually, no matter how long its children take.
  2. Ignore SIGTERM in the current process (the test process). This is okay because this process has nothing left to do but signal its children and wait for them, and we've already ensured it'll exit soon.
  3. Signal the whole current process group with SIGTERM. This will signal all descendants without having to enumerate them. It'll also signal the current process but we just ignored SIGTERM so that's okay.
  4. Call wait in a loop until there are no children.
  5. Finally, exit(0) as before.

If the children exit promptly, these steps will complete in order. If any child takes too long, the timer will fire and the test process will just exit. Either way, the driver process will kill the process group with SIGKILL as it already does, cleaning up any stragglers. (Note that we need this final SIGKILL even if all the immediate children exited promptly, because there could be deeper descendants still running in the process group.)

kr, Jul 11 '19 02:07