covr Don't fail if rds files are corrupted

This is a somewhat unsatisfactory PR as I can't replicate this reliably.

As seen in https://github.com/r-lib/covr/issues/451 and https://github.com/r-lib/covr/issues/315 if the R process is killed when writing the Rds file it will create a corrupted rds file that will break the covr::package_coverage(). I've seen this rarely on gha builds (e.g. https://github.com/mrc-ide/rrq/pull/40/checks?check_run_id=2363889569) when using callr::r_bg fairly extensively, but have been unable to trigger in a MWE.

The downside is that the coverage report may be somewhat partial (and a message could be added during reading to indicate that) but continuing despite failure might be more graceful for relatively little additional complexity.

Apr 16 '21 17:04 richfitz

All committers have signed the CLA.

Apr 16 '21 17:04 CLAassistant

I am pretty sure this is due to processx using SIGKILL in a number of places and the finalizer that is writing the coverage is interrupted.

@gaborcsardi would know better than I if there is a way to have the improved behavior for this use case, the current behavior is unfortunate.

Apr 16 '21 17:04 jimhester

Sometimes you have to use sigkill, because R in the subprocess is not interruptible.

Apr 16 '21 17:04 gaborcsardi

Sure, but if the process is writing the coverage out in an exit finalizer the process is already in the process of exiting, so there must be some case where the SIGKILL is too eager.

Apr 16 '21 18:04 jimhester

@richfitz You can add some $signal() calls to your event loop layer if you want a graceful shutdown. processx is a bit too low level to implement this, without a standard event loop.

Or, if you don't mind sequential shutdown, then you can probably add a finalizer to your processes, that sends a SIGTERM first and then after a timeout a SIGKILL.

On Windows there is no SIGTERM, but you can $interrupt() first.

Apr 16 '21 19:04 gaborcsardi

@richfitz You can add some $signal() calls to your event loop layer if you want a graceful shutdown. processx is a bit too low level to implement this, without a standard event loop.

I don't think that this is the high level parts, unfortunately - I think that this is probably where a task has been spawned and is later cleaned up? I can't easily replicate unfortunately, and after replacing all the $kill calls I had (which I see uses SIGKILL after SIGTERM + grace period) I am still seeing intermittent failures - perhaps 1/3 to 1/4 of the time, so enough to be hard to work with but not enough to know if random fixes are helping!

I've not seen anything running covr::package_coverage() locally so I expect this is an issue with the speed at which the process is closed in the context of fairly slow GHA runners?

I do think that the current PR as it stands may be of use, but as always up to you for if you agree :)

Apr 19 '21 09:04 richfitz