
Current approach to restarting data collection?

Open • ShaheedHaque opened this issue 1 year ago • 1 comment

Have you asked elsewhere?

The thread in https://stackoverflow.com/a/40518553/6332554 describes a way to "restart" data collection.

Describe your situation

Celery workers do not honour atexit() handlers, see https://github.com/celery/celery/discussions/8923. I'm trying a couple of different approaches to dealing with this, one being to use the Coverage API to dump the collected data as the worker runs. Since the worker runs my tasks, I can "easily" add the code to the end of my task function.

I therefore need incremental collection. I don't see an obvious solution using the current public API, but I do see the auto_load option on the constructor. Can it or the load() method be used to restart collection?

If not, is the method in the Stack Overflow exchange still current?
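Because the dump has to happen at the end of each task rather than at process exit, one way to avoid repeating that code in every task function is a decorator. This is a hypothetical sketch: `checkpoint_after` and the `checkpoint` callable are illustrative names, and `checkpoint` stands for whatever stop/save/restart logic is used. Calling it from a `finally` block means the dump also happens when the task raises.

```python
import functools
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def checkpoint_after(checkpoint: Callable[[], None]) -> Callable[[F], F]:
    """Hypothetical decorator: run `checkpoint()` after every call of the task."""

    def decorator(task: F) -> F:
        @functools.wraps(task)  # keep the task's __name__/__module__
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return task(*args, **kwargs)
            finally:
                # Runs on success and on exception, so coverage data is
                # dumped even though atexit() handlers never fire.
                checkpoint()

        return wrapper  # type: ignore[return-value]

    return decorator
```

With Celery, the wrapper would sit beneath the task decorator (`@app.task` above `@checkpoint_after(my_checkpoint)`, where `my_checkpoint` is hypothetical) so that the registered task is the wrapped function. `functools.wraps` preserves `__name__` and `__module__`, which Celery uses when deriving the default task name.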

ShaheedHaque, Mar 27 '24 08:03

I was able to close #1454 by implementing ongoing dumping and restarting of data collection. My current approach, instead of relying on the internal APIs that the Stack Overflow answer uses, is something like this:

    ctx.coverage.stop()
    ctx.coverage.save()
    data_file = <some PID-based name with a monotonically increasing count as a suffix>
    ctx.coverage = _coverage.Coverage(data_file=data_file, data_suffix='cov', cover_pylib=False, config_file=whatever)
    ctx.coverage.start()
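A runnable version of the rotation pattern in the snippet, assuming the placeholder file name takes the form `<prefix>.<pid>.<n>`. The prefix format, the `RotatingCoverage` class and its `factory` parameter are hypothetical. Passing in a factory instead of constructing `coverage.Coverage` directly keeps the PID/counter bookkeeping in one place and lets the logic be exercised without coverage.py installed. The PID is read at every rotation, so a forked worker process gets names containing its own PID.

```python
import os
from typing import Callable, Protocol


class Collector(Protocol):
    """The subset of coverage.Coverage that this sketch relies on."""

    def start(self) -> None: ...

    def stop(self) -> None: ...

    def save(self) -> None: ...


class RotatingCoverage:
    """Hypothetical helper: stop and save the current collector, then start
    a fresh one that writes to a new <prefix>.<pid>.<n> data file."""

    def __init__(self, factory: Callable[[str], Collector],
                 prefix: str = ".coverage.worker") -> None:
        self._factory = factory
        self._prefix = prefix
        self._count = 0
        self.current = self._factory(self._next_name())
        self.current.start()

    def _next_name(self) -> str:
        # Look up the PID each time so a forked child never reuses its parent's names.
        name = f"{self._prefix}.{os.getpid()}.{self._count}"
        self._count += 1  # monotonically increasing suffix
        return name

    def rotate(self) -> None:
        """Flush the data collected so far and continue on a new data file."""
        self.current.stop()
        self.current.save()
        # Rebinding drops this helper's reference to the old instance,
        # leaving its cleanup to the garbage collector.
        self.current = self._factory(self._next_name())
        self.current.start()
```

With coverage.py, the factory might be `lambda name: coverage.Coverage(data_file=name, data_suffix="cov", cover_pylib=False)`, with `rotate()` called at the end of each task. Each rotation leaves one more data file behind (coverage.py appends the `data_suffix` to the name with a dot); those files can be merged afterwards by passing them explicitly to `coverage combine`.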

This assumes that the garbage collector will clean up after the old instance of ctx.coverage, and has the potential to create a lot of files. Hence the interest in whether auto_load or load() (or indeed some other API) could be useful. Ideally, something like:

    ctx.coverage.stop()
    ctx.coverage.save()
    ctx.coverage.resume()
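coverage.py's public `Coverage` class has no `resume()` method. As a hypothetical sketch, a thin wrapper (`ResumableCoverage`, an illustrative name) can provide the desired stop/save/resume shape over a single instance and a single data file. It rests on two assumptions that should be checked against the coverage.py version in use: that `start()` may be called again on an instance after `stop()` and `save()`, and that repeated `save()` calls keep accumulating into the same data file. On the constructor question, the public parameter is spelled `auto_data`; it is documented as reading any existing data file when measurement starts.

```python
from typing import Protocol


class Collector(Protocol):
    """The subset of coverage.Coverage that this sketch relies on."""

    def start(self) -> None: ...

    def stop(self) -> None: ...

    def save(self) -> None: ...


class ResumableCoverage:
    """Hypothetical wrapper: stop()/save()/resume() on ONE coverage object."""

    def __init__(self, cov: Collector) -> None:
        self.cov = cov
        self.running = False

    def start(self) -> None:
        self.cov.start()
        self.running = True

    def checkpoint(self) -> None:
        """Save the data collected so far, then keep measuring."""
        if self.running:
            self.cov.stop()
            self.running = False
        self.cov.save()
        self.resume()

    def resume(self) -> None:
        # ASSUMPTION: the wrapped object accepts start() again after stop()/save().
        if not self.running:
            self.cov.start()
            self.running = True
```

Compared with creating a new instance per checkpoint, this avoids relying on the garbage collector and writes to one data file, but only if both assumptions above hold.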

ShaheedHaque, Mar 30 '24 11:03