Distributed-CellProfiler

Print easier-to-read memory usage metrics?

Open ErinWeisbart opened this issue 2 years ago • 2 comments

If your Docker containers run out of memory, jobs fail silently. It's annoying. Our per-instance logs do regularly print instance metrics, including memory in use and memory available. However, parsing them is annoying.

It would be nice if we could add a regular, human-readable print statement to the logs that reports memory metrics, so that one could more easily tell from browsing the logs whether memory issues are bonking jobs. Perhaps also include WARNING in the statement if usage is above a certain threshold, so that a CloudWatch dashboard widget could easily report it?
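To make the request concrete, here is a minimal sketch of what such a log line could look like. It is not existing DCP code: the function names, the 90% threshold, and the message format are all hypothetical, and it assumes a Linux host where `/proc/meminfo` is readable. Note that inside a Docker container `/proc/meminfo` generally reflects the host rather than the container's own memory limit, so the numbers may overstate what a job can actually use.

```python
import logging

WARN_THRESHOLD = 0.90  # hypothetical cutoff; tune to taste


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field -> bytes."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        # Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        fields[key.strip()] = value
    return fields


def format_memory_line(total_bytes, available_bytes, threshold=WARN_THRESHOLD):
    """Build a human-readable line, prefixed with WARNING above the threshold."""
    used = total_bytes - available_bytes
    fraction = used / total_bytes if total_bytes else 0.0
    gib = 1024 ** 3
    line = (
        f"Memory used: {used / gib:.1f} GiB of "
        f"{total_bytes / gib:.1f} GiB ({fraction:.0%})"
    )
    if fraction >= threshold:
        line = "WARNING " + line
    return line


def log_memory(path="/proc/meminfo"):
    """Read current memory figures and write one readable line to the log."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    line = format_memory_line(info["MemTotal"], info["MemAvailable"])
    logging.getLogger(__name__).info(line)
    return line
```

Calling `log_memory()` on whatever interval the existing metrics are printed would give one greppable line per interval, and a CloudWatch filter on the literal string `WARNING Memory used` could then feed a dashboard widget.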

ErinWeisbart avatar Sep 25 '23 17:09 ErinWeisbart

The current metric addition was definitely quick and dirty, so we could certainly come up with something better. I'm sure it's just a matter of googling the right SO posts.

A warning about memory, though: hopefully, most of the time memory issues aren't misconfiguration issues, but when they ARE, our current workflow can't detect it, so let's be thoughtful about how we do/don't describe the amount of "available" memory (link below, Broad only).

https://broadinstitute.slack.com/archives/C3QFX04P7/p1642185636020900?thread_ts=1642185636.020900&cid=C3QFX04P7

bethac07 avatar Sep 25 '23 17:09 bethac07

We could also explore whether we want to do the actual agent installation as part of DCP - I doubt it, but if it's optional maybe not a terrible idea

bethac07 avatar Sep 25 '23 17:09 bethac07