[FEA] Heap dump/stack trace on OOM logging policy

Open abellina opened this issue 3 years ago • 0 comments

Heap dumps and stack traces on GPU OOM are two ways to narrow down memory misuse. How many stack traces and heap dumps to output is not a clear choice. Should this debug output be produced only once? Should it be produced each time an OOM is detected? These and similar questions are still open.

This issue is to propose a policy we can use for these debug tools. Ideally the policy is consistent across both tools, so that it makes sense to the end user.
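
To make the discussion concrete, here is a minimal sketch of one possible policy: a single per-tool limit applied identically to stack traces and heap dumps. It is written in Python for brevity and is not the plugin's implementation. The class name `OomDebugPolicy`, the `max_dumps_per_tool` setting, and the tool labels are all hypothetical names chosen for illustration.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class OomDebugPolicy:
    """Hypothetical policy: the same limit applies to every debug tool."""

    # Assumed default: dump once per tool, then stay quiet.
    max_dumps_per_tool: int = 1
    _counts: dict = field(default_factory=dict, repr=False)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def should_dump(self, tool: str) -> bool:
        """Return True if `tool` may still emit debug output for this OOM."""
        with self._lock:
            used = self._counts.get(tool, 0)
            if used >= self.max_dumps_per_tool:
                return False
            self._counts[tool] = used + 1
            return True


# Usage: consult the policy from the (hypothetical) OOM handler.
policy = OomDebugPolicy(max_dumps_per_tool=2)
for _ in range(3):
    if policy.should_dump("stack_trace"):
        pass  # log the stack trace here
    if policy.should_dump("heap_dump"):
        pass  # write the heap dump here
```

With one shared setting, "dump once" is `max_dumps_per_tool=1` and "dump on every OOM" is a very large limit, so the end user has a single knob that behaves the same way for both tools.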

abellina • Oct 14 '22 19:10