
When ProcessModel crashes via jupyter, parallel Python processes do not get cleaned up

Open awintel opened this issue 4 years ago • 1 comment

During development, code often crashes. When this happens while a parallel system Process (which uses Python multiprocessing) is running from Jupyter, many parallel Python processes are left behind and must be killed manually.

Interestingly, this does not happen when running the same code from a *.py file (via PyCharm).

This needs to be investigated and fixed.

awintel avatar Nov 22 '21 03:11 awintel
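A minimal sketch of one plausible mechanism (the `worker` function and process count are illustrative, not Lava's actual ProcessModel code): an exception in a notebook cell does not end the Jupyter kernel's interpreter, so non-daemon children started via `multiprocessing` are never terminated or joined, whereas running the same code as a `*.py` script ends the interpreter on the crash. Wrapping the spawn in `try/finally` cleans the children up explicitly:

```python
import multiprocessing as mp
import time


def worker():
    # Stand-in for a long-running ProcessModel loop.
    while True:
        time.sleep(1)


def run_with_cleanup(n_procs=4):
    procs = [mp.Process(target=worker) for _ in range(n_procs)]
    for p in procs:
        p.start()
    try:
        # Stand-in for user code that crashes during development.
        raise RuntimeError("simulated crash")
    finally:
        # Without this block, the workers survive the exception under
        # Jupyter: the kernel process keeps running, so nothing ever
        # terminates or joins the children.
        for p in procs:
            p.terminate()
        for p in procs:
            p.join()


if __name__ == "__main__":
    run_with_cleanup()
```

A runtime could install the same cleanup via `atexit` or a `finally` around its run loop; the sketch only illustrates why explicit termination is needed when the parent process outlives the crash.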

I can confirm that this definitely happens on a Windows machine. Stray Python processes with low resource usage are left behind on every kernel restart in a Jupyter notebook.

The problem is more pressing when the process model hangs, usually because of a bug in the code. However, since Lava does not print the error message that the underlying library (e.g., numpy) would normally emit, the bug is not apparent. Under these circumstances, a kernel restart leaves a large chunk of memory occupied by the stray Lava processes from the previous kernel. While debugging Lava code, this can quickly fill up memory over multiple kernel restarts and lead to seemingly random behavior (i.e., code that worked five minutes ago stops working) if the developer is not aware of the issue.

The quick fix is to open Task Manager and kill the stray processes manually, but this hurts developer productivity. A scripted alternative is sketched below.

ashishrao7 avatar Jan 19 '22 20:01 ashishrao7
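As a stopgap that avoids Task Manager, the strays can be terminated from a notebook cell. This is a hedged sketch using the third-party `psutil` package, not a Lava API; the helper name `kill_stray_children` is hypothetical, and it only works while the crashed kernel is still the parent of the stray processes:

```python
import psutil


def kill_stray_children(timeout=3.0):
    """Terminate all child processes of the current (kernel) process.

    Run this in a notebook cell after a crash, *before* restarting the
    kernel -- after a restart the strays are re-parented and no longer
    appear as children of the new kernel.
    """
    children = psutil.Process().children(recursive=True)
    for child in children:
        child.terminate()  # polite termination first
    gone, alive = psutil.wait_procs(children, timeout=timeout)
    for child in alive:
        child.kill()       # force-kill anything that ignored terminate()
    return len(gone) + len(alive)


print(f"cleaned up {kill_stray_children()} stray process(es)")
```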