Memory leak when running Puffin
Puffin appears to suffer from a memory leak. The problem is visible when running in periodic (steady-state) mode, and the clearest example is the LCLS run. So far the cause is not clear.
A memory leak first started appearing on Ubuntu 18.04. You'll have to check with valgrind or some other tool to track it down, but if it's the same issue, it was occurring in the MPI calls. After some digging, I found that it's an issue with the version of OpenMPI shipped with Ubuntu, and IIRC it has not been fixed in the most recent version. I can't remember exactly which MPI calls were responsible; it may have been the data writing. I think it was the low-level MPI file writes in the parallel HDF5 library, although I might be misremembering. Try experimenting with the frequency of write steps: I seem to remember jumps in memory consumption when the beam moved from one element to the next, or during file writes.
If it's the same issue, there's not much to do except use MPI/HDF5 versions which don't do this (or Intel MPI). If it's not the same issue, then it has been introduced by some other dependency (because of the original issue, I thoroughly checked for memory leaks on various platforms and libraries, and apart from the Ubuntu 18.04 box it was rock solid), and analysis tools will help a lot in tracking it down.
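To see whether the memory jumps line up with write steps or element transitions, something like the following could be run alongside a Puffin rank. This is only a rough sketch, assuming a Linux box (it reads `VmRSS` from `/proc/<pid>/status`); the function names, the sampling interval and the 10 MiB jump threshold are arbitrary choices for illustration, not part of Puffin.

```python
import time
from pathlib import Path


def read_rss_kib(pid):
    """Return the resident set size (VmRSS) of a process in KiB (Linux /proc only)."""
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB"
            return int(line.split()[1])
    raise RuntimeError(f"VmRSS not found for pid {pid}")


def sample_rss(pid, interval_s=1.0, n_samples=60):
    """Sample the RSS of `pid` every `interval_s` seconds, stopping early if it exits."""
    samples = []
    for _ in range(n_samples):
        try:
            samples.append(read_rss_kib(pid))
        except (FileNotFoundError, ProcessLookupError):
            break  # the process has finished
        time.sleep(interval_s)
    return samples


def find_jumps(samples, threshold_kib=10_240):
    """Return (sample index, increase in KiB) wherever RSS grew by more than the threshold."""
    return [
        (i, curr - prev)
        for i, (prev, curr) in enumerate(zip(samples, samples[1:]), start=1)
        if curr - prev > threshold_kib
    ]
```

Calling `find_jumps(sample_rss(pid))` with the PID of one MPI rank gives the sample indices where memory jumped, which can then be compared against the timestamps of the write steps. It only shows when the growth happens; valgrind or a similar tool is still needed to find where it comes from.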
On Wed, 30 Jan 2019, 23:14 Piotr Traczykowski, [email protected] wrote:
Assigned #77 https://github.com/UKFELs/Puffin/issues/77 to @mightylorenzo https://github.com/mightylorenzo.
Yesterday I checked it on Fedora 29 and it shows the same issue as on Ubuntu. However, I don't remember which OpenMPI version Fedora 29 uses (and I have since wiped that install, as I didn't like it). Thanks for the comment, I will try to trace it.