
memory leak in large runs

Open · JamiePringle opened this issue · 1 comment

I am making some large runs with a total of 664,798,647 particles, of which roughly 1/7 are active at any one time. (Curious why? Check out https://github.com/JamiePringle/EZfate.) Every 12 time steps, new particles are created (with repeatdt) and some die by calling particle.delete() after they reach a maximum age.
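For context, the setup is roughly of the following shape. This is a minimal sketch, not my actual run script: the input filenames, variable and dimension names, release positions, output cadence, and the 60-day maximum age are all placeholders.

```python
from datetime import timedelta
from parcels import (AdvectionRK4, FieldSet, JITParticle,
                     ParticleSet, Variable)

# Placeholder hydrodynamic input; filenames and variable/dimension
# names below are illustrative, not my actual model output.
fieldset = FieldSet.from_netcdf(
    "model_output_*.nc",
    variables={"U": "uo", "V": "vo"},
    dimensions={"lon": "longitude", "lat": "latitude", "time": "time"},
)

class AgingParticle(JITParticle):
    # Track particle age in seconds so old particles can be culled.
    age = Variable("age", initial=0.0)

def Age(particle, fieldset, time):
    particle.age += particle.dt

def KillOld(particle, fieldset, time):
    # 60-day maximum age is a placeholder value.
    if particle.age > 60 * 86400:
        particle.delete()

pset = ParticleSet(
    fieldset=fieldset, pclass=AgingParticle,
    lon=[0.0], lat=[0.0],              # placeholder release location
    repeatdt=timedelta(hours=12),      # re-release on a fixed cadence
)

pset.execute(
    pset.Kernel(AdvectionRK4) + pset.Kernel(Age) + pset.Kernel(KillOld),
    runtime=timedelta(days=365), dt=timedelta(hours=1),
    output_file=pset.ParticleFile(name="traj.zarr",
                                  outputdt=timedelta(hours=12)),
)
```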

Unfortunately, the memory usage of the code keeps increasing over time. The leaked memory is not actively used, so I can get the run to completion just by adding 100 GB of swap space (the servers have either 128 or 256 GB of RAM). As time goes on, all the available RAM is used and then data is swapped out. Because the swapped data is never read back in (verified with vmstat), the server does not thrash swap.
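To quantify the growth, I log the resident set size once per output chunk with something like the sketch below (it assumes psutil is installed; the log_usage helper and the CSV format are just illustrative). Steady RSS growth at a roughly constant active-particle count is what makes me call this a leak.

```python
import os
import time

import psutil

proc = psutil.Process(os.getpid())

def log_usage(chunk, n_particles, logfile="memlog.csv"):
    """Append timestamp, chunk index, active-particle count, and RSS (GB).

    If RSS climbs while n_particles stays flat, memory is being
    retained somewhere rather than tracking the workload.
    """
    rss_gb = proc.memory_info().rss / 1024**3
    with open(logfile, "a") as f:
        f.write(f"{time.time():.0f},{chunk},{n_particles},{rss_gb:.3f}\n")

# e.g. once per completed output chunk:
#     log_usage(chunk, len(pset))
```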

However, because all the RAM gets consumed, there is less memory left for the disk cache, which has a considerable negative impact on the speed of the particle tracking. In the attached plot, look at the bottom panel, which shows the time it takes to run the drifters as a function of how long the code has been running. You can see a step increase in run time around chunk 400, when free RAM is nearly exhausted and data starts moving to swap. This was on a machine using mdadm RAID; on my ZFS systems, which depend more heavily on the cache, the effect is worse.

I would be happy to hear any suggestions on how one could diagnose the cause of the memory leak, or suggestions on how to fix it. My current workaround is to split my runs in two, which just adds to the bookkeeping I need to do and the complexity of the subsequent analysis code.
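One approach I could imagine trying is snapshotting allocations between chunks with tracemalloc and diffing them, as in the sketch below (report_growth is an illustrative helper, not anything in Parcels). The caveat is that tracemalloc only sees Python-level allocations, so if the leak lives in a C extension it would not show up here, and RSS logging plus a native-allocation profiler such as memray might be needed instead.

```python
import tracemalloc

tracemalloc.start(25)  # keep 25 frames so the allocating call site is visible
baseline = tracemalloc.take_snapshot()

def report_growth(top_n=10):
    """Diff current allocations against the previous snapshot and print
    the call sites whose retained memory has grown the most."""
    global baseline
    snap = tracemalloc.take_snapshot()
    for stat in snap.compare_to(baseline, "lineno")[:top_n]:
        print(stat)
    baseline = snap

# e.g. call report_growth() after each output chunk: a call site that
# grows by a similar amount every chunk is the likely leak.
```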

Thanks, Jamie

JamiePringle · Feb 15 '23 20:02