Movie Wizard Crashing
User reported that the movie wizard is crashing after generating a few frames.
5TB data lives here on the RZ: /p/lustre1/justin/visit_movie_issue_data/
Presumably they were using 3.4.1.
User ran on rztrona with a single node and 36 procs.
User reports that using more nodes delays the crash.
The question I have is why does the movie wizard presumably run out of memory after chugging through several time steps? Shouldn't each time step use roughly the same amount of memory as the last?
@JustinPrivitera two quick thoughts...
- I think we have the potential for some really big leaks in the engine. We've seen this in the past. We should review it regularly but I don't think we do. Finding leaks is not necessarily easy as there are a ton of code paths and rare cases may be difficult to find.
- As simulations evolve, their data gets "noisier". So late-time contour plots or MIR are going to be a lot bigger, maybe 10-100x bigger, especially if there are a lot of materials involved. So the same resources that work for early-time views may not work for late-time views.
Perhaps this could also be caused by caching from one cycle to another.
We should schedule some time to understand the problem. Leaving unreviewed so that when @cyrush returns we can make a game plan.
Look into adding memory queries in the movie wizard so we can get more info in cases like this. Also check whether the engine cache is cleared for each new timestep.
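As a rough sketch of what a per-frame memory check could look like (not what the movie wizard does today): record one memory sample after each frame, optionally clear the cache between frames, and fit a line to the samples to see whether usage is trending upward. The names `render_frames`, `growth_per_frame`, `render_frame`, `sample_memory_mb`, and `clear_cache` are all hypothetical; how the memory sample is actually obtained from the engine is left open here.

```python
from typing import Callable, List, Optional, Sequence


def growth_per_frame(samples_mb: Sequence[float]) -> float:
    """Least-squares slope of memory (MB) versus frame index.

    A slope near zero suggests steady memory use; a clearly positive
    slope suggests a leak, cache growth, or data that grows over time.
    """
    n = len(samples_mb)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2.0
    mean_y = sum(samples_mb) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples_mb))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def render_frames(
    n_frames: int,
    render_frame: Callable[[int], None],      # hypothetical: draws and saves frame i
    sample_memory_mb: Callable[[], float],    # hypothetical: returns current memory in MB
    clear_cache: Optional[Callable[[], None]] = None,  # hypothetical: drops cached data
) -> List[float]:
    """Render frames in order and return one memory sample per frame."""
    samples = []
    for i in range(n_frames):
        render_frame(i)
        # Sample before clearing so the value reflects the frame's peak state.
        samples.append(sample_memory_mb())
        if clear_cache is not None:
            clear_cache()
    return samples
```

In a VisIt CLI script, `render_frame` could wrap `SetTimeSliderState` and `SaveWindow`, and `clear_cache` could wrap `ClearCacheForAllEngines`. Comparing the slope with and without `clear_cache` would help separate cross-timestep caching from a true leak or from the data simply getting bigger at late times.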
I tried to replicate this with 56 processors on one node of rzwhippet with VisIt 3.4.2. I tried the following, saving 30 frames of PNG files. They all worked fine. I'll try 36 processors on rztrona with 3.4.1.
1. Pseudocolor - CELLS/mass
2. Filled Boundary - MMATERIAL
3. Pseudocolor - CELLS/mass on material "Asteroid boulders"
I repeated steps 1-3 above on rztrona with 36 processors using VisIt 3.4.1 and the movies generated fine as well. In this case I generated MPEG movies.
I also tried a pseudocolor plot of POINTS/mass to see if it was related to using the point mesh instead of the volume mesh and that worked as well.
@JustinPrivitera Do you have more information on what plots/operators the user was using?
I also tried a Pseudocolor plot of "POINTS/H_determinant" since I noticed it was a database-defined variable that had a recenter and conn_cmfe. That worked as well.
Unfortunately I do not have any more details. All that they shared with me is in the ticket description above. I think if you are unable to reproduce that we can close this, as I have not heard from this user since opening the ticket. If they run into problems again they can let me know and I can capture more information.
Closing this issue since I'm unable to reproduce the problem.