
Potential memory use issue

Open JustinSGray opened this issue 3 years ago • 3 comments

Summary of Issue

A user reported process termination due to high memory usage from a large, but not massive, model running on a machine with 8 GB of memory. One other user also had issues running on 8 GB of memory.

It may not be fixable, but it is worth looking into.

Issue Type

  • [ ] Bug
  • [ ] Enhancement
  • [ ] Docs
  • [x] Miscellaneous

Description

The user gave the following info (they could not share the full model):

Linux VM with 8 GB of RAM

============== Problem Summary ============
Groups:              34
Components:         194
Max tree depth:       6
Design variables:           57   Total size:     2123
Nonlinear Constraints:      81   Total size:     2266
    equality:               76                   1864
    inequality:              5                    402
Linear Constraints:          2   Total size:        2
    equality:                0                      0
    inequality:              2                      2
Objectives:                  1   Total size:        1
Input variables:          1380   Total size:    79939
Output variables:         1019   Total size:    64222
Total connections: 1380   Total transfer data size: 79939
Driver type: pyOptSparseDriver
Linear Solvers: [LinearRunOnce x 18, DirectSolver x 16]
Nonlinear Solvers: [NonlinearRunOnce x 24, NewtonSolver x 10]
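
For reference, a summary like this can be printed from OpenMDAO's debug utilities. A minimal sketch, assuming prob is an om.Problem that has already been set up:

from openmdao.devtools.debug import config_summary

prob.final_setup()      # make sure the model hierarchy is finalized
config_summary(prob)    # prints a Problem Summary like the one above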

That's a fairly large optimization problem, but not absurdly so. It may be unique because it was combined with a pretty large actual model (note the size of the input/output vectors).

We could probably simulate this by duplicating some output state into an additional output array a bunch of times, then adding that to the timeseries, as in the sketch below.
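
A minimal sketch of that reproduction idea (everything here is hypothetical; BloatedODE and width are made-up names, and the extra output would be pulled into the timeseries with phase.add_timeseries_output('big_out')):

import numpy as np
import openmdao.api as om

class BloatedODE(om.ExplicitComponent):
    """Toy ODE that duplicates its state into a large auxiliary output."""

    def initialize(self):
        self.options.declare('num_nodes', types=int)
        self.options.declare('width', types=int, default=50,
                             desc='number of duplicated copies of the state')

    def setup(self):
        nn = self.options['num_nodes']
        w = self.options['width']
        self.add_input('x', shape=(nn,))
        self.add_output('x_dot', shape=(nn,))
        # Large auxiliary output: `width` copies of the state per node.
        self.add_output('big_out', shape=(nn, w))

    def compute(self, inputs, outputs):
        outputs['x_dot'] = -inputs['x']
        # Tile the state to inflate the output (and timeseries) size.
        outputs['big_out'] = np.tile(inputs['x'][:, np.newaxis],
                                     (1, self.options['width']))

Increasing width scales the total output size toward the ~64k reported above without changing the optimization problem itself.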

It's worth doing some memory profiling on a simulated use case like this with the SLSQP, SNOPT, and IPOPT drivers. Maybe some obvious low-hanging fruit will show up. Maybe not, but it's worth checking.
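
A sketch of such a profiling run, assuming a hypothetical build_problem(optimizer) helper that assembles the simulated case above with the given pyOptSparse optimizer (note tracemalloc only sees Python-level allocations, which recent numpy versions do report):

import tracemalloc

for optimizer in ('SLSQP', 'SNOPT', 'IPOPT'):
    prob = build_problem(optimizer)      # hypothetical setup helper
    tracemalloc.start()
    prob.run_driver()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f'{optimizer}: peak ~{peak / 1e9:.2f} GB')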

JustinSGray avatar Jun 14 '21 21:06 JustinSGray

The memory usage might be related to total coloring. That many constraints might be using a lot of memory. Total coloring is still coded as a dense operation, I think. It's possible this is an OpenMDAO memory usage issue...
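
Some rough arithmetic on the dense-jacobian cost, using the sizes from the problem summary above (about 2269 constraint/objective rows by 2123 design-variable columns):

rows, cols = 2269, 2123
mb_per_dense_copy = rows * cols * 8 / 1e6    # float64
print(f'{mb_per_dense_copy:.1f} MB per dense copy')   # ~38.5 MB

A single dense copy is modest, so if coloring is the culprit, the question is how many such copies (plus boolean sparsity masks and scratch arrays) are held at once during the dense coloring computation.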

JustinSGray avatar Jun 14 '21 21:06 JustinSGray

More information: the user was able to switch to a machine with more memory and gave this data:

Ok, ran that same coloring case on my home computer, which has 48 GB of memory.
Full total jacobian was computed 3 times, taking 479.575982 seconds.
Total jacobian shape: (2267, 2117)
Jacobian shape: (2267, 2117)  (29.37% nonzero)
FWD solves: 1209   REV solves: 0
Total colors vs. total size: 1209 vs 2117  (42.9% improvement)
Sparsity computed using tolerance: 1e-25
Time to compute sparsity: 479.575982 sec.
Time to compute coloring: 559.167794 sec.

Memory peaked at 58.5% in top. Interestingly, it was running at 1200% CPU and about 4% memory (parallel on 12 cores, I assume), then it dropped to 100% (single core) and the memory started gradually increasing. It cycled between 24% and 58% three times (if I counted correctly) before settling in at 1200% CPU and 23% memory.

Note: the default num_full_jacs is 3, so this data suggests it might be the computation of the pseudo-inverse that's taking up a lot of memory. We might need to investigate moving to a column-based sparsity approach, similar to the one used for the partials, to lower the memory cost.
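
If the sparsity computation is the hot spot, one knob to experiment with is the driver's total-coloring settings. A sketch, assuming p is the Problem; fewer full-jacobian evaluations reduce time and memory during sparsity detection, at some risk of missing nonzeros that only show up at other design points:

p.driver.declare_coloring(num_full_jacs=1, tol=1e-25)
# num_full_jacs: how many full jacobians are evaluated (at perturbed points)
#                to establish the sparsity pattern; the default is 3.
# tol:           threshold below which an entry is treated as zero.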

JustinSGray avatar Jun 15 '21 11:06 JustinSGray

Curious whether the user was using a version after OpenMDAO/OpenMDAO#2116. That change significantly reduced the memory footprint during total coloring and improved performance.
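
(A quick way for the user to confirm which version they have installed:)

import openmdao
print(openmdao.__version__)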

robfalck avatar Dec 20 '21 18:12 robfalck