
OOM with largeish grids

Open dschwoerer opened this issue 1 year ago • 2 comments

When running with a moderately large grid, BOUT++ fails with an out-of-memory (OOM) error, even when a sufficient number of nodes is used. @bendudson suggested that the grid is read in full and only then sliced. This happens with a 3.1 GB grid file on a 72-core machine with 256 GB of RAM (3.55 GB/thread). A 2.4 GB file still works.

	Option ZMAX = 1 (default)
	EQUILIBRIUM IS SINGLE NULL (SND) 
Connection between top of Y processor 8 and bottom of 0 in range 0 <= x < 725
=> This processor sending in down
WARNING adding connection: poloidal index -1 out of range
	MYPE_IN_CORE = true
	DXS = 20, DIN = 360. DOUT = -1
	UXS = 0, UIN = -1. UOUT = 45
	XIN = -1, XOUT = 1
	Twist-shift: DI 
	Option twistshift = 0 (default)

< here the OOM happens >

Possible boundary regions are: core, sol, Boundary regions in this processor: core, 
Constructing default regions
	Boundary region inner X
	Option mesh:extrapolate_x = 0 (default)
	Option mesh:extrapolate_y = false (/ptmp/dave/hermes-2/7-timeing.c171//BOUT.inp)
	Option dx = Tensor<BoutReal> (v17/W7X-conf0-724x72x192.emc3.inner:f.vessel:f.island:f.mfi:15e-1.offt:4.fci.nc)
	Option dy = Tensor<BoutReal> (v17/W7X-conf0-724x72x192.emc3.inner:f.vessel:f.island:f.mfi:15e-1.offt:4.fci.nc)
	Option ZMIN = 0 (default)
	Option ZMAX = 1 (default)
	Option dz = Tensor<BoutReal> (v17/W7X-conf0-724x72x192.emc3.inner:f.vessel:f.island:f.mfi:15e-1.offt:4.fci.nc)
	Option mesh:paralleltransform:type = fci (/ptmp/dave/hermes-2/7-timeing.c171//BOUT.inp)
	Option mesh:paralleltransform:z_periodic = 1 (default)
	Option parallel_transform =  (default)

(The output is from the smaller grid, to give an idea of what is going on.)

That looks a lot like the whole grid is read into memory, including the full domain as well as all variables.
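
As a rough check of the numbers above, the sketch below computes the per-rank memory budget and the headroom that would remain if every rank held a complete copy of the grid. It assumes one rank per core and that the in-memory size of the grid equals the on-disk file size. Both are assumptions for illustration, not measurements, and the function names are hypothetical.

```python
# Back-of-envelope estimate using the numbers from the report.
# Assumptions (not measured): one MPI rank per core, and each rank
# holds a full copy of the grid whose in-memory size equals the file size.

TOTAL_RAM_GB = 256.0
RANKS = 72


def per_rank_budget_gb(total_ram_gb=TOTAL_RAM_GB, ranks=RANKS):
    """Memory available to each rank if RAM is shared evenly."""
    return total_ram_gb / ranks


def headroom_gb(file_size_gb, total_ram_gb=TOTAL_RAM_GB, ranks=RANKS):
    """Memory left per rank after it has loaded the whole grid file."""
    return per_rank_budget_gb(total_ram_gb, ranks) - file_size_gb


if __name__ == "__main__":
    print(f"budget per rank: {per_rank_budget_gb():.2f} GB")
    for size_gb in (2.4, 3.1):
        print(f"{size_gb} GB grid: {headroom_gb(size_gb):.2f} GB left per rank "
              f"for fields, solver state and everything else")
```

Under these assumptions the 3.1 GB grid leaves less than half a gigabyte per rank for everything else, while the 2.4 GB grid leaves more than one gigabyte, which is at least consistent with the larger grid failing and the smaller one working.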

Apr 11 '24 09:04 dschwoerer

Is there a way to implement lazy loading? @ZedThree The grids for W7-X are quite large at this point; we can easily approach 40 GB per grid file.

Aug 05 '24 08:08 totork

Here is an attempt: https://github.com/boutproject/BOUT-dev/tree/lazy-grid-loading
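
The contents of that branch are not reproduced here. As a generic illustration of what lazy loading means in this context, the sketch below has each rank seek to and read only its own slab of a field instead of reading the whole file and slicing afterwards. The flat binary layout, the 1D decomposition in x, and all function names are hypothetical stand-ins. The real grid files are netCDF (`.nc`), where the equivalent would be a subarray (hyperslab) read through the netCDF library.

```python
import os
import tempfile
from array import array

ITEMSIZE = array("d").itemsize  # bytes per double on this platform


def write_grid(path, nx, ny):
    """Write a toy (nx, ny) field of doubles, row-major; value = x*ny + y."""
    with open(path, "wb") as f:
        for x in range(nx):
            array("d", (float(x * ny + y) for y in range(ny))).tofile(f)


def local_x_range(nx, nranks, rank):
    """Split nx rows as evenly as possible across nranks (toy decomposition)."""
    base, extra = divmod(nx, nranks)
    start = rank * base + min(rank, extra)
    return start, start + base + (1 if rank < extra else 0)


def read_x_slab(path, ny, x_start, x_end):
    """Read only rows x_start..x_end-1; the rest of the file is never loaded."""
    slab = array("d")
    with open(path, "rb") as f:
        f.seek(x_start * ny * ITEMSIZE)           # skip rows owned by other ranks
        slab.fromfile(f, (x_end - x_start) * ny)  # read just this rank's rows
    return slab


if __name__ == "__main__":
    nx, ny, nranks = 10, 4, 3
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_grid.bin")
        write_grid(path, nx, ny)
        for rank in range(nranks):
            x0, x1 = local_x_range(nx, nranks, rank)
            slab = read_x_slab(path, ny, x0, x1)
            print(f"rank {rank}: rows [{x0}, {x1}) -> {len(slab)} values in memory")
```

With this pattern the per-rank memory cost scales with the local slab rather than with the full file, which is the property a lazy grid reader would need for files in the tens of gigabytes.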

Aug 05 '24 08:08 dschwoerer