Projecting 1024^3 Enzo runs out of memory.
Bug report
Bug summary
Trying to project in parallel causes an out-of-memory error. The test case is a 1024^3 unigrid Enzo simulation, run on 4096 cores/tiles, adiabatic MHD with Dedner cleaning and random forcing. The dataset has 16 fields, so one snapshot is about 129 GB.
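For context, a quick back-of-envelope (my own arithmetic, assuming 8-byte float64 fields; the reported ~129 GB will also include per-grid metadata and format overhead) puts the raw field data in the same ballpark:

```python
# 1024^3 cells x 16 fields x 8 bytes per value (float64)
cells = 1024 ** 3
snapshot_bytes = cells * 16 * 8
print(snapshot_bytes / 2 ** 30)   # 128.0 GiB of raw field data
```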
Code for reproduction
% ibrun -np 2 python parallel_test.py
from mpi4py import MPI
import yt
yt.enable_parallelism()
ds = yt.load("/scratch/00369/tg456484/Paper49d_moresims/ze01_M10_MA1_1024_quan/DD0095/data0095")
p = yt.ProjectionPlot(ds, "x", "density")
p.save()
The parameter file is equivalent to run/MHD/3D/StochasticForcing/StochasticForcing.enzo
with DualEnergy turned off.
Actual outcome
A representative sample of the output is below; I'll edit to add a paste of the whole stack trace once I've properly captured it.
  File "parallel_test.py", line 13, in <module>
    p = yt.ProjectionPlot(ds, "x", "density")
  File "/home1/00369/tg456484/local-yt-2019-02-22-py3/yt-conda/lib/python3.7/site-packages/yt/visualization/plot_window.py", line 1480, in __init__
    max_level=max_level)
  File "/home1/00369/tg456484/local-yt-2019-02-22-py3/yt-conda/lib/python3.7/site-packages/yt/data_objects/construction_data_containers.py", line 270, in __init__
    self.get_data(field)
  File "/home1/00369/tg456484/local-yt-2019-02-22-py3/yt-conda/lib/python3.7/site-packages/yt/data_objects/construction_data_containers.py", line 334, in get_data
    self._initialize_chunk(chunk, tree)
  File "/home1/00369/tg456484/local-yt-2019-02-22-py3/yt-conda/lib/python3.7/site-packages/yt/data_objects/construction_data_containers.py", line 401, in _initialize_chunk
    icoords = chunk.icoords
  File "/home1/00369/tg456484/local-yt-2019-02-22-py3/yt-conda/lib/python3.7/site-packages/yt/data_objects/data_containers.py", line 1555, in icoords
    return self._current_chunk.icoords
  File "/home1/00369/tg456484/local-yt-2019-02-22-py3/yt-conda/lib/python3.7/site-packages/yt/geometry/geometry_handler.py", line 271, in cached_func
    tr = func(self)
  File "/home1/00369/tg456484/local-yt-2019-02-22-py3/yt-conda/lib/python3.7/site-packages/yt/geometry/geometry_handler.py", line 332, in icoords
    ci = np.empty((self.data_size, 3), dtype='int64')
P006 yt : [ERROR ] 2019-02-22 13:03:36,145 MemoryError:
P007 yt : [ERROR ] 2019-02-22 13:03:36,145 MemoryError: (identical traceback on rank 7)
P002 yt : [ERROR ] 2019-02-22 13:03:36,200 Error occured on rank 2.
P004 yt : [ERROR ] 2019-02-22 13:03:36,200 Error occured on rank 4.
P000 yt : [ERROR ] 2019-02-22 13:03:36,200 Error occured on rank 0.
P001 yt : [ERROR ] 2019-02-22 13:03:36,200 Error occured on rank 1.
P003 yt : [ERROR ] 2019-02-22 13:03:36,200 Error occured on rank 3.
P005 yt : [ERROR ] 2019-02-22 13:03:36,200 Error occured on rank 5.
P006 yt : [ERROR ] 2019-02-22 13:03:36,200 Error occured on rank 6.
P007 yt : [ERROR ] 2019-02-22 13:03:36,200 Error occured on rank 7.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
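The allocation that fails is suggestive: if `data_size` on a rank covers the whole 1024^3 domain rather than just that rank's subset, the `icoords` buffer alone is 24 GiB per process. This is a hypothetical back-of-envelope; the actual chunk sizes aren't shown in the trace:

```python
# The failing line is ci = np.empty((self.data_size, 3), dtype='int64').
# Assume data_size = 1024^3, i.e. the full unigrid domain.
data_size = 1024 ** 3
icoords_bytes = data_size * 3 * 8   # 3 int64 indices per cell
print(icoords_bytes / 2 ** 30)      # 24.0 GiB per rank
```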
Expected outcome
The expectation is a projection plot, the same as in serial, just produced faster.
Version Information
This was run on Stampede2, on one or two idev nodes, with Python 3.7.2, gcc 7.3.0, and yt 3.5.0. The full install was done on 2019-02-22.
I believe @atmyers ran into a similar issue at one point and might have thoughts.
I found that turning off mask caching helped with memory usage:
diff --git a/yt/data_objects/grid_patch.py b/yt/data_objects/grid_patch.py
index 966f0a068..e826e4f78 100644
--- a/yt/data_objects/grid_patch.py
+++ b/yt/data_objects/grid_patch.py
@@ -42,7 +42,7 @@ class AMRGridPatch(YTSelectionContainer):
_num_ghost_zones = 0
_grids = None
_id_offset = 1
- _cache_mask = True
+ _cache_mask = False
_type_name = 'grid'
_skip_add = True
That was successful, thanks! d.
If this is to do with masks, as in boolean masks, then it might be useful to consider numpy.packbits to reduce their size. I came across this function semi-recently.
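A minimal sketch of what that would buy (my own illustration, not yt code): `numpy.packbits` stores 8 boolean cells per byte, an 8x reduction over NumPy's 1-byte bools:

```python
import numpy as np

# A stand-in boolean selection mask for a 64^3 grid patch
rng = np.random.default_rng(0)
mask = rng.random((64, 64, 64)) > 0.5

packed = np.packbits(mask)  # flattens and packs 8 cells per byte
restored = np.unpackbits(packed)[:mask.size].reshape(mask.shape).astype(bool)

assert (restored == mask).all()
print(mask.nbytes, packed.nbytes)   # 262144 bytes down to 32768
```

The round trip through `unpackbits` is exact, so this trades a little CPU for an 8x smaller cached mask.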
@AshKelly Ah, yeah, that's a good idea. That would indeed reduce it. In yt-4.0 we have EWAH bitmaps, which do some really neat encoding of bitmaps and packing to speed it up, but either approach might take a bit of work.
I suspect, but am not 100% sure, that the biggest memory factor is in the ires and icoords info. One way to simplify this would be to change how the projection manages chunks, and allow it to have a "special case" where the coords are generated by some functional form, so the coords did not all have to be generated.
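A rough sketch of that idea (entirely hypothetical, not the yt API): for a regular slab, the (N, 3) coordinate array is a pure function of the slab's start index and shape, so it could be produced block-by-block instead of materialized for the whole chunk at once:

```python
import numpy as np

def iter_icoords(start, shape, block=2):
    """Hypothetical: lazily yield (n, 3) int64 cell-index blocks for a
    regular slab, instead of allocating one (N, 3) array up front."""
    nx, ny, nz = shape
    for i0 in range(0, nx, block):
        i1 = min(i0 + block, nx)
        # Only this sub-block of coordinates exists in memory at a time.
        i, j, k = np.mgrid[i0:i1, 0:ny, 0:nz]
        yield (np.stack([i, j, k], axis=-1).reshape(-1, 3)
               + np.asarray(start, dtype="int64"))

blocks = list(iter_icoords((10, 0, 0), (4, 2, 2)))
all_coords = np.concatenate(blocks)
print(all_coords.shape)   # (16, 3), but peak memory is one block, not the slab
```

A consumer that reduces over coordinates (as a projection does) could then process each block and discard it.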
I think we should leave this issue open, or possibly open one explicitly about reducing memory overhead, as it is a real problem -- and one that only showed up when we made things more general!
IIRC, the mask was getting allocated to cover the entire domain on every process, not just for the subset of the domain that the process owned. Even with packing, this will blow out memory eventually for large, unigrid datasets.
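To put numbers on that (my own arithmetic, assuming one mask per cached data object): a NumPy boolean mask over the full 1024^3 domain costs 1 GiB per process, and even bit-packed it is still 128 MiB per process, multiplied by however many masks are cached:

```python
cells = 1024 ** 3
bool_mask_bytes = cells        # NumPy bool_ is 1 byte per cell
packed_bytes = cells // 8      # packbits: 1 bit per cell
print(bool_mask_bytes / 2 ** 20, packed_bytes / 2 ** 20)   # 1024.0 MiB vs 128.0 MiB
```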
@atmyers Ahhh, I see. OK. I think this is something that will need to be addressed -- but we need to figure out how desperate the need is, and if it is blocked by anything else.