
Temporary increase in memory when executing a pyrevolve.Operator leads to memory error

Open ofmla opened this issue 3 years ago • 7 comments

A Python script running a TTI RTM for only one shot with pyrevolve (https://sesibahia-my.sharepoint.com/:u:/g/personal/oscar_ladino_fieb_org_br/EWpX_VT4U3lCqdAArLVaGKABe9oysSb0KKRDlqIyL1XpwA?e=TgKdL4) runs fine for a period of time before crashing with the following error:

  File "/home/oscarm/.conda/envs/devito-v4.2.2/lib/python3.8/site-packages/pytools/prefork.py", line 49, in call_capture_output
    popen = Popen(cmdline, cwd=cwd, stdin=PIPE, stdout=PIPE,
  File "/home/oscarm/.conda/envs/devito-v4.2.2/lib/python3.8/subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/oscarm/.conda/envs/devito-v4.2.2/lib/python3.8/subprocess.py", line 1637, in _execute_child
    self.pid = _posixsubprocess.fork_exec(
OSError: [Errno 12] Cannot allocate memory

It seems that forking to call the compiler when applying the pyrevolve.Operator temporarily doubles the memory footprint of the parent process. If the parent process is already using more than half of the system memory, this leads to a memory error. A workaround is to precompile the code before instantiating the CheckpointOperator objects, i.e.

cp = DevitoCheckpoint([u, v])
op_fwd = solver.op_fwd(save=False)
# Trigger JIT compilation now, while the process footprint is still small
op_fwd.cfunction
op_imaging.cfunction
wrap_fw = CheckpointOperator(op_fwd, src=geometry.src, u=u, v=v, ... )
wrap_rev = CheckpointOperator(op_imaging, u=u, v=v, ... )

However, it would appear that there is another possible solution as suggested in this thread https://devitocodes.slack.com/archives/C7JMLMSG0/p1593739206410500?thread_ts=1593727821.408500&cid=C7JMLMSG0

ofmla avatar Jul 06 '20 15:07 ofmla

I think the current evidence points at the fork before compilation being the culprit.

navjotk avatar Aug 05 '20 09:08 navjotk

@mloubout @tjb900 ever noticed anything like this?

FabioLuporini avatar Apr 13 '21 10:04 FabioLuporini

Yeah, definitely

It's worth noting that physical memory usage doesn't increase, since both processes refer to the same physical pages. But if anything is in place that limits the total virtual memory footprint of a group of processes, or of all processes on the system, this kind of thing can trip it up (e.g. the value of /proc/sys/vm/overcommit_ratio).
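As a hedged illustration (assuming Linux with procfs mounted), the relevant kernel settings can be inspected directly:

```python
from pathlib import Path

# Read the kernel's virtual-memory overcommit settings (Linux only).
# overcommit_memory: 0 = heuristic, 1 = always allow, 2 = strict
# accounting, in which case overcommit_ratio caps the total commit charge.
mode = Path("/proc/sys/vm/overcommit_memory").read_text().strip()
ratio = Path("/proc/sys/vm/overcommit_ratio").read_text().strip()
print(f"overcommit_memory={mode}, overcommit_ratio={ratio}")
```

Under strict accounting (mode 2), a fork that momentarily doubles a large process's commit charge can exceed the limit even though no extra physical memory is touched.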

Possible workarounds are:

  • change whatever is using fork/exec to use posix_spawn instead. In Python >= 3.8, Popen and friends (which is what codepy uses, indirectly via pytools) will use posix_spawn in some circumstances (see https://docs.python.org/3/whatsnew/3.8.html), so it might be as simple as updating to Python 3.8. That would be the easiest thing to try here.
  • (if you control the system) tweak whatever is causing the additional virtual memory footprint to be a problem. As mentioned above, the fork consumes very little additional physical memory, so it's a policy issue, not a resource one.
  • compile early before large allocations have been made, thus avoiding forking later in the execution. Note that fork can cause various troubles with some MPI implementations as well.
  • use madvise(MADV_DONTFORK) on large allocations so that these virtual memory areas are not present in the new child after fork(). This could be a feature of the devito allocators, perhaps?
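As a sketch of the last point (assuming Linux and Python >= 3.8, where mmap.madvise is available; the large anonymous mapping here is a stand-in for a Devito allocation, not Devito's actual allocator):

```python
import mmap

# Allocate a large anonymous mapping, standing in for a wavefield buffer.
buf = mmap.mmap(-1, 64 * 1024 * 1024)

# Ask the kernel not to copy this VMA into children created by fork(),
# so a later compiler fork does not inherit the large footprint.
if hasattr(mmap, "MADV_DONTFORK"):
    buf.madvise(mmap.MADV_DONTFORK)
```

The caveat is that any child forked afterwards must never touch `buf`, since the mapping simply won't exist in the child's address space.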

Finally, in answering this I noticed that codepy uses pytools.prefork to actually spawn the compiler, and that module is explicitly designed to avoid some of the above issues. It does so by forking a "fork server" early in the process, before e.g. MPI is initialised or large memory allocations occur; the compiler processes are then forked from that tiny server process rather than from the application process itself.

See https://github.com/inducer/pytools/blob/main/pytools/prefork.py

It looks to me like the intention of the above module is that calling pytools.prefork.enable_prefork() very early in your application might sidestep some of the above issues quite neatly.
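To illustrate the pattern, here is a minimal fork-server sketch (this mirrors the idea behind pytools.prefork, not its actual API; the ForkServer class and its method names are invented for illustration):

```python
import os
import pickle
import struct
import subprocess
import sys

def _read_exact(fd, n):
    # Pipes may return short reads, so loop until n bytes arrive.
    data = b""
    while len(data) < n:
        chunk = os.read(fd, n - len(data))
        if not chunk:
            raise EOFError("fork server pipe closed")
        data += chunk
    return data

class ForkServer:
    """Fork one small helper process early, then route all subprocess
    launches through it, so later forks never duplicate the (by then
    large) application address space."""

    def __init__(self):
        req_r, req_w = os.pipe()   # parent -> server requests
        res_r, res_w = os.pipe()   # server -> parent responses
        if os.fork() == 0:
            # Server child: stays tiny, runs commands on request.
            os.close(req_w)
            os.close(res_r)
            self._serve(req_r, res_w)
            os._exit(0)
        os.close(req_r)
        os.close(res_w)
        self._req_w, self._res_r = req_w, res_r

    def _serve(self, req_r, res_w):
        while True:
            try:
                n, = struct.unpack("!I", _read_exact(req_r, 4))
            except EOFError:
                return  # parent closed its end; shut down
            cmdline = pickle.loads(_read_exact(req_r, n))
            out = subprocess.run(cmdline, capture_output=True).stdout
            payload = pickle.dumps(out)
            os.write(res_w, struct.pack("!I", len(payload)) + payload)

    def run(self, cmdline):
        # Ask the small server process to spawn the command for us.
        payload = pickle.dumps(cmdline)
        os.write(self._req_w, struct.pack("!I", len(payload)) + payload)
        n, = struct.unpack("!I", _read_exact(self._res_r, 4))
        return pickle.loads(_read_exact(self._res_r, n))
```

The key design point is ordering: the server must be created at the very top of the application, before MPI initialisation or wavefield allocation, which is exactly what calling enable_prefork() early achieves.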

tjb900 avatar Apr 14 '21 01:04 tjb900

That's a really nice, comprehensive answer. Thanks a lot @tjb900.

FabioLuporini avatar Apr 14 '21 06:04 FabioLuporini

We have been seeing the same memory errors in Stride when compiling certain operators. After doing some tests, it seems calling pytools.prefork.enable_prefork() early solves the compilation problem.

However, the problem persists if the compiler or the MPI configuration is changed when memory use is high. That is because in these cases Devito uses subprocess.check_output to sniff the available compilers, which calls subprocess.Popen directly instead of using pytools.
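One way to mitigate this kind of sniffing (a hedged sketch; sniff_compiler_version is a hypothetical name, not Devito's actual function) is to memoize the probe and warm it once at startup, before memory use grows:

```python
import functools
import subprocess
import sys

@functools.lru_cache(maxsize=None)
def sniff_compiler_version(binary):
    """Probe a toolchain binary's version banner exactly once per
    binary; repeated calls hit the cache instead of forking again."""
    return subprocess.check_output([binary, "--version"]).decode()

# Warm the cache early, while the process footprint is still small.
# (Using the Python interpreter itself as a stand-in binary.)
banner = sniff_compiler_version(sys.executable)
```

This removes the repeated forks, but note it does not help if the configuration changes at runtime and a fresh, uncached probe is triggered while memory use is high.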

ccuetom avatar Jun 14 '21 08:06 ccuetom

Yeah, I've just been running into this lately as well. Seems like it might be a good idea to switch the compiler/mpi/gpu sniffs to use pytools.

Note that along the same lines, there is also an issue with the allocators initializing - ctypes.util.find_library uses subprocess to do some pretty hacky stuff. That will be harder to fix, but I guess applications that run into this problem can manually initialise the allocators early.
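A hedged sketch of that manual early initialisation (warm_library_lookups is an invented helper, not a Devito API): resolving the library names once at startup confines find_library's subprocess use to the small-footprint phase:

```python
import ctypes.util

def warm_library_lookups(names=("c", "m")):
    """Resolve shared-library names once, early in startup, so that
    ctypes.util.find_library's subprocess-based probing (ldconfig,
    gcc and friends) runs before the process grows large."""
    return {name: ctypes.util.find_library(name) for name in names}

# Call this before allocating wavefields or initialising MPI.
found = warm_library_lookups()
```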

tjb900 avatar Jun 15 '21 01:06 tjb900

These are memoized now; is this still an issue?

FabioLuporini avatar Aug 21 '23 08:08 FabioLuporini