Causes for LLVM Memory error: Unable to allocate section memory!
I am experiencing this bug while running a flow simulation, where I use numba to accelerate the solution of some subsystems of the global problem.
I have not been able to pinpoint exactly when or how the problem occurs, because it only appears after the simulation has run for days, which makes it practically impossible to debug. The code is very extensive and I cannot tailor an example reproducing it, but I can provide an entry point: https://github.com/pmgbergen/porepy/blob/cold_co2_injection/src/porepy/examples/cold_co2_injection/run_md.py
And an entry point to the eagerly NJIT compiled framework: https://github.com/pmgbergen/porepy/blob/02b2e954d2b217a44770e778e0a295aaef9cf484/src/porepy/compositional/compiled_flash/uniflash.py#L219
If I can assist further in pinpointing the exact cause of the issue, please let me know.
For now, I would kindly ask for
- the potential causes of this error,
- pitfalls or ways in which users can create memory leaks when using (eager) NJIT.
With this information I may be able to narrow down the issue further.
numba v0.62.1 llvmlite v0.45.1
If the flow of the program involves compiling more and more functions as time goes by, it will eventually run out of memory because jitted functions can never be fully freed. (I will search for a couple of references)
Similar discussions:
https://github.com/numba/numba/issues/9672 https://github.com/numba/numba/issues/570 https://github.com/numba/numba/issues/2290
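The pattern described above, where new jitted functions keep being created as the program runs, can be sketched as follows. This is only a minimal illustration, not code from the linked repository: `make_scaled_uncached` and `make_scaled_cached` are hypothetical factory names, and the `try`/`except` fallback is there only so the sketch also runs where numba is not installed. Memoizing the factory with `functools.lru_cache` is one possible way to ensure that each distinct parameter set produces a jitted function only once.

```python
from functools import lru_cache

try:
    from numba import njit
except ImportError:
    # Fallback so this sketch still runs without numba (nothing is compiled then).
    def njit(func):
        return func


def make_scaled_uncached(factor):
    # Hypothetical factory: every call creates a NEW jitted function object,
    # which is compiled separately the first time it is called.
    @njit
    def scaled(x):
        return factor * x

    return scaled


@lru_cache(maxsize=None)
def make_scaled_cached(factor):
    # Same factory, memoized on its argument: repeated calls with the same
    # factor return the same jitted function instead of creating a new one.
    @njit
    def scaled(x):
        return factor * x

    return scaled


a = make_scaled_uncached(2.0)
b = make_scaled_uncached(2.0)
c = make_scaled_cached(2.0)
d = make_scaled_cached(2.0)

print(a is b)  # two separate function objects
print(c is d)  # one shared function object
```

If a factory like the uncached one is called inside a time-stepping loop, the number of compiled functions grows with the number of iterations, which would match the behaviour described above.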
@vlipovac, I have a few questions to better understand your use case:
- When LLVM fails to allocate space, how many njit functions are present in CompiledUnifiedFlash.residuals?
- How many additional njit functions does your simulation require?
Thank you both for reaching out.
I took a look at the mentioned issues, and indeed there are a few occurrences where we use njit within a scope. It will take some time to comb through them. I will come back to you once I know whether the mentioned workarounds are sufficient.
@sklam the residuals dict references only 1-3 functions, but those reference more than 50 other functions within their bodies, all dynamically created. Once the top-level functions are created, their references are kept and passed around by the framework. Only they are kept explicitly; the nested ones created in a scope are dereferenced once the interpreter exits the factory method. I guess they stay somewhere in memory until the program exits.
In addition to the dynamic ones, there are probably around 100 other njitted functions in our framework that are not dynamically created, in case the number per se is relevant.
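To put rough numbers on the counts asked about above, one option might be to sample how many jitted function objects are alive at different points of the simulation. The sketch below is a heuristic only: `count_live_objects` is a hypothetical helper, and it assumes that njit-decorated functions are instances of a class named `CPUDispatcher`, which should be checked against the installed numba version. It matches on the class name alone, so it does not import numba.

```python
import gc


def count_live_objects(type_name="CPUDispatcher"):
    # Hypothetical helper: count GC-tracked objects whose class has the given
    # name. The default name is an assumption about numba's dispatcher class.
    return sum(1 for obj in gc.get_objects() if type(obj).__name__ == type_name)


# Example: sample the count once per time step and watch for steady growth.
print(count_live_objects())
```

A count that keeps rising over the course of the run would point to factories creating new jitted functions repeatedly. A flat count does not rule out a leak, since this only sees live Python objects and not memory held on the LLVM side.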