llvm-project icon indicating copy to clipboard operation
llvm-project copied to clipboard

Implement reserveAllocationSpace for SectionMemoryManager

Open MikaelSmith opened this issue 2 years ago • 11 comments

Implements reserveAllocationSpace and provides an option to enable needsToReserveAllocationSpace for large-memory environments with AArch64.

The AArch64 ABI has restrictions on the distance between TEXT and GOT sections as the instructions to reference them are limited to 2 or 4GB. Allocating sections in multiple blocks can result in distances greater than that on systems with lots of memory. In those environments several projects using SectionMemoryManager with MCJIT have run across assertion failures for the R_AARCH64_ADR_PREL_PG_HI21 instruction as it attempts to address across distances greater than 2GB (an int32).

Fixes #71963 by allocating all sections in a single contiguous memory allocation, limiting the distance required for instruction offsets similar to how pre-compiled binaries would be loaded into memory.

MikaelSmith avatar Nov 10 '23 18:11 MikaelSmith

:white_check_mark: With the latest revision this PR passed the C/C++ code formatter.

github-actions[bot] avatar Nov 10 '23 18:11 github-actions[bot]

Also, I should have said earlier: many thanks for all your efforts and the trouble you've gone to to put this together - it is certainly looking promising, from the perspective of Numba on AArch64 platforms!

gmarkall avatar Nov 16 '23 17:11 gmarkall

TODO: add tests to MCJITMemoryManagerTest

This is done now, I think? Or are you planning to add more tests?

gmarkall avatar Nov 17 '23 13:11 gmarkall

Now that numba/llvmlite#1009 (which essentially implements the same changes made here, just inside llvmlite) is merged and an RC has been produced, I've received various reports that this fixes issues related to #71963 on both macOS and LinuxAArch64 systems, and no reports of adverse effects - so FWIW, my confidence in this patch being a good fix is quite high.

gmarkall avatar Jan 02 '24 11:01 gmarkall

Hi - is this PR still in work? The changes in this PR helped to resolve a bug that we saw in CUDA-Q when using ARM, so I was curious if this had a path forward for getting merged into main. (However, we did have to change the default value of ReserveAlloc from false to true in order for it to fix our problem.)

bmhowe23 avatar Jun 27 '24 12:06 bmhowe23

Just to add, we've been using this implementation in Numba / llvmlite for a few months now:

  • It appears to solve the ARM relocation overflow error
  • We haven't seen any new issues introduced by it

So as far as I can tell, the implementation here seems quite robust.

gmarkall avatar Jun 27 '24 13:06 gmarkall

To add an additional impacted project, I had this issue happening on PostgreSQL instances. I've written more details about the impact in a message to the pgsql-hackers mailing list.

I've also tested the patched SectionMemoryManager on an impacted database and it fixed the segfaults.

bonnefoa avatar Aug 27 '24 13:08 bonnefoa

@gmarkall's comment sounds promising:

I've received various reports that this fixes issues related to https://github.com/llvm/llvm-project/issues/71963 on both macOS and LinuxAArch64 systems, and no reports of adverse effects - so FWIW, my confidence in this patch being a good fix is quite high.

On that basis I'm ok with this landing in main. However, if we see any issues related to it the bias should be to revert, rather than try to fix issues (if the fixes would introduce further changes in behavior): We're discussing deprecation and removal of MCJIT and RuntimeDyld (see https://discourse.llvm.org/t/rfc-add-deprecation-warnings-to-mcjit-and-runtimedyld/80465 and https://discourse.llvm.org/t/rfc-removing-mcjit-and-runtimedyld/80464) and the goal at this point is stability rather than bug-fixes.

If any of you are using MCJIT directly you'll want to look at switching to ORC / LLJIT. If you're already on ORC / LLJIT please be aware that we'll be switching to JITLink by default for jit-linking. JITLink's default memory manager already addresses the out-of-range issues, so hopefully they just won't come up after the switch.

lhames avatar Aug 29 '24 05:08 lhames

I have encountered https://github.com/llvm/llvm-project/issues/71963 in the context of the MLIR ExecutionEngine. This PR fixes the bug, is there a reason why this PR was not merged in main?

castigli avatar Dec 16 '25 09:12 castigli

I didn't see any further feedback to address. Is this waiting on me somehow?

MikaelSmith avatar Dec 16 '25 19:12 MikaelSmith

I'd add that we've continued to use this with no issues reported in Numba / llvmlite.

gmarkall avatar Dec 17 '25 10:12 gmarkall

I didn't see any further feedback to address. Is this waiting on me somehow?

Nope -- this is just me getting swamped and losing track of this. Thanks for the ping!

I'll merge this, but take the opportunity to remind everyone: MCJIT is going away eventually -- you should move to ORC if you can, and file LLVM issues if you can't.

lhames avatar Dec 18 '25 02:12 lhames