openj9

Changes to the code cache repository allocation code

Open mpirvu opened this issue 1 year ago • 2 comments

In order to eliminate helper trampolines, OpenJ9 tries to allocate the code cache near the JIT DLL. This intent is signalled by passing a preferredStartAddress to allocateCodeCacheSegment(). Currently, only x86-64 uses this approach. This commit makes the following changes:
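Helper trampolines exist because an x86-64 direct `call rel32` encodes a signed 32-bit displacement measured from the address of the next instruction, so compiled code can call a helper directly only if the helper lies within about ±2 GB. A minimal sketch of that reachability test; `call_fits_rel32()` is a hypothetical name, not an OpenJ9 function:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of OpenJ9): returns 1 if a direct x86-64
 * `call rel32` whose following instruction starts at `next_insn` can reach
 * `target` without a trampoline, 0 otherwise. */
static int call_fits_rel32(uint64_t next_insn, uint64_t target)
{
    /* The displacement is relative to the instruction after the call.
     * Unsigned subtraction wraps; reinterpreting as signed gives the
     * two's-complement distance on all mainstream compilers. */
    int64_t disp = (int64_t)(target - next_insn);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

Allocating the code cache close to the JIT DLL keeps every such displacement in range, which is what makes the trampolines unnecessary.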

  • For Linux, we increase the search space to almost 2 GB. We also prefer to start with an approach that uses the smaps pseudofile to find free memory ranges where the code cache could be allocated.
  • For Windows, we keep a smaller search space because there is no smaps equivalent to speed up the search.
  • We compute a preferred alignment and pass it to the VM. For x86-64 this alignment is 2 MB, i.e. the size of the large pages used by the Transparent Huge Pages (THP) mechanism. The alignment is relevant only when preferredStartAddress is provided.
  • For Linux, we hint to the OS (via madvise) that THP should be used for the span of the code cache repository.
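The alignment and THP-hint bullets can be sketched with plain POSIX/Linux calls. This is a minimal, Linux-only sketch under stated assumptions, not the OpenJ9 code: `reserve_code_repository()` and `align_up()` are hypothetical names, the preferred start is passed to mmap() only as a hint (the kernel may ignore it), and madvise(MADV_HUGEPAGE) is treated as advisory, so its failure is ignored.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define THP_SIZE ((size_t)2 * 1024 * 1024) /* 2 MB THP size on x86-64 */

/* Round addr up to a power-of-two alignment. */
static uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical (not the OpenJ9 API): reserve `size` bytes whose start is a
 * multiple of `alignment`. `preferred` is only a hint to mmap().
 * A real code cache would also need PROT_EXEC. Returns NULL on failure. */
static void *reserve_code_repository(void *preferred, size_t size, size_t alignment)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(size % page == 0 && alignment % page == 0);

    /* Over-reserve by `alignment` so an aligned run of `size` bytes must exist. */
    size_t span = size + alignment;
    char *raw = mmap(preferred, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    char *start = (char *)align_up((uintptr_t)raw, alignment);
    char *end = start + size;

    /* Give back the unaligned head and the unused tail. */
    if (start > raw)
        munmap(raw, (size_t)(start - raw));
    if (raw + span > end)
        munmap(end, (size_t)(raw + span - end));

#ifdef MADV_HUGEPAGE
    /* Advisory only: ask for THP backing; failure (e.g. no THP support) is ignored. */
    (void)madvise(start, size, MADV_HUGEPAGE);
#endif
    return start;
}
```

mmap() itself only guarantees page alignment, so over-reserving by one alignment unit and trimming both ends is a common way to obtain a 2 MB-aligned range that THP can back with whole huge pages.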

mpirvu, May 17 '24 15:05

jenkins test sanity all jdk21

mpirvu, May 17 '24 16:05

jenkins test xlinux all jdk21

mpirvu, May 18 '24 12:05

I have changed the code so that chooseCacheStartAddress() picks the start address and allocateCodeCacheSegment() follows that recommendation. The two functions are still tightly coupled: chooseCacheStartAddress() "knows" that allocateCodeCacheSegment() will use smaps to search for a gap in the address space, and can therefore afford to search over a larger area.
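A smaps-driven gap search of this kind can be sketched as a scan over the mapping header lines of /proc/self/smaps (`start-end perms offset dev inode path`), skipping the per-mapping detail lines that follow each header. Minimal sketch, assuming the mappings are listed in ascending address order (as the kernel emits them); `find_gap()` is a hypothetical name, and it reads from any `FILE *` so it can be exercised on synthetic input:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not the OpenJ9 code): scan text in /proc/self/smaps
 * format and return the lowest address in [lo, hi) where `size` free bytes fit
 * between mappings, or 0 if there is no such gap. */
static uintptr_t find_gap(FILE *smaps, uintptr_t lo, uintptr_t hi, uintptr_t size)
{
    char line[8192];       /* large enough for a header line with a long path */
    uintptr_t cursor = lo; /* lowest start address not yet ruled out */

    assert(lo < hi);
    while (cursor < hi && fgets(line, sizeof line, smaps) != NULL) {
        unsigned long start, end;
        /* Detail lines such as "Rss:  4 kB" do not match "start-end". */
        if (sscanf(line, "%lx-%lx", &start, &end) != 2)
            continue;
        if (end <= cursor)
            continue;      /* mapping lies entirely below the candidate */
        if (start >= cursor && start - cursor >= size && cursor + size <= hi)
            return cursor; /* the free range [cursor, start) is big enough */
        cursor = end;      /* candidate must start after this mapping */
    }
    /* Free space after the last mapping examined, up to hi. */
    if (cursor < hi && hi - cursor >= size)
        return cursor;
    return 0;
}
```

Reading the kernel's own list of mappings replaces trial-and-error allocation attempts with a single pass, which is why a search over a larger area remains affordable when smaps is available.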

mpirvu, May 22 '24 17:05

I added the ASSERT_FATAL as suggested. On my home machine I managed to configure 1 GB pages, and for some runs I get output like:

#CODECACHE:  The code cache repository was allocated between addresses 00007F72F90CF000 and 00007F73390CF000 to avoid helper trampolines. alignment=1073741824 largeCodePageSize=1073741824
#CODECACHE:  allocated code cache segment of size 1073741824
#CODECACHE:  allocateCodeCacheRepository: size=1073741824 heapBase=00007F72F90CF000 heapAlloc=00007F72F90CF008 heapTop=00007F73390CF000
#CODECACHE:  carved size=2097144 range: 00007F72F90CF008-00007F72F92CF000
#CODECACHE:  CodeCache allocated 00007F73480CD8E0 @ 00007F72F90CF008-00007F72F92CF000 HelperBase:00007F72F92CE270

The code cache repository is not 1 GB aligned, and I would like to understand why.
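The misalignment can be measured directly from the logged addresses: for a power-of-two alignment, `addr & (alignment - 1)` is the distance past the previous aligned boundary (0 means aligned). A small check using the values from the log above; `misalignment()` is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes by which addr lies past the previous
 * multiple of a power-of-two alignment (0 means addr is aligned). */
static uint64_t misalignment(uint64_t addr, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & (alignment - 1);
}
```

For heapBase=00007F72F90CF000 and alignment=1073741824 (0x40000000), the result is 0x390CF000, about 913 MiB past a 1 GB boundary. The start is not even 2 MB aligned (remainder 0xCF000) and is only 4 KB aligned, while the span end minus start is exactly 0x40000000, the full 1 GB requested.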

mpirvu, May 23 '24 14:05

I have replaced the ASSERT_FATAL with an if statement, because endAddress can be smaller than startAddress when the code cache repository is very large and there is no way to fit it near the JIT DLL. If that happens, we simply let the OS pick any address it wants.
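Why the window can become empty is easiest to see in a simplified model. Assume, for illustration only (this is not the OpenJ9 formula), that every byte of the repository must lie within `reach` bytes of a JIT DLL occupying `[dll_start, dll_end)`. The lowest acceptable start is then `dll_end - reach` and the highest is `dll_start + reach - size`, so a large enough `size`, or rounding to a large alignment, pushes the upper bound below the lower one. `code_cache_window()` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model: compute the range [*lo, *hi] of aligned start
 * addresses for a repository of `size` bytes that stays within `reach` bytes of
 * a DLL at [dll_start, dll_end). Returns 0 when the range is empty, in which
 * case the caller lets the OS choose any address. */
static int code_cache_window(uint64_t dll_start, uint64_t dll_end, uint64_t size,
                             uint64_t reach, uint64_t alignment,
                             uint64_t *lo, uint64_t *hi)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Lowest start: the end of the DLL must be reachable from the repository start. */
    uint64_t start = dll_end > reach ? dll_end - reach : 0;

    /* Highest start: the end of the repository must be reachable from the DLL start. */
    if (dll_start + reach < size)
        return 0;
    uint64_t end = dll_start + reach - size;

    start = (start + alignment - 1) & ~(alignment - 1); /* round up */
    end &= ~(alignment - 1);                             /* round down */
    if (end < start)
        return 0; /* endAddress < startAddress: no fit near the DLL */

    *lo = start;
    *hi = end;
    return 1;
}
```

With a 16 MB DLL at 0x7F0000000000, a 2 GB reach and 2 MB alignment, a 256 MB repository gets the window 0x7EFF81000000 to 0x7F0070000000, while a 4 GB repository yields an empty window.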

I have also tracked down why the repository was not aligned when using 1 GB large pages: when large pages are enabled, the VM allocates memory with shmget/shmat rather than mmap. The call addressKey = shmget(IPC_PRIVATE, (size_t) byteAmount, shmgetFlags); was failing because the process was not privileged (I needed to run as root), so the VM fell back to allocating memory with default pages. When I run as root, the large-page allocation succeeds and the repository is aligned properly.
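The fallback path described above can be reproduced with a few SysV shared-memory calls. Minimal Linux sketch under stated assumptions: `alloc_large_or_default()` is a hypothetical name and this is not the VM's port-library code; it tries a huge-page-backed segment (SHM_HUGETLB) first and falls back to ordinary anonymous memory when shmget() fails. Marking the segment with IPC_RMID right after attaching ensures it is destroyed once detached.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>

/* Hypothetical helper: *huge is set to 1 for the huge-page path and 0 for
 * the default-page fallback. Returns NULL only if both paths fail. */
static void *alloc_large_or_default(size_t size, int *huge)
{
    assert(huge != NULL);
    /* With SHM_HUGETLB, shmget() fails with EPERM if the caller lacks
     * CAP_IPC_LOCK and is not in the hugetlb_shm_group. */
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | SHM_HUGETLB | 0600);
    if (id != -1) {
        void *p = shmat(id, NULL, 0);
        /* Mark for removal now; the segment is destroyed after the last detach. */
        (void)shmctl(id, IPC_RMID, NULL);
        if (p != (void *)-1) {
            *huge = 1;
            return p;
        }
    }
    /* Fallback: default-size pages, so no large-page alignment is implied. */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    *huge = 0;
    return p == MAP_FAILED ? NULL : p;
}
```

Because the fallback silently succeeds, the only visible symptom of the failed large-page request is the missing alignment, which matches what the log showed.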

mpirvu, May 24 '24 18:05

jenkins test sanity xlinux,win jdk21

dsouzai, May 24 '24 18:05

Looks like both builds failed due to infra issues:

Linux:

19:09:47  Error occurred for request PUT /artifactory/ci-openj9/Build_JDK21_x86-64_linux_Personal/151/test-images.tar.gz;build.parentNumber=513;build.parentName=Pipeline_Build_Test_JDK21_x86-64_linux;build.buildIdentifier=eclipse-openj9%2Fopenj9%2319516;build.timestamp=1716576887416;build.name=Build_JDK21_x86-64_linux_Personal;build.number=151 HTTP/1.1: Broken pipe (Write failed).

Windows:

15:37:34  ERROR: Cannot delete workspace :Unable to delete 'F:\Users\jenkins\workspace\Build_JDK21_x86-64_windows_Personal\openssl\NUL'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts.

dsouzai, May 27 '24 12:05

jenkins test sanity xlinux,win jdk21

dsouzai, May 27 '24 12:05