jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8331953: ubsan: metaspaceShared.cpp:1305:57: runtime error: applying non-zero offset 12849152 to null pointer

Open tstuefe opened this issue 1 year ago • 6 comments

Reported by @MBaesken at SAP. ubsan complains about this line:

const size_t ccs_begin_offset = align_up(base_address + archive_space_size,
                                           class_space_alignment) - base_address;

base_address here is the wish address, with NULL being an explicitly allowed value that indicates "no preference". The line calculates the offset of the class space within the future combined CDS+class-space mapping. Ubsan complains about base_address being possibly NULL.

Ubsan is missing the point somewhat. The addition is not a problem, on our platforms at least.

However, it highlights a slight incorrectness (which it did not notice): base_address is the wish for the future base of the Klass range. That wish is not guaranteed to be fulfilled; the eventual start of the Klass range could be somewhere else. Therefore, calculating the class space offset with an alignment based on that wish address is wrong. It always worked in practice since base_address was always aligned to class_space_alignment (16MB).

Hence, the fix is simple: We just make the alignment requirement for the base address explicit. When running with class space, we now assert that base_address is aligned to class space alignment (as well as CDS core region alignment, but that is much smaller). Since base_address is calculated either from a hard-wired default or from the SharedBaseAddress user input, and both are ensured to be properly aligned, that assert should never fire.

Then, the offending calculation can be simplified by removing the base address from it.


Progress

  • [x] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Warning

 ⚠️ Found leading lowercase letter in issue title for 8331953: ubsan: metaspaceShared.cpp:1305:57: runtime error: applying non-zero offset 12849152 to null pointer

Issue

  • JDK-8331953: ubsan: metaspaceShared.cpp:1305:57: runtime error: applying non-zero offset 12849152 to null pointer (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19152/head:pull/19152
$ git checkout pull/19152

Update a local copy of the PR:
$ git checkout pull/19152
$ git pull https://git.openjdk.org/jdk.git pull/19152/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19152

View PR using the GUI difftool:
$ git pr show -t 19152

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19152.diff

Webrev

Link to Webrev Comment

tstuefe avatar May 09 '24 06:05 tstuefe

:wave: Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar May 09 '24 06:05 bridgekeeper[bot]

@tstuefe This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8331953: ubsan: metaspaceShared.cpp:1305:57: runtime error: applying non-zero offset 12849152 to null pointer

Reviewed-by: iklam, mbaesken

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 100 new commits pushed to the master branch:

  • f9f8d0b48057a02923e36c8e11286b57cc72279e: 8332101: Add an @since to StandardOperation:REMOVE in jdk.dynalink
  • f398cd225012694a586e528936159b6df7b1586c: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop
  • 96c5c3fe75103dc45bc1c3ccce0ab36303121a60: 8329998: Remove double initialization for parts of small TypeArrays in ZObjArrayAllocator
  • ee4a9d34827166ff9ac04e2375058fdc08e43194: 8321622: ClassFile.verify(byte[] bytes) throws unexpected ConstantPoolException, IAE
  • ab8d7b0cedfaae124262325cd1d4b59cef996d85: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs
  • fe8a2aff3129b515c2a0f3ab96f5e3ad6cef7b70: 8307778: com/sun/jdi/cds tests fail with jtreg's Virtual test thread factory
  • 95f79c678737fb8de9ed45c516761d4d818869ef: 8332253: Linux arm32 build fails after 8292591
  • b687aa550837830b38f0f0faa69c353b1e85219c: 8332176: Refactor ClassListParser::parse()
  • 4083255440cfbf39b9683ea88a433d71ec6111e7: 8316138: Add GlobalSign 2 TLS root certificates
  • 43b109b111e77d0f7b302debc0d76e4ac7c9ac56: 8330066: HeapDumpPath and HeapDumpGzipLevel VM options do not mention HeapDumpBeforeFullGC and HeapDumpAfterFullGC
  • ... and 90 more: https://git.openjdk.org/jdk/compare/2baacfc16916220846743c6e49a99a6c41cac510...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk[bot] avatar May 09 '24 06:05 openjdk[bot]

@tstuefe The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar May 09 '24 06:05 openjdk[bot]

@MBaesken , and possibly @iklam or @calvinccheung ?

tstuefe avatar May 09 '24 09:05 tstuefe

Webrevs

mlbridge[bot] avatar May 09 '24 09:05 mlbridge[bot]

I added the PR to our build+test queue .

MBaesken avatar May 10 '24 07:05 MBaesken

We see now errors like this on Linux ppc64le with the patch added.


#  Internal Error (/jdk/src/hotspot/share/cds/metaspaceShared.cpp:1295), pid=7002, tid=7006
#  assert(is_aligned(base_address, base_address_alignment)) failed: Archive base address unaligned: 0x0000000000010000, needs alignment: 16777216.
#
# JRE version:  (23.0) (fastdebug build )
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 23-internal-adhoc.jenkinsi.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-ppc64le)
# Problematic frame:
# V  [libjvm.so+0x1546818]  MetaspaceShared::reserve_address_space_for_archives(FileMapInfo*, FileMapInfo*, bool, ReservedSpace&, ReservedSpace&, ReservedSpace&)+0x258

Current thread (0x00007fff8c02ca00):  JavaThread "Unknown thread" [_thread_in_vm, id=7006, stack(0x00007fff92350000,0x00007fff92550000) (2048K)]

Stack: [0x00007fff92350000,0x00007fff92550000],  sp=0x00007fff9254dd60,  free space=2039k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x1546818]  MetaspaceShared::reserve_address_space_for_archives(FileMapInfo*, FileMapInfo*, bool, ReservedSpace&, ReservedSpace&, ReservedSpace&)+0x258  (metaspaceShared.cpp:1295)
V  [libjvm.so+0x154937c]  MetaspaceShared::map_archives(FileMapInfo*, FileMapInfo*, bool)+0xcc  (metaspaceShared.cpp:1050)
V  [libjvm.so+0x1549d38]  MetaspaceShared::initialize_runtime_shared_and_meta_spaces()+0x2c8  (metaspaceShared.cpp:931)
V  [libjvm.so+0x153bd68]  Metaspace::global_initialize()+0x178  (metaspace.cpp:714)
V  [libjvm.so+0x1c203d8]  universe_init()+0x168  (universe.cpp:866)
V  [libjvm.so+0xf60130]  init_globals()+0x90  (init.cpp:128)
V  [libjvm.so+0x1bd8694]  Threads::create_vm(JavaVMInitArgs*, bool*)+0x364  (threads.cpp:553)
V  [libjvm.so+0x114c4b8]  JNI_CreateJavaVM+0x98  (jni.cpp:3581)
C  [libjli.so+0x5e94]  JavaMain+0xd4  (java.c:1550)
C  [libjli.so+0xbc38]  ThreadJavaMain+0x18  (java_md.c:653)
C  [libpthread.so.0+0xaa68]  start_thread+0x108

MBaesken avatar May 14 '24 08:05 MBaesken

We see now errors like this on Linux ppc64le with the patch added.

Thanks, Matthias, the new version should not have this problem.

@iklam, could you review again? The only new hunk came with this commit: https://github.com/openjdk/jdk/pull/19152/commits/a38441d0e8357c755a3f82a9e4db2e995b00a196 .

I now make sure that the SharedBaseAddress passed in during archive generation is correctly aligned to both archive core alignment and metaspace alignment. I thought we did this already, but apparently not.

tstuefe avatar May 15 '24 06:05 tstuefe

We see now errors like this on Linux ppc64le with the patch added.

#  Internal Error (/jdk/src/hotspot/share/cds/metaspaceShared.cpp:1295), pid=7002, tid=7006
#  assert(is_aligned(base_address, base_address_alignment)) failed: 
   Archive base address unaligned: 0x0000000000010000, needs alignment: 16777216.

This looks curious. Why would base_address be 0x0000000000010000? Is that value contained in static_mapinfo->requested_base_address()? I am wondering is there's a rounding/alignment bug that leads up to this value on ppc64le.

iklam avatar May 15 '24 17:05 iklam

We see now errors like this on Linux ppc64le with the patch added.

#  Internal Error (/jdk/src/hotspot/share/cds/metaspaceShared.cpp:1295), pid=7002, tid=7006
#  assert(is_aligned(base_address, base_address_alignment)) failed: 
   Archive base address unaligned: 0x0000000000010000, needs alignment: 16777216.

This looks curious. Why would base_address be 0x0000000000010000? Is that value contained in static_mapinfo->requested_base_address()? I am wondering is there's a rounding/alignment bug that leads up to this value on ppc64le.

No, that's the test. It dumps archives with different SharedBaseAddress, then checks that the archives can be loaded. One of the SharedBaseAddress values is 64K:

https://github.com/openjdk/jdk/blob/95f79c678737fb8de9ed45c516761d4d818869ef/test/hotspot/jtreg/runtime/cds/SharedBaseAddress.java#L76

It used to work before this patch, though subtly broken, because we did not require the archive base to be aligned to class space alignment. Instead, we aligned the class space within the mapping (so, gap between archive and class space can be higher). That, however, has the problem described in the description. We calculate the class space offset in the mapping before actually mapping, based on the SharedAddressBase given at archive creation. If the mapping fails (likely with a value aligned to 64K) and the real mapping later is aligned at a different alignment (likely with 64K), either class space start will be misaligned or it gets re-aligned but then the mapping size may be too small.

tstuefe avatar May 16 '24 05:05 tstuefe

SAP reported that all tests are green now.

@MBaesken could you give this a quick thumbs up?

tstuefe avatar May 16 '24 07:05 tstuefe

Thanks @MBaesken and @iklam !

/integrate

tstuefe avatar May 16 '24 10:05 tstuefe

Going to push as commit 910d77becd15580296687b00fed085ab106cb2eb. Since your change was applied there have been 100 commits pushed to the master branch:

  • f9f8d0b48057a02923e36c8e11286b57cc72279e: 8332101: Add an @since to StandardOperation:REMOVE in jdk.dynalink
  • f398cd225012694a586e528936159b6df7b1586c: 8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop
  • 96c5c3fe75103dc45bc1c3ccce0ab36303121a60: 8329998: Remove double initialization for parts of small TypeArrays in ZObjArrayAllocator
  • ee4a9d34827166ff9ac04e2375058fdc08e43194: 8321622: ClassFile.verify(byte[] bytes) throws unexpected ConstantPoolException, IAE
  • ab8d7b0cedfaae124262325cd1d4b59cef996d85: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs
  • fe8a2aff3129b515c2a0f3ab96f5e3ad6cef7b70: 8307778: com/sun/jdi/cds tests fail with jtreg's Virtual test thread factory
  • 95f79c678737fb8de9ed45c516761d4d818869ef: 8332253: Linux arm32 build fails after 8292591
  • b687aa550837830b38f0f0faa69c353b1e85219c: 8332176: Refactor ClassListParser::parse()
  • 4083255440cfbf39b9683ea88a433d71ec6111e7: 8316138: Add GlobalSign 2 TLS root certificates
  • 43b109b111e77d0f7b302debc0d76e4ac7c9ac56: 8330066: HeapDumpPath and HeapDumpGzipLevel VM options do not mention HeapDumpBeforeFullGC and HeapDumpAfterFullGC
  • ... and 90 more: https://git.openjdk.org/jdk/compare/2baacfc16916220846743c6e49a99a6c41cac510...master

Your commit was automatically rebased without conflicts.

openjdk[bot] avatar May 16 '24 10:05 openjdk[bot]

@tstuefe Pushed as commit 910d77becd15580296687b00fed085ab106cb2eb.

:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk[bot] avatar May 16 '24 10:05 openjdk[bot]