jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8342040: Further improve entry lookup performance for multi-release JARs

Open eirbjo opened this issue 1 year ago • 12 comments

Please review this PR which speeds up JarFile::getEntry lookup significantly for multi-release JAR files.

The changes in this PR are motivated by the following insights:

  • META-INF/versions/ is sparsely populated.
  • Most entries are not versioned
  • The number of unique versions for each versioned entry is small
  • Many JAR files are 'accidentally' multi-release; they use the feature to hide module-info.class from Java 8.

Instead of performing one lookup for every version identified in the JAR, this PR narrows the version search down to an approximation of the number of versions found for the entry being looked up, which will often be zero. This speeds up lookup for non-versioned entries, and provides a more targeted search for versioned entries.

An alternative approach could be to normalize the hash code to use the none-versioned name such that versioned and non-versioned names would be resolved in the same lookup. This was quickly abandoned since the code changes were intrusive and mixed too many JAR specific concerns into ZipFile.

Testing: The existing JarFileGetEntry benchmark is updated to optionally test a multi-release JAR file with one versioned entry for module-info.class plus two other versioned class files for two distinct versions. Performance results in first comment.

Running ZipFileOpen on a multi-release JAR did not show a significat difference between this PR and mainline.

The JAR and ZIP tests are run locally. GHA results green. The noreg-perf label is added in JBS.


Progress

  • [x] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Issue

  • JDK-8342040: Further improve entry lookup performance for multi-release JARs (Enhancement - P4)

Reviewers

Contributors

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21489/head:pull/21489
$ git checkout pull/21489

Update a local copy of the PR:
$ git checkout pull/21489
$ git pull https://git.openjdk.org/jdk.git pull/21489/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21489

View PR using the GUI difftool:
$ git pr show -t 21489

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21489.diff

Webrev

Link to Webrev Comment

eirbjo avatar Oct 14 '24 11:10 eirbjo

:wave: Welcome back eirbjo! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Oct 14 '24 11:10 bridgekeeper[bot]

@eirbjo This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8342040: Further improve entry lookup performance for multi-release JARs

Co-authored-by: Claes Redestad <[email protected]>
Reviewed-by: redestad

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 149 new commits pushed to the master branch:

  • 28252bb48da3c960a1a261af26650d74235a9531: 8341444: Unnecessary check for JSRs in CDS
  • 8174cbd5cb797a80d48246a686897ef6fe64ed57: 8341978: Improve JButton/bug4490179.java
  • 9201e9fcc28cff37cf9996e8db38f9aee7511b1c: 8342409: [s390x] C1 unwind_handler fails to unlock synchronized methods with LM_MONITOR
  • 0963b9e8918670badc956a325fe2ca0bf55f6d29: 8341664: ReferenceClassDescImpl cache internalName
  • c51a086ce32dd4e97aa83dfba3bcf9b0636193cc: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type
  • 7f4ed5001efac28f02fbbb78893051e28cc33e80: 8341020: Error handler crashes when Metaspace is not fully initialized
  • f50bd0d9ec65a6b9596805d0131aaefc1bb913f3: 8341513: Remove the unused thread_type field from OSThread
  • 7a16906ed0dce716bc9516cb75b6450725fe9dbb: 8341134: Deprecate for removal the jrunscript tool
  • ffe60919df59196d65832b8ce6b2cd38099d64df: 8173970: jar tool should have a way to extract to a directory
  • 2b03dbdac4819bc0d40912f273a1ca7ab4e8715e: 8311530: Deprecate jdk.jsobject module for removal
  • ... and 139 more: https://git.openjdk.org/jdk/compare/3fba1702cd8dc817b11bfa51077c41424d289281...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk[bot] avatar Oct 14 '24 11:10 openjdk[bot]

@eirbjo The following labels will be automatically applied to this pull request:

  • core-libs
  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Oct 14 '24 11:10 openjdk[bot]

Performance results:

Baseline:

Benchmark                              (mr)  (size)  Mode  Cnt   Score   Error  Units
JarFileGetEntry.getEntryHit           false    1024  avgt   15   64.619 ? 6.633  ns/op
JarFileGetEntry.getEntryHit            true    1024  avgt   15  301.770 ? 4.819  ns/op
JarFileGetEntry.getEntryHitUncached   false    1024  avgt   15   82.030 ? 2.057  ns/op
JarFileGetEntry.getEntryHitUncached    true    1024  avgt   15  327.966 ? 3.092  ns/op
JarFileGetEntry.getEntryMiss          false    1024  avgt   15   27.937 ? 0.982  ns/op
JarFileGetEntry.getEntryMiss           true    1024  avgt   15  267.280 ? 2.196  ns/op
JarFileGetEntry.getEntryMissUncached  false    1024  avgt   15   53.214 ? 1.085  ns/op
JarFileGetEntry.getEntryMissUncached   true    1024  avgt   15  290.904 ? 2.359  ns/op

PR:

Benchmark                              (mr)  (size)  Mode  Cnt   Score   Error  Units
JarFileGetEntry.getEntryHit           false    1024  avgt   15  63.311 ? 3.014  ns/op
JarFileGetEntry.getEntryHit            true    1024  avgt   15  71.333 ? 1.510  ns/op
JarFileGetEntry.getEntryHitUncached   false    1024  avgt   15  80.401 ? 0.516  ns/op
JarFileGetEntry.getEntryHitUncached    true    1024  avgt   15  94.810 ? 1.136  ns/op
JarFileGetEntry.getEntryMiss          false    1024  avgt   15  26.717 ? 0.276  ns/op
JarFileGetEntry.getEntryMiss           true    1024  avgt   15  45.677 ? 0.412  ns/op
JarFileGetEntry.getEntryMissUncached  false    1024  avgt   15  53.122 ? 0.954  ns/op
JarFileGetEntry.getEntryMissUncached   true    1024  avgt   15  66.583 ? 0.425  ns/op

eirbjo avatar Oct 14 '24 11:10 eirbjo

Webrevs

mlbridge[bot] avatar Oct 14 '24 14:10 mlbridge[bot]

Marking this PR as draft while investigating alternatives as proposed by @cl4es

eirbjo avatar Oct 18 '24 09:10 eirbjo

⚠️ @eirbjo This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request in integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

openjdk[bot] avatar Oct 18 '24 10:10 openjdk[bot]

/contributor add cl4es

eirbjo avatar Oct 18 '24 12:10 eirbjo

@eirbjo cl4es was not found in the census.

Syntax: /contributor (add|remove) [@user | openjdk-user | Full Name <email@address>]. For example:

  • /contributor add @openjdk-bot
  • /contributor add duke
  • /contributor add J. Duke <[email protected]>

User names can only be used for users in the census associated with this repository. For other contributors you need to supply the full name and email address.

openjdk[bot] avatar Oct 18 '24 12:10 openjdk[bot]

/contributor add @cl4es

eirbjo avatar Oct 18 '24 12:10 eirbjo

@eirbjo Contributor Claes Redestad <[email protected]> successfully added.

openjdk[bot] avatar Oct 18 '24 12:10 openjdk[bot]

Bringing this back from draft mode after some feedback and exploration with @cl4es

The current version of this PR now uses a BitSet to represent versions. This speeds up version tracking in initCEN for the unusual case that there is a large number of versioned entries in the ZIP.

To summarize the performance benefits of this PR over baseline:

  • The lookup overhead of multi-release with a few versioned entries is reduced from 5X to 1.2X
  • For missing entries (very common in URLClassPath!), the overhead is reduced from 10X to 1.8X

Claes has approved this PR already, but since he contributed code and ideas to this PR, I'll let this linger until early next week to allow non-Scandinavian reviewers to have a look :-)

eirbjo avatar Oct 18 '24 15:10 eirbjo

/integrate

eirbjo avatar Oct 28 '24 18:10 eirbjo

Going to push as commit d49f21043b84ebcc8b9176de3a84621ca7bca8fb. Since your change was applied there have been 267 commits pushed to the master branch:

  • d2e716eb72ea603fce50f0757a766ec623ef2faf: 8331958: Update PC/SC Lite for Suse Linux to 2.3.0
  • a95374f588149d80068275a496ba4aa04b3bb4fd: 8343101: Rework BasicTest.testTemp test cases
  • 00fe9f7bdfd245791bca6b5b1b2d0a98d41af221: 8343100: Consolidate EmptyFolderTest and EmptyFolderPackageTest jpackage tests into single java file
  • 9f6d5b46ce2cfcdb39f94b8ac8621ee21f4e8740: 8343020: (fs) Add support for SecureDirectoryStream on macOS
  • 1341b81321fe77005ba68fba19c7d83e3fcb5fde: 8341666: FileInputStream doesn't support readAllBytes() or readNBytes(int) on pseudo devices
  • 52382e285fdf853c01605f8e0d7f3f5d34965802: 8338021: Support new unsigned and saturating vector operators in VectorAPI
  • e659d9da5d6198ad9c85efd6472e138a6a3961c2: 8342975: C2: Micro-optimize PhaseIdealLoop::Dominators()
  • 9f6211bcf1b46e4bfba2d128d9eb8457bc0cde51: 8341371: CDS cannot load archived heap objects with -XX:+UseSerialGC -XX:-UseCompressedOops
  • 120a9357b3cf63427a6c8539128b69b11b9beca3: 8342561: Metaspace for generated reflection classes is no longer needed
  • d5fb6b4a3cf4926acb333e7ee55f96fc76225631: 8339939: [JVMCI] Don't compress abstract and interface Klasses
  • ... and 257 more: https://git.openjdk.org/jdk/compare/3fba1702cd8dc817b11bfa51077c41424d289281...master

Your commit was automatically rebased without conflicts.

openjdk[bot] avatar Oct 28 '24 18:10 openjdk[bot]

@eirbjo Pushed as commit d49f21043b84ebcc8b9176de3a84621ca7bca8fb.

:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk[bot] avatar Oct 28 '24 18:10 openjdk[bot]