logging-log4j2 icon indicating copy to clipboard operation
logging-log4j2 copied to clipboard

Fix sizing of Maps

Open kilink opened this issue 5 months ago • 6 comments

Fix several instances where Maps were sized incorrectly when the number of elements to insert is known; specifying the capacity without taking into account the load factor meant it would always result in a rehash / resize operation.

To this end, add a simple Maps utility class that handles the capacity calculation, much like the static HashMap.newHashMap helper that was added in JDK19.

Checklist

Before we can review and merge your changes, please go through the checklist below. If you're still working on some items, feel free to submit your pull request as a draft—our CI will help guide you through the remaining steps.

✅ Required checks

🧪 Tests (select one)

  • [ ] I have added or updated tests to cover my changes.
  • [X] No additional tests are needed for this change.

📝 Changelog (select one)

  • [ ] I added a changelog entry in src/changelog/.2.x.x. (See Changelog Entry File Guide).
  • [X] This is a trivial change and does not require a changelog entry.

kilink avatar Aug 01 '25 17:08 kilink

Thanks so much for the patch. Would you mind sharing some context, please? Did you observe a particular performance issue?

Message ID: @.***>

vy avatar Aug 02 '25 14:08 vy

Did you observe a particular performance issue?

I don't recall how I stumbled upon it or if it showed up in any performance profiles (it definitely has for other libraries). It's just a common thing I've noticed across various projects (e.g., spring and grpc-java) since the HashMap constructor is a foot gun and it's not easy to do correctly until Java 19. Since the intent here was to presize the map, and we know how many elements to insert, it just seemed to make sense to size it so we always avoid the resize / rehash operations.

kilink avatar Aug 05 '25 17:08 kilink

Job Requested goals Build Tool Version Build Outcome Build Scan®
build-macos-latest clean install 3.9.8 :x: Build Scan PUBLISHED
build-ubuntu-latest clean install 3.9.8 :x: Build Scan PUBLISHED
build-windows-latest clean install 3.9.8 :x: Build Scan PUBLISHED

Generated by gradle/develocity-actions

github-actions[bot] avatar Aug 06 '25 08:08 github-actions[bot]

Hi @kilink,

Adding a new utility class to log4j-api and then using it in log4j-core reinforces an unnatural dependency between the two modules, and would introduce a hard incompatibility between Log4j Core 2.26.0 and Log4j API 2.25.x. The Core module should implement the API, not depend on it for internal utilities. I realize this kind of coupling already exists in some places, but I’d prefer not to increase it.

Could you instead:

  • Move Maps to o.a.l.l.util.internal to make it clear it’s not intended for external use. While o.a.l.l.util is theoretically internal, it remains exported due to past API leaks (e.g., o.a.l.l.util.Supplier).
  • Create a separate o.a.l.l.core.util.internal.Maps helper for log4j-core, so it can be used independently without introducing additional cross-module dependencies.

I agree with @vy that, without clear benchmark results, the value of this enhancement is questionable. We avoid using HashMap in performance-critical paths (such as the logging context map) because its instantiation and operation costs are high for small maps. Instead, we use o.a.l.l.util.StringMap, which offers enhanced features like mutability control (freeze()) and a foreach method (introduced for Map only in JDK 8). However, StringMap represents a significant technical debt we would like to eliminate. If you are interested in map performance, could you explore the feasibility of replacing StringMap with Map in version 3.x?

ppkarwasz avatar Aug 06 '25 09:08 ppkarwasz

I'm against enhancements without convincing figures obtained from benchmarks mapping to real-world use cases. If this optimization yields a tangible benefit on the hot path, provide a reproducer please.

vy avatar Aug 06 '25 10:08 vy

I personally don't think it's worthwhile to spend time writing a benchmark for what amounts to a bug fix / correctness issue. If you'd rather not accept the change, feel free to close the issue. I'll leave my rationale for fixing it however:

  • Without a benchmark, we can see that the incorrect supplied capacity will cause a resize / rehash, and result in a map that is over-sized for the number of items inserted. It's easy to avoid with a trivial calculation which takes into account the load factor.
  • It's true that not all instances of this I changed in the PR are likely to be on the hot path, but fixing it everywhere is more of a consistency thing; if the helper is there, why not, otherwise it would just be more confusing.
  • There are likely callers in the wild that at least would be hitting the toMap implementation of the ReadOnlyStringMap, e.g. likely through ThreadContextMap (I can confirm such instances internally, as well as in OSS projects such as Sentry).
  • Whoever wrote those lines of code was attempting to size the HashMaps precisely by passing in a capacity, it just so happens that the number of items to insert != the map capacity, so seems like an easy mistake to have made. I assume care was taken here though since log4j generally tries to be optimal in terms of allocations and performance.

I assume the main point of contention is the addition of a utility class, and I agree it is ugly, but it's a stopgap until (?) Log4j can use the static HashMap.newHashMap added in Java 19, I presume when the baseline JDK version log4j requires allows us to use that, we can just switch to it and remove any such shim utility class. Or another way to put it, I presume you would readily accept this change if it were simply delegating to the JDK helpers (assuming we could use it)?

kilink avatar Aug 06 '25 16:08 kilink