micrometer icon indicating copy to clipboard operation
micrometer copied to clipboard

Avoid potential allocation from iterator empty check in Tags/KeyValues

Open izeye opened this issue 9 months ago • 4 comments

This PR changes to check Tags instanceof check before iterator empty check in Tags.of() to save ArrayIterator creation.

izeye avatar Apr 20 '25 01:04 izeye

I still haven't had a chance to do the deeper dive I wanted on this and similar changes. I don't expect it would cause any problem, but I'll hold off on merging it until next release.

shakuzen avatar May 12 '25 05:05 shakuzen

This is going to clash with changes in https://github.com/micrometer-metrics/micrometer/pull/6185, but since this was opened first and has a narrower scope, I'll work on getting this merged first before dealing with that. I've been doing a lot of profiling and debugging. The iterator call can be inlined by C2 in my tests, but that doesn't mean we shouldn't make it better pre-C2 compilation. An interesting scenario that comes up usually during application startup that may be worse with this change is when the argument is Collections.emptyList() etc. A call to iterator() will not allocate in this case, and by making this change, there's an instanceof check that needs to be done (and fails) when it didn't need to be done before. Now, perhaps it would be better practice for all those places to use Tags.empty() instead of Collections.emptyList(), and I'm not sure why they don't - we even have such code in this repo. It shouldn't make a big difference either way, but we should measure and come up with a recommendation if it makes sense.

I've added unit tests that assert no allocations in the cases we don't expect it. This will protect us from future changes and make the expectation clear. I've also made the same changes for KeyValues, and this same change for and.

shakuzen avatar May 30 '25 09:05 shakuzen

I'm out of time for now, but I'll run some JMH tests later and hopefully get this merged soon. Feedback welcome in the meantime.

shakuzen avatar May 30 '25 09:05 shakuzen

I added some benchmarks since the previous ones were only testing one of the Tags.of and Tags.and variants (the String varargs one, which we didn't change).

The results were mostly what we expected: the changes don't make a difference in allocation when the C2 compiler eliminates the iterator allocation. If we run the benchmarks with -XX:TieredStopAtLevel=1 to prevent C2 compilation, we can see the difference in allocation on the ofTags benchmark, but as suspected, there is a slight increased time cost for ofEmptyCollection (due to the instanceof check it didn't need to do before these changes). I wonder how much it's worth our time to try to optimize C1 compiled code that we expect should be C2 compiled. I suppose the question is how reliably we can expect C2 to eliminate the iterator allocation in production usage, which will have a lot more happening from different call sites than these rather micro benchmarks.

I expected the results for TagsAndBenchmark to basically be the same as TagsOfBenchmark, but then I noticed something odd happening. Some runs of TagsAndBenchmark showed no allocation difference with the changes in this PR compared to before - that's expected. However, the normalized allocation was not stable between runs of TagsAndBenchmark with the changes in this PR - specifically andTags and andVarargsString sometimes shows more allocation than before. This is surprising. Looking at the compilation logs, I can see the iterator call is not being inlined and so the allocation is not being eliminated. That's bad - the thing we were trying to avoid is ending up happening sometimes even with C2 compilation. I could not reproduce that when running the benchmarks against 1.13.14 (which doesn't have the changes in this PR), so it seems related to the changes made in the PR but it does not happen consistently.

The logs give the reason for inlining failure as 'unloaded signature classes' and I see Tags$1 marked as unloaded. I couldn't find much general information on this failure reason. There is this wiki. I don't understand why the changes made in this PR would result in this failure some of the time.

I'm not sure it's worth investing more time into these changes unless we can better understand what's happening and why. I'll leave the PR open for now, but these results make me in favor of not merging the changes. If anyone else has a hypothesis to test or can educate me on this, feel free to help.

Edit: for added context, I ran the benchmarks with OpenJDK (Liberica) 21.0.4 and 24.0.1. I saw some C2 compiler bugs were fixed that seemed potentially related, but those fixes should have been included in 24.0.1, yet I still was able to intermittently reproduce the failure to inline.

shakuzen avatar Jun 03 '25 09:06 shakuzen