spark icon indicating copy to clipboard operation
spark copied to clipboard

[SPARK-48663][CORE] Use a double-check locking mechanism to minimize synchronization overhead in NettyUtils

Open rickyma opened this issue 1 year ago • 6 comments

What changes were proposed in this pull request?

Use a double-check locking mechanism to minimize synchronization overhead in NettyUtils when calling getSharedPooledByteBufAllocator.

Why are the changes needed?

The current way to call getSharedPooledByteBufAllocator in NettyUtils is inefficient because synchronization can introduce overhead.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing UTs.

Was this patch authored or co-authored using generative AI tooling?

No.

rickyma avatar Jun 19 '24 09:06 rickyma

Have you observed thread contention issues with existing code ? I have not seen evidence of this - would like to understand more.

mridulm avatar Jun 19 '24 11:06 mridulm

No so far. But it is still an improvement. We could consider having this.

rickyma avatar Jun 19 '24 12:06 rickyma

If there is some improvement due to the proposed change, which is relevant to how spark uses it, I would be very interested.

Performance related changes should give details on how it helps - reproducible benchmarks, etc

mridulm avatar Jun 19 '24 17:06 mridulm

Does NettyUtils.getSharedPooledByteBufAllocator even get called very often?

From a code search it looks like it is only called from TransportServer and TransportClientFactory constructors. Servers are created infrequently (AFAIK usually statically / once during app boot). Similarly, it looks like client factories are also created infrequently.

Given that, I am similarly skeptical that this change would actually meaningfully impact real-world performance.

JoshRosen avatar Jun 19 '24 18:06 JoshRosen

Does NettyUtils.getSharedPooledByteBufAllocator even get called very often?

From a code search it looks like it is only called from TransportServer and TransportClientFactory constructors. Servers are created infrequently (AFAIK usually statically / once during app boot). Similarly, it looks like client factories are also created infrequently.

Given that, I am similarly skeptical that this change would actually meaningfully impact real-world performance.

You are right, but NettyUtils is a low-level basic utility class, and it is hard for us to guarantee that there won't be any new high-frequency usage codes in Spark in the future. Therefore, I think NettyUtils should have the ability to reduce synchronization overhead on its own, as this is something a low-level utility class should ensure. WDYT?

rickyma avatar Jun 20 '24 02:06 rickyma

If there is some improvement due to the proposed change, which is relevant to how spark uses it, I would be very interested.

Performance related changes should give details on how it helps - reproducible benchmarks, etc

+1, Agree with @mridulm

LuciferYang avatar Jun 20 '24 04:06 LuciferYang

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot] avatar Sep 29 '24 00:09 github-actions[bot]