incubator-uniffle icon indicating copy to clipboard operation
incubator-uniffle copied to clipboard

[#1735] fix(client): MultiException class not found when reassign or stage retry is enabled

Open dingshun3016 opened this issue 1 year ago • 10 comments

What changes were proposed in this pull request?

include jetty-util and shade it

Why are the changes needed?

Fix: #1735

When reassign or stag retry is enabled, GrpcServer needs to be started. If there is a port conflict during startup and retry is performed, MultiException will be thrown and the following exception will occur, causing the job to fail.The root cause is that uniffle did not include jetty-util related packages.

ERROR [main] SparkSubmit$$anon$2: spark submit throw error: org/eclipse/jetty/util/MultiException java.lang.NoClassDefFoundError: org/eclipse/jetty/util/MultiException at org.apache.uniffle.common.util.RssUtils.isServerPortBindCollision(RssUtils.java:219) at org.apache.uniffle.common.util.RssUtils.startServiceOnPort(RssUtils.java:197) at org.apache.uniffle.common.rpc.GrpcServer.start(GrpcServer.java:188) at org.apache.spark.shuffle.RssShuffleManager.<init>(RssShuffleManager.java:262) at org.apache.spark.shuffle.DelegationRssShuffleManager.createShuffleManagerInDriver(DelegationRssShuffleManager.java:87) at org.apache.spark.shuffle.DelegationRssShuffleManager.<init>(DelegationRssShuffleManager.java:60) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

Does this PR introduce any user-facing change?

No.

How was this patch tested?

No need.

dingshun3016 avatar May 20 '24 12:05 dingshun3016

Test Results

 2 641 files  +9   2 641 suites  +9   5h 27m 29s :stopwatch: + 4m 59s    944 tests ±0     943 :white_check_mark: + 1   1 :zzz: ±0  0 :x: ±0  11 773 runs  +9  11 758 :white_check_mark: +10  15 :zzz: ±0  0 :x: ±0 

Results for commit bde6a6c0. ± Comparison against base commit 49698226.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar May 20 '24 13:05 github-actions[bot]

@dingshun3016 Could you create an issue for this?

jerqi avatar May 22 '24 07:05 jerqi

Why we need jetty package in client side? @dingshun3016

zuston avatar May 22 '24 09:05 zuston

@dingshun3016 Could you create an issue for this? Okay, I've created

dingshun3016 avatar May 22 '24 09:05 dingshun3016

Why we need jetty package in client side? @dingshun3016

You can take a look at the stack information above, mainly using the MultiException class in jetty to determine whether it is a port conflict exception.

dingshun3016 avatar May 22 '24 09:05 dingshun3016

I moved MultiException from the jetty package to uniffle and renamed it to RssMultiException

dingshun3016 avatar Jun 21 '24 03:06 dingshun3016

I moved MultiException from the jetty package to uniffle and renamed it to RssMultiException

After consideration, it seems that MultiException cannot be ported to uniffle, because the exception thrown is org/eclipse/jetty/util/MultiException

dingshun3016 avatar Jun 21 '24 04:06 dingshun3016

image

I think we could remove this. @dingshun3016

zuston avatar Jun 24 '24 09:06 zuston

image

I think we could remove this. @dingshun3016

Without using the isServerPortBindCollision, if an exception occurs in startServiceOnPort, retry directly, is that right?

dingshun3016 avatar Jun 25 '24 03:06 dingshun3016

@dingshun3016 Thanks for your contributions. And I have another fix seems can resolve more conflict and CNFE, PTAL https://github.com/apache/incubator-uniffle/pull/1878

maobaolong avatar Jul 09 '24 12:07 maobaolong