Support seatunnel-translation-spark-3.3
Purpose of this pull request
Add the Spark 3.3 translation module (seatunnel-translation-spark-3.3).
Check list
- [ ] Code changes are covered with tests, or they do not need tests for this reason:
- [ ] If your PR adds any new JAR binary packages, please add a License Notice according to the New License Guide
- [ ] If necessary, please update the documentation to describe the new feature: https://github.com/apache/incubator-seatunnel/tree/dev/docs
Can you add e2e tests for Spark 3 and Spark 2? Reference: https://github.com/apache/incubator-seatunnel/pull/2499
OK, I will add it.
Please add licenses according to the Dependency licenses action; see https://seatunnel.apache.org/docs/contribution/new-license
Thanks, I will do it.
Please help me review it. I only added an e2e test on Spark 3.3 for FakeSourceToConsoleIT and will add more after it is merged, because the module changes frequently and I have had to fix conflicts repeatedly. @Hisoka-X The Spark scope is changed to provided in the example module, because the example module is not included in the release package; the compile scope would require license handling, which is unnecessary since Spark has a lot of dependencies. @ashulin
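For illustration, here is a minimal sketch of the kind of POM change described above, assuming the standard Spark artifact coordinates; the exact artifactId and version property are placeholders, not necessarily what this PR uses:

```xml
<!-- Hypothetical fragment for the example module: marking Spark as
     provided keeps its large dependency tree out of the packaged
     artifact (so no license notices are needed for it), while the
     example code still compiles against the Spark API. -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
</dependency>
```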
Hi, please fix the UT.
The failure is related to Flink. The PR hasn't changed; can you help me rerun it?

How does a user switch between Spark 2.4 and Spark 3.3 with the shell command?
Choose the jar according to the Spark client version. But we can handle that later; this PR only adds the Spark 3.3 translation. There are frequent conflicts if we add more content.
OK for me
Hi, please resolve the conflicts, then I will merge the code. Thanks!
Done, thanks.
Sorry, it seems that this approval disappeared due to the force push.
Never mind; after CI passes, I will approve again.
Please help me rerun the failed CI, thanks.
Could you help me review it? Merging is blocked and the conflicts are frequent. @ashulin @CalvinKirs
Can we support running all Spark e2e tests on both 3.3 and 2.4 without configuring anything? I find we need to write the same code twice if we want to test a job on both 3.3 and 2.4. If we do this and all e2e tests pass, I think we can merge this PR.
I only added one test for 3.3; I will fix the conflicts later.
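For reference, a hedged sketch of what running the same e2e case on both versions could look like, using a JUnit 5 parameterized test over the Spark image tag. SparkContainer and executeJob are illustrative stand-ins, not the actual seatunnel-e2e API, and the image tags are assumptions:

```java
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

public class FakeSourceToConsoleIT {

    // Run the identical test body once per Spark version instead of
    // maintaining two copies of the test class.
    @ParameterizedTest
    @ValueSource(strings = {"bitnami/spark:2.4.6", "bitnami/spark:3.3.1"})
    public void testFakeSourceToConsole(String sparkImage) throws Exception {
        // SparkContainer is a hypothetical closeable wrapper around a
        // Testcontainers-style Spark container.
        try (SparkContainer spark = new SparkContainer(sparkImage)) {
            spark.start();
            // Submit the same job config regardless of the Spark version.
            int exitCode = spark.executeJob("/fakesource_to_console.conf");
            Assertions.assertEquals(0, exitCode);
        }
    }
}
```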
Spark version: 3.3.1, Hive version: 3.0.0, MySQL version: 5.7. I tested MySQL to Hive and it throws this exception:
client token: N/A
diagnostics: User class threw exception: java.util.ServiceConfigurationError: org.apache.seatunnel.spark.BaseSparkSink: Provider org.apache.seatunnel.spark.hive.sink.Hive could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at org.apache.seatunnel.plugin.discovery.AbstractPluginDiscovery.loadPluginInstance(AbstractPluginDiscovery.java:128)
at org.apache.seatunnel.plugin.discovery.AbstractPluginDiscovery.createPluginInstance(AbstractPluginDiscovery.java:99)
at org.apache.seatunnel.core.spark.config.SparkExecutionContext.lambda$getSinks$2(SparkExecutionContext.java:95)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.apache.seatunnel.core.spark.config.SparkExecutionContext.getSinks(SparkExecutionContext.java:98)
at org.apache.seatunnel.core.spark.command.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:57)
at org.apache.seatunnel.core.base.Seatunnel.run(Seatunnel.java:40)
at org.apache.seatunnel.core.spark.SeatunnelSpark.main(SeatunnelSpark.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:739)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class
at org.apache.seatunnel.spark.hive.sink.Hive.<init>(Hive.scala:29)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
... 21 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 28 more
ApplicationMaster host: jtbihdp04.sogal.com
ApplicationMaster RPC port: 39125
queue: default
start time: 1667267960414
final status: FAILED
tracking URL: http://jtbihdp13.sogal.com:8088/proxy/application_1667009280185_5845/
user: suofy
22/11/01 10:00:51 ERROR Client: Application diagnostics message: User class threw exception: java.util.ServiceConfigurationError: org.apache.seatunnel.spark.BaseSparkSink: Provider org.apache.seatunnel.spark.hive.sink.Hive could not be instantiated
... (stack trace repeated verbatim, identical to the one above) ...
Exception in thread "main" org.apache.spark.SparkException: Application application_1667009280185_5845 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1342)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1764)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
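A hedged reading of the trace above: `org.apache.spark.internal.Logging$class` is the synthetic class that Scala 2.11 generates to hold trait method bodies; it does not exist in Scala 2.12+, which Spark 3.x is built with. So a Hive sink jar compiled against Spark 2.x / Scala 2.11 cannot be instantiated on Spark 3.3.1, and ServiceLoader wraps the resulting NoClassDefFoundError in the ServiceConfigurationError shown. A minimal sketch of the discovery path that fails (not the actual AbstractPluginDiscovery code):

```java
import java.util.ServiceLoader;

import org.apache.seatunnel.spark.BaseSparkSink;

public class PluginLoadSketch {
    public static void main(String[] args) {
        // ServiceLoader instantiates each registered provider reflectively;
        // a provider whose constructor throws (here: NoClassDefFoundError
        // from the Scala 2.11-compiled Hive sink) surfaces as the
        // ServiceConfigurationError seen in the log above.
        for (BaseSparkSink sink : ServiceLoader.load(BaseSparkSink.class)) {
            System.out.println("Loaded sink: " + sink.getClass().getName());
        }
    }
}
```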

Who can help me look at this problem? It is from testing Spark 3 using the spark3_sink build.
@JinJiDeJinMu Hi, Spark 3 support is not finished yet, but @nishuihanqiu has a branch for Spark 3 in his repository: https://github.com/apache/incubator-seatunnel/issues/875#issuecomment-1312735235
@Hisoka-X Thank you.
@Hisoka-X I want to join in the translation of Spark 3.x based on V2. What can I do?
@JinJiDeJinMu Wonderful! You can continue your work with this PR. Fix the conflicts and do some testing; if there are any problems, fix them.