
Support seatunnel-translation-spark-3.3

zhaomin1423 opened this pull request 2 years ago • 19 comments

Purpose of this pull request

Check list

  • [ ] Code changes are covered with tests, or they do not need tests for this reason:
  • [ ] If any new Jar binary package is added in your PR, please add a License Notice according to the New License Guide
  • [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs

zhaomin1423 avatar Aug 30 '22 18:08 zhaomin1423

Can you add e2e tests for spark3 and spark2? Reference: https://github.com/apache/incubator-seatunnel/pull/2499

Hisoka-X avatar Aug 31 '22 04:08 Hisoka-X

Can you add e2e tests for spark3 and spark2? Reference: #2499

OK, I will add it.

zhaomin1423 avatar Sep 05 '22 16:09 zhaomin1423

Please add licenses according to the Dependency licenses action. You can see https://seatunnel.apache.org/docs/contribution/new-license

Thanks, I will do it.

zhaomin1423 avatar Sep 05 '22 16:09 zhaomin1423

Please help me review it. I only added an e2e test on spark 3.3 for FakeSourceToConsoleIT and will add more after this is merged, because the module changes frequently and I have had to fix conflicts repeatedly. @Hisoka-X The spark scope is changed to provided in the example module, because the example is not included in the release package; with compile scope we would need to handle licenses, which is unnecessary since spark has a lot of dependencies. @ashulin

zhaomin1423 avatar Sep 06 '22 23:09 zhaomin1423

Hi, please fix the UT

Hisoka-X avatar Sep 07 '22 02:09 Hisoka-X

Hi, please fix the UT

The failure is related to flink. The PR hasn't been updated; can you help me rerun it?

zhaomin1423 avatar Sep 07 '22 03:09 zhaomin1423

[image] The failure is strange; I don't know how to fix it. @Hisoka-X

zhaomin1423 avatar Sep 07 '22 04:09 zhaomin1423

How does the user switch between spark2.4 and spark3.3 with a shell command?

Hisoka-X avatar Sep 07 '22 08:09 Hisoka-X

How does the user switch between spark2.4 and spark3.3 with a shell command?

Choose the jar according to the spark client version. But we can handle it later; this PR only adds the spark3.3 translation. There are frequent conflicts if we add more content.

zhaomin1423 avatar Sep 07 '22 09:09 zhaomin1423

How does the user switch between spark2.4 and spark3.3 with a shell command?

Choose the jar according to the spark client version. But we can handle it later; this PR only adds the spark3.3 translation. There are frequent conflicts if we add more content.

OK for me

Hisoka-X avatar Sep 07 '22 09:09 Hisoka-X
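As a minimal illustration of the jar-selection idea discussed above (pick the translation jar from the Spark version the client reports), here is a hedged Java sketch. The jar file names and the version-detection code are assumptions for illustration, not the actual SeaTunnel start script.

```java
// Illustrative sketch only: choose the translation jar from the Spark version
// reported by spark-submit. Jar names and detection logic are assumptions,
// not the real SeaTunnel start script.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class TranslationJarSelector {

    static String selectTranslationJar(String sparkVersion) {
        // Hypothetical jar names: one translation jar per supported Spark line.
        return sparkVersion.startsWith("3.")
                ? "seatunnel-translation-spark-3.3.jar"
                : "seatunnel-translation-spark-2.4.jar";
    }

    static String detectSparkVersion() throws Exception {
        // "spark-submit --version" prints a banner containing "version x.y.z".
        Process p = new ProcessBuilder("spark-submit", "--version")
                .redirectErrorStream(true)
                .start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int i = line.indexOf("version ");
                if (i >= 0) {
                    return line.substring(i + "version ".length()).trim();
                }
            }
        }
        throw new IllegalStateException("Could not detect the Spark version");
    }

    public static void main(String[] args) throws Exception {
        String version = detectSparkVersion();
        System.out.println("Spark " + version + " -> " + selectTranslationJar(version));
    }
}
```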

Hi, please resolve the conflict, then I will merge the code. Thanks!

Hisoka-X avatar Sep 08 '22 09:09 Hisoka-X

Hi, please resolve the conflict, then I will merge the code. Thanks!

Done, thanks.

zhaomin1423 avatar Sep 08 '22 09:09 zhaomin1423

Sorry, it seems that this approval disappeared due to a force push.

zhaomin1423 avatar Sep 08 '22 09:09 zhaomin1423

Sorry, it seems that this approval disappeared due to a force push.

Never mind, after CI passes I will approve again.

Hisoka-X avatar Sep 08 '22 10:09 Hisoka-X

Sorry, it seems that this approval disappeared due to a force push.

Never mind, after CI passes I will approve again.

Please help me rerun the failed CI, thanks.

zhaomin1423 avatar Sep 08 '22 11:09 zhaomin1423

Could you help me review it? Merging is blocked and the conflicts are frequent. @ashulin @CalvinKirs

zhaomin1423 avatar Sep 09 '22 03:09 zhaomin1423

Can we support running all spark e2e tests on both 3.3 and 2.4 without configuring anything? I find we need to write the same code twice if we want to test a job on both 3.3 and 2.4. If we do this and all e2e tests pass, I think we can merge this PR.

Hisoka-X avatar Sep 26 '22 11:09 Hisoka-X

Can we support running all spark e2e tests on both 3.3 and 2.4 without configuring anything? I find we need to write the same code twice if we want to test a job on both 3.3 and 2.4. If we do this and all e2e tests pass, I think we can merge this PR.

I only added one test for 3.3; I will fix the conflicts later.

zhaomin1423 avatar Sep 28 '22 00:09 zhaomin1423
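One way to avoid writing the same test twice, as discussed above, is to parameterize a single e2e test over both Spark versions. The JUnit 5 sketch below is only an assumption of how that could look; the image tags and the runJobInSparkContainer helper are hypothetical placeholders, not the actual seatunnel-e2e code.

```java
// Sketch only: run the same SeaTunnel job config against both Spark versions
// with one parameterized test instead of duplicating the test class.
// The image tags and runJobInSparkContainer(...) are hypothetical placeholders.
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

class FakeSourceToConsoleAcrossSparkVersionsIT {

    @ParameterizedTest
    @ValueSource(strings = {"bitnami/spark:2.4.6", "bitnami/spark:3.3.0"})
    void fakeSourceToConsole(String sparkImage) throws Exception {
        // Submit the same job config inside a container built from sparkImage
        // and expect a successful exit code.
        int exitCode = runJobInSparkContainer(sparkImage, "/fakesource_to_console.conf");
        assertEquals(0, exitCode);
    }

    // Placeholder: a real e2e module would start a Spark container here,
    // copy in the SeaTunnel dist plus the config, and invoke spark-submit.
    private int runJobInSparkContainer(String sparkImage, String configPath) throws Exception {
        throw new UnsupportedOperationException("container logic omitted in this sketch");
    }
}
```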

Spark version: 3.3.1, Hive version: 3.0.0, MySQL version: 5.7. I tested MySQL to Hive and it throws an exception:

         client token: N/A
         diagnostics: User class threw exception: java.util.ServiceConfigurationError: org.apache.seatunnel.spark.BaseSparkSink: Provider org.apache.seatunnel.spark.hive.sink.Hive could not be instantiated
        at java.util.ServiceLoader.fail(ServiceLoader.java:232)
        at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
        at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
        at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
        at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
        at org.apache.seatunnel.plugin.discovery.AbstractPluginDiscovery.loadPluginInstance(AbstractPluginDiscovery.java:128)
        at org.apache.seatunnel.plugin.discovery.AbstractPluginDiscovery.createPluginInstance(AbstractPluginDiscovery.java:99)
        at org.apache.seatunnel.core.spark.config.SparkExecutionContext.lambda$getSinks$2(SparkExecutionContext.java:95)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
        at org.apache.seatunnel.core.spark.config.SparkExecutionContext.getSinks(SparkExecutionContext.java:98)
        at org.apache.seatunnel.core.spark.command.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:57)
        at org.apache.seatunnel.core.base.Seatunnel.run(Seatunnel.java:40)
        at org.apache.seatunnel.core.spark.SeatunnelSpark.main(SeatunnelSpark.java:33)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:739)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class
        at org.apache.seatunnel.spark.hive.sink.Hive.<init>(Hive.scala:29)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at java.lang.Class.newInstance(Class.java:442)
        at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
        ... 21 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging$class
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 28 more

         ApplicationMaster host: jtbihdp04.sogal.com
         ApplicationMaster RPC port: 39125
         queue: default
         start time: 1667267960414
         final status: FAILED
         tracking URL: http://jtbihdp13.sogal.com:8088/proxy/application_1667009280185_5845/
         user: suofy
22/11/01 10:00:51 ERROR Client: Application diagnostics message: User class threw exception: java.util.ServiceConfigurationError: org.apache.seatunnel.spark.BaseSparkSink: Provider org.apache.seatunnel.spark.hive.sink.Hive could not be instantiated
        at java.util.ServiceLoader.fail(ServiceLoader.java:232)
        at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
        at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
        at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
        at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
        at org.apache.seatunnel.plugin.discovery.AbstractPluginDiscovery.loadPluginInstance(AbstractPluginDiscovery.java:128)
        at org.apache.seatunnel.plugin.discovery.AbstractPluginDiscovery.createPluginInstance(AbstractPluginDiscovery.java:99)
        at org.apache.seatunnel.core.spark.config.SparkExecutionContext.lambda$getSinks$2(SparkExecutionContext.java:95)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
        at org.apache.seatunnel.core.spark.config.SparkExecutionContext.getSinks(SparkExecutionContext.java:98)
        at org.apache.seatunnel.core.spark.command.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:57)
        at org.apache.seatunnel.core.base.Seatunnel.run(Seatunnel.java:40)
        at org.apache.seatunnel.core.spark.SeatunnelSpark.main(SeatunnelSpark.java:33)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:739)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class
        at org.apache.seatunnel.spark.hive.sink.Hive.<init>(Hive.scala:29)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at java.lang.Class.newInstance(Class.java:442)
        at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
        ... 21 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging$class
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 28 more

Exception in thread "main" org.apache.spark.SparkException: Application application_1667009280185_5845 finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1342)
        at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1764)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

dik111 avatar Nov 01 '22 02:11 dik111
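A note on the stack trace above: org/apache/spark/internal/Logging$class is the Scala 2.11 trait implementation class, so this error typically means a connector jar built against Spark 2.4 / Scala 2.11 is being loaded on a Spark 3.3 / Scala 2.12 cluster, which is exactly the gap this PR's spark-3.3 translation targets. The sketch below shows, as an assumption rather than the actual AbstractPluginDiscovery code, how a ServiceLoader loop can report which provider fails instead of aborting on the first ServiceConfigurationError.

```java
// Illustrative only, not the actual AbstractPluginDiscovery code: load all
// providers of a service and report the ones that cannot be instantiated
// (for example a sink compiled against Scala 2.11 running on a Scala 2.12 Spark),
// instead of failing the whole submission on the first ServiceConfigurationError.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

public final class SafePluginLoader {

    public static <T> List<T> loadAll(Class<T> service, ClassLoader classLoader) {
        List<T> loaded = new ArrayList<>();
        Iterator<T> it = ServiceLoader.load(service, classLoader).iterator();
        while (it.hasNext()) {
            try {
                loaded.add(it.next());
            } catch (ServiceConfigurationError e) {
                // The lazy iterator throws this when a provider's constructor fails,
                // e.g. with NoClassDefFoundError: org/apache/spark/internal/Logging$class.
                System.err.println("Skipping incompatible plugin for " + service.getName()
                        + ": " + e.getMessage());
            }
        }
        return loaded;
    }

    private SafePluginLoader() {
    }
}
```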

[Screenshot 2022-12-20 14 52 40] Who can help me look at this problem?

jinmu0410 avatar Dec 20 '22 06:12 jinmu0410

[Screenshot 2022-12-20 14 52 40]

Who can help me look at this problem?

This was for testing spark3 using the spark3_sink build.

jinmu0410 avatar Dec 20 '22 06:12 jinmu0410

[Screenshot 2022-12-20 14 52 40] Who can help me look at this problem?

This was for testing spark3 using the spark3_sink build.

@JinJiDeJinMu Hi, Spark3 support is not finished yet. But @nishuihanqiu has a branch for spark3 in his repository. https://github.com/apache/incubator-seatunnel/issues/875#issuecomment-1312735235

Hisoka-X avatar Dec 20 '22 07:12 Hisoka-X

[Screenshot 2022-12-20 14 52 40] Who can help me look at this problem?

This was for testing spark3 using the spark3_sink build.

@JinJiDeJinMu Hi, Spark3 support is not finished yet. But @nishuihanqiu has a branch for spark3 in his repository. #875 (comment)

@Hisoka-X thank you

jinmu0410 avatar Dec 20 '22 07:12 jinmu0410

@Hisoka-X I want to join in the translation of Spark 3.x based on v2. What can I do?

jinmu0410 avatar Dec 20 '22 09:12 jinmu0410

@Hisoka-X I want to join in the translation of Spark 3.x based on v2. What can I do?

@JinJiDeJinMu Wonderful! You can continue your work with this PR. Fix the conflicts and do some tests; if there are problems, fix them.

Hisoka-X avatar Dec 20 '22 09:12 Hisoka-X