Writing a DataFrame to Hudi with Spark fails with `ERROR DatahubSparkListener: java.lang.NullPointerException`
We want to capture the lineage of our Spark job.

Our environment is EMR, with Spark 3.1.2, Hudi 0.8.0, and DataHub 0.8.45.

The Spark job reads data from the raw area, processes it, and writes the result to Hudi. However, we only get a pipeline entity on DataHub, with no lineage attached to it.

We also found the following errors in the job log:
```
22/10/21 07:05:24 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
22/10/21 07:05:24 ERROR DatahubSparkListener: java.lang.NullPointerException
	at datahub.spark.DatasetExtractor.lambda$static$6(DatasetExtractor.java:143)
	at datahub.spark.DatasetExtractor.asDataset(DatasetExtractor.java:228)
	at datahub.spark.DatahubSparkListener$SqlStartTask.run(DatahubSparkListener.java:114)
	at datahub.spark.DatahubSparkListener.processExecution(DatahubSparkListener.java:350)
	at datahub.spark.DatahubSparkListener.onOtherEvent(DatahubSparkListener.java:262)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1381)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
```
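For context, the DataHub Spark lineage listener is normally enabled through Spark configuration at submit time. The sketch below is not the job from this report; the GMS server URL, script name, and host are placeholders, and only the package coordinate and config keys come from the standard DataHub setup:

```shell
# Hypothetical spark-submit invocation enabling the DataHub lineage listener.
# <gms-host> and your_hudi_job.py are placeholders for this environment.
spark-submit \
  --packages io.acryl:datahub-spark-lineage:0.8.45 \
  --conf "spark.extraListeners=datahub.spark.DatahubSparkListener" \
  --conf "spark.datahub.rest.server=http://<gms-host>:8080" \
  your_hudi_job.py
```

The NPE above is thrown inside `DatasetExtractor` while the listener tries to map the query plan's output node to a dataset, so the job itself still succeeds; only the lineage emission fails.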
This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io
Hi @CaesarWangX, this seems like a troubleshooting issue rather than a bug. We're happy to provide community support on our Slack channel, but we currently reserve GitHub issues for bugs.
If you're still having trouble, please join us at [slack.datahubproject.io](https://slack.datahubproject.io) and we can troubleshoot there. For now, I'm going to close this issue.