[Bug] [hive-connector] ERROR commit.FileSinkAggregatedCommitter: commit aggregatedCommitInfo error java.lang.NullPointerException

Open gitfortian opened this issue 2 years ago • 3 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

What happened

In local mode, Spark writes data into Hive. After changing to YARN cluster mode, where Spark reads from FakeSource and writes to Hive, it shows java.lang.NullPointerException.

SeaTunnel Version

2.3.0-beta

SeaTunnel Config

env {
  # You can set flink configuration here
#  job.mode = "STREAMING"
  execution.parallelism = 1
  job.name="test_hive_source_to_hive"
}

source {
  FakeSource {
    row.num = 1000
    schema = {
      fields {
        c_string = string
        c_boolean = boolean
        c_int = int
        c_bigint = bigint
      }
    }
  }
}

transform {
}

sink {
  # choose stdout output plugin to output data to console

  Hive {
    table_name = "test.seatunnel_orc"
    metastore_uri = "thrift://1.1.1.1:9083"
    partition_by = ["c_int"]
    sink_columns = ["c_string", "c_boolean", "c_bigint","c_int"]
  }
}

Running Command

bin/start-seatunnel-spark-connector-v2.sh --master yarn --deploy-mode client --config config/fake_hive.conf

Error Exception

INFO hive.metastore: Connected to metastore.
22/11/04 15:48:32 ERROR commit.FileSinkAggregatedCommitter: commit aggregatedCommitInfo error
java.lang.NullPointerException
        at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:234)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:225)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:460)
        at org.apache.seatunnel.connectors.seatunnel.file.sink.util.FileSystemUtils.getFileSystem(FileSystemUtils.java:42)
        at org.apache.seatunnel.connectors.seatunnel.file.sink.util.FileSystemUtils.renameFile(FileSystemUtils.java:81)
        at org.apache.seatunnel.connectors.seatunnel.file.sink.commit.FileSinkAggregatedCommitter.lambda$commit$0(FileSinkAggregatedCommitter.java:42)
        at java.util.Collections$SingletonList.forEach(Collections.java:4822)
        at org.apache.seatunnel.connectors.seatunnel.file.sink.commit.FileSinkAggregatedCommitter.commit(FileSinkAggregatedCommitter.java:37)
        at org.apache.seatunnel.connectors.seatunnel.hive.commit.HiveSinkAggregatedCommitter.commit(HiveSinkAggregatedCommitter.java:49)
        at org.apache.seatunnel.translation.spark.sink.SparkDataSourceWriter.commit(SparkDataSourceWriter.java:60)
        at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:76)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:136)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:160)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:157)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
        at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:280)
        at org.apache.seatunnel.core.starter.spark.execution.SinkExecuteProcessor.execute(SinkExecuteProcessor.java:84)
        at org.apache.seatunnel.core.starter.spark.execution.SparkExecution.execute(SparkExecution.java:56)
        at org.apache.seatunnel.core.starter.spark.command.SparkApiTaskExecuteCommand.execute(SparkApiTaskExecuteCommand.java:52)
        at org.apache.seatunnel.core.starter.Seatunnel.run(Seatunnel.java:39)
        at org.apache.seatunnel.core.starter.spark.SeatunnelSpark.main(SeatunnelSpark.java:34)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:855)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:930)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:939)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
22/11/04 15:48:32 INFO v2.WriteToDataSourceV2Exec: Data source writer org.apache.seatunnel.translation.spark.sink.SparkDataSourceWriter@fbbd90c committed.
22/11/04 15:48:32 INFO execution.SparkExecution: Spark Execution started
22/11/04 15:48:32 INFO spark.SparkContext: Invoking stop() from shutdown hook
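
Reading the top frames of the trace: the exception is thrown inside FileSystem.getDefaultUri, reached from FileSystemUtils.getFileSystem during the commit step. One plausible reading, which is an inference and not something confirmed in this thread, is that a null Hadoop Configuration was passed down to FileSystem.get in the JVM that runs the commit. The sketch below reproduces that failure pattern using only the JDK. The `Conf` class and both helper methods are hypothetical stand-ins, not SeaTunnel or Hadoop code; only the `fs.defaultFS` key and its `file:///` default are taken from Hadoop.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class NullConfSketch {

    // Hypothetical stand-in for a Hadoop-style configuration object.
    static final class Conf {
        private final Map<String, String> values = new HashMap<>();

        Conf set(String key, String value) {
            values.put(key, value);
            return this;
        }

        String get(String key, String defaultValue) {
            return values.getOrDefault(key, defaultValue);
        }
    }

    // Assumption: a utility keeps its configuration in a static field that is
    // only assigned on some code paths, so it can still be null at commit time.
    static Conf sharedConf;

    // Mirrors the shape of the failing call: dereferences conf without a check,
    // so a null conf surfaces as a bare NullPointerException.
    static String defaultUri(Conf conf) {
        return conf.get("fs.defaultFS", "file:///");
    }

    // Defensive variant: fails fast with a message that names the real problem.
    static String defaultUriChecked(Conf conf) {
        Objects.requireNonNull(conf, "configuration was never initialized in this JVM");
        return defaultUri(conf);
    }

    public static void main(String[] args) {
        try {
            defaultUri(sharedConf); // sharedConf is still null here
        } catch (NullPointerException e) {
            System.out.println("unchecked call failed with NullPointerException");
        }

        sharedConf = new Conf().set("fs.defaultFS", "hdfs://namenode:8020");
        System.out.println(defaultUriChecked(sharedConf));
    }
}
```

If this reading is right, it would also be consistent with the job working in local mode, where everything runs in a single JVM, but that too is an assumption rather than a confirmed diagnosis.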

Flink or Spark Version

spark 2.4.8

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

gitfortian, Nov 04 '22 07:11

I hit the same error.

gitfortian, Nov 04 '22 08:11

@TyrantLucifer Hi, PTAL. Thanks!

Hisoka-X, Nov 09 '22 07:11

Could you please share more details about your task, such as the CREATE TABLE SQL for the Hive table and some example source data? Also, could you please try again with the newest SeaTunnel version compiled from the dev branch? I fixed some bugs in PR #3258.

TyrantLucifer, Nov 10 '22 09:11

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

github-actions[bot], Dec 11 '22 00:12

Fixed by #3258

TyrantLucifer, Dec 12 '22 04:12