seatunnel icon indicating copy to clipboard operation
seatunnel copied to clipboard

When the hive table storage type is orc, data sinks to the hive, and the task fails to be executed

Open gaotong521 opened this issue 1 year ago • 3 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

What happened

When the hive table storage type is orc, data sinks to the hive and the FieldMapper transform is configured. If certain fields in the hive table are not mapped, tasks fail to be executed

SeaTunnel Version

2.3.4

SeaTunnel Config

{
    "env": {
        "parallelism": 3,
        "job.mode": "BATCH",
        "checkpoint.interval": 30000,
        "job.name": "seatunnel_1712823979630"
    },
    "source": [
        {
            "plugin_name": "Jdbc",
            "result_table_name": "table_source",
            "user": "postgres",
            "password": "C3kk4v5_b4f2Jr",
            "driver": "org.postgresql.Driver",
            "url": "jdbc:postgresql://10.188.15.91:5434/gis",
            "query": "select event_id,event_type,event_radius,event_source,start_time,end_time,priority,latitude,longitude,elevation,node_ids,create_time,update_time from ghcloud.gh_traffic_event_info"
        }
    ],
    "transform": [
        {
            "plugin_name": "FieldMapper",
            "source_table_name": "table_source",
            "result_table_name": "table_source_FieldMapper",
            "field_mapper": {
                "event_id": "event_id",
                "event_type": "event_type",
                "event_radius": "event_radius",
                "event_source": "event_source",
                "start_time": "start_time",
                "end_time": "end_time",
                "priority": "priority",
                "latitude": "latitude",
                "longitude": "longitude",
                "elevation": "elevation",
                "node_ids": "node_ids",
                "create_time": "create_time",
                "update_time": "update_time"
            }
        }
    ],
    "sink": [
        {
            "plugin_name": "Hive",
            "source_table_name": "table_source_FieldMapper",
            "table_name": "gh_cloud_data_model.dwd_pub_traffic_event",
            "metastore_uri": "thrift://cloudera-hadoop-61:9083"
        }
    ]
}

Running Command

Executed by dolphin scheduler

Error Exception

SHUTDOWN
	2024-04-12 11:31:30,246 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - Closed SeaTunnel client......
	2024-04-12 11:31:30,246 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - Closed metrics executor service ......
	2024-04-12 11:31:30,246 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - 
	
	===============================================================================
	
	
	2024-04-12 11:31:30,246 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - Fatal Error, 
	
	2024-04-12 11:31:30,246 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - Please submit bug report in https://github.com/apache/seatunnel/issues
	
	2024-04-12 11:31:30,246 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - Reason:SeaTunnel job executed failed 
	
	2024-04-12 11:31:30,248 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - Exception StackTrace:org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
		at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:202)
		at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
		at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
	Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: java.lang.RuntimeException: java.lang.NullPointerException
		at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:257)
		at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:66)
		at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:39)
		at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:27)
		at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.handleRecord(IntermediateBlockingQueue.java:75)
		at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.collect(IntermediateBlockingQueue.java:50)
		at org.apache.seatunnel.engine.server.task.flow.IntermediateQueueFlowLifeCycle.collect(IntermediateQueueFlowLifeCycle.java:51)
		at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.collect(TransformSeaTunnelTask.java:73)
		at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168)
		at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.call(TransformSeaTunnelTask.java:78)
		at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:648)
		at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:949)
		at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
		at java.util.concurrent.FutureTask.run(FutureTask.java:266)
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
		at java.lang.Thread.run(Thread.java:748)
	Caused by: java.lang.NullPointerException
		at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.buildSchemaWithRowType(OrcWriteStrategy.java:196)
		at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.getOrCreateWriter(OrcWriteStrategy.java:116)
		at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.write(OrcWriteStrategy.java:75)
		at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:134)
		at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:46)
		at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:247)
		... 16 more
	
		at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:194)
		... 2 more
	 
	2024-04-12 11:31:30,248 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - 
	===============================================================================
	
	
	
	Exception in thread "main" org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
		at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:202)
		at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
		at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
	Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: java.lang.RuntimeException: java.lang.NullPointerException
		at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:257)
		at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:66)
		at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:39)
		at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:27)
		at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.handleRecord(IntermediateBlockingQueue.java:75)
		at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.collect(IntermediateBlockingQueue.java:50)
		at org.apache.seatunnel.engine.server.task.flow.IntermediateQueueFlowLifeCycle.collect(IntermediateQueueFlowLifeCycle.java:51)
		at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.collect(TransformSeaTunnelTask.java:73)
		at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168)
		at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.call(TransformSeaTunnelTask.java:78)
		at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:648)
		at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:949)
		at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
		at java.util.concurrent.FutureTask.run(FutureTask.java:266)
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
		at java.lang.Thread.run(Thread.java:748)
	Caused by: java.lang.NullPointerException
		at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.buildSchemaWithRowType(OrcWriteStrategy.java:196)
		at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.getOrCreateWriter(OrcWriteStrategy.java:116)
		at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.write(OrcWriteStrategy.java:75)
		at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:134)
		at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:46)
		at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:247)
		... 16 more
	
		at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:194)
		... 2 more
	2024-04-12 11:31:30,249 INFO  [s.c.s.s.c.ClientExecuteCommand] [ForkJoinPool.commonPool-worker-2] - run shutdown hook because get close signal
[INFO] 2024-04-12 11:31:30.453 +0800 - FINALIZE_SESSION

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

gaotong521 avatar Apr 12 '24 03:04 gaotong521

Please paste in the ddl statement of the [gh_cloud_data_model.dwd_pub_traffic_event table]. It is suspected that the name of the mapped field is inconsistent with that of the destination table, which causes the null pointer problem

LeonYoah avatar Apr 17 '24 05:04 LeonYoah

You should pay attention to two things: one is that all fields in the [hive] table should have corresponding fields from upstream. If there are no extra fields upstream, you can pass the empty string, that is, [''], as an empty field, but you cannot specify [null] as an empty field, and the field mapping name should be the same as the field name in the table.

LeonYoah avatar Apr 17 '24 09:04 LeonYoah

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

github-actions[bot] avatar May 18 '24 00:05 github-actions[bot]