
[SUPPORT] Spark SQL 'merge into' on a COW non-partitioned table fails after setting the tblproperties param "hoodie.datasource.write.payload.class = 'org.apache.hudi.common.model.DefaultHoodieRecordPayload'"

Open · fujianhua168 opened this issue 3 years ago · 1 comment

Describe the problem you faced

I created a COW non-partitioned table with the tblproperties "hoodie.datasource.write.payload.class = 'org.apache.hudi.common.model.DefaultHoodieRecordPayload'" and "hoodie.datasource.write.hive_style_partitioning = false". Executing "insert" and "update set" SQL statements works normally, with no error. But executing a "merge into" statement raises the error shown under "Stacktrace" below.

To Reproduce

Steps to reproduce the behavior:

--step 1: create table 
drop table hudi_cow_pk_cbfield_tbl;
create table hudi_cow_pk_cbfield_tbl (
  id bigint,
  name string,
  ts bigint
) using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts',
  hoodie.datasource.write.hive_style_partitioning = false,
  hoodie.datasource.write.operation = 'upsert',
  hoodie.datasource.write.payload.class = 'org.apache.hudi.common.model.DefaultHoodieRecordPayload'
 )
;
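As an optional sanity check, you can confirm the properties registered before writing (the exact listing may vary by Spark/Hudi version):

show tblproperties hudi_cow_pk_cbfield_tbl;
-- should include: hoodie.datasource.write.payload.class = org.apache.hudi.common.model.DefaultHoodieRecordPayload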

--step 2: insert a record with primaryKey=1, preCombineField=1000
insert into hudi_cow_pk_cbfield_tbl select 1, 'a0', 1000;
--step 3: 'insert' with the same primaryKey but a smaller preCombineField value (100); as expected, the write takes no effect (note: this is the correct behavior when hoodie.datasource.write.payload.class = 'org.apache.hudi.common.model.DefaultHoodieRecordPayload' is set)
insert into hudi_cow_pk_cbfield_tbl select 1, 'a0_new', 100;
select * from hudi_cow_pk_cbfield_tbl;

--step 4: 'update' with the same primaryKey but a smaller preCombineField value (20); as expected, the write takes no effect (note: again the correct behavior with DefaultHoodieRecordPayload)
update hudi_cow_pk_cbfield_tbl set name='a1_new', ts=20 where id=1;
select * from hudi_cow_pk_cbfield_tbl;
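Both selects above should return the original row unchanged, which matches the "old value" later captured in the stacktrace:

-- expected output of the select:
-- 1    a0    1000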

--step 5: 'merge into' with the same primaryKey fails with: org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
merge into hudi_cow_pk_cbfield_tbl as target
using (select 1 as id,'a1_merge' as name,2000 as ts) as source
on target.id = source.id
when matched then update set *
when not matched then insert *
;
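For reference, the untyped literals in the merge source default to Spark SQL's int type, while the target columns id and ts are bigint. A quick way to see the mismatch (assuming Spark 3's typeof function):

select typeof(1), typeof(2000), typeof(cast(2000 as bigint));
-- int, int, bigint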

Expected behavior

This is a COW non-partitioned table, so why does a partition-related error appear ("HoodieUpsertException: Error upserting bucketType UPDATE for partition :0")? And why does 'merge into' fail while 'insert' and 'update set' work as expected?

Environment Description

  • Hudi version: 0.11.1
  • Spark version: 3.2.1
  • Hive version: 3.1.0
  • Hadoop version: 3.1.1
  • Storage (HDFS/S3/GCS..): HDFS
  • Running on Docker? (yes/no): no

Stacktrace

39:23 WARN: Timeline-server-based markers are not supported for HDFS: base path hdfs://bigbigworld/user/hive/warehouse/hudi_demo.db/hudi_cow_pk_cbfield_tbl. Falling back to direct markers.

3071039 [task-result-getter-0] WARN org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 996.0 (TID 27259) (szzb-bg-uat-sdp-hadoop-05 executor 9): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0 at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:329) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:244) at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102) at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386) at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408) at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472) at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295) at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384) at org.apache.spark.rdd.RDD.iterator(RDD.scala:335) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:131) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748)

Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to combine/merge new record with old value in storage, for new record {HoodieRecord{key=HoodieKey { recordKey=id:1 partitionPath=}, currentLocation='HoodieRecordLocation {instantTime=20220810095846644, fileId=60c04f95-ca5e-4f82-9558-40da29cc022e-0}', newLocation='HoodieRecordLocation {instantTime=20220810101719437, fileId=60c04f95-ca5e-4f82-9558-40da29cc022e-0}'}}, old value {{"_hoodie_commit_time": "20220810095824514", "_hoodie_commit_seqno": "20220810095824514_0_0", "_hoodie_record_key": "id:1", "_hoodie_partition_path": "", "_hoodie_file_name": "60c04f95-ca5e-4f82-9558-40da29cc022e-0_0-937-24808_20220810095846644.parquet", "id": 1, "name": "a0", "ts": 1000}} at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:149) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:358) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:349) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322) ... 28 more

Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to combine/merge new record with old value in storage, for new record {HoodieRecord{key=HoodieKey { recordKey=id:1 partitionPath=}, currentLocation='HoodieRecordLocation {instantTime=20220810095846644, fileId=60c04f95-ca5e-4f82-9558-40da29cc022e-0}', newLocation='HoodieRecordLocation {instantTime=20220810101719437, fileId=60c04f95-ca5e-4f82-9558-40da29cc022e-0}'}}, old value {{"_hoodie_commit_time": "20220810095824514", "_hoodie_commit_seqno": "20220810095824514_0_0", "_hoodie_record_key": "id:1", "_hoodie_partition_path": "", "_hoodie_file_name": "60c04f95-ca5e-4f82-9558-40da29cc022e-0_0-937-24808_20220810095846644.parquet", "id": 1, "name": "a0", "ts": 1000}} at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:161) at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:147) ... 31 more

Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to combine/merge new record with old value in storage, for new record {HoodieRecord{key=HoodieKey { recordKey=id:1 partitionPath=}, currentLocation='HoodieRecordLocation {instantTime=20220810095846644, fileId=60c04f95-ca5e-4f82-9558-40da29cc022e-0}', newLocation='HoodieRecordLocation {instantTime=20220810101719437, fileId=60c04f95-ca5e-4f82-9558-40da29cc022e-0}'}}, old value {{"_hoodie_commit_time": "20220810095824514", "_hoodie_commit_seqno": "20220810095824514_0_0", "_hoodie_record_key": "id:1", "_hoodie_partition_path": "", "_hoodie_file_name": "60c04f95-ca5e-4f82-9558-40da29cc022e-0_0-937-24808_20220810095846644.parquet", "id": 1, "name": "a0", "ts": 1000}} at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:155) ... 32 more

Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to combine/merge new record with old value in storage, for new record {HoodieRecord{key=HoodieKey { recordKey=id:1 partitionPath=}, currentLocation='HoodieRecordLocation {instantTime=20220810095846644, fileId=60c04f95-ca5e-4f82-9558-40da29cc022e-0}', newLocation='HoodieRecordLocation {instantTime=20220810101719437, fileId=60c04f95-ca5e-4f82-9558-40da29cc022e-0}'}}, old value {{"_hoodie_commit_time": "20220810095824514", "_hoodie_commit_seqno": "20220810095824514_0_0", "_hoodie_record_key": "id:1", "_hoodie_partition_path": "", "_hoodie_file_name": "60c04f95-ca5e-4f82-9558-40da29cc022e-0_0-937-24808_20220810095846644.parquet", "id": 1, "name": "a0", "ts": 1000}} at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:351) at org.apache.hudi.table.action.commit.BaseMergeHelper$UpdateHandler.consumeOneRecord(BaseMergeHelper.java:122) at org.apache.hudi.table.action.commit.BaseMergeHelper$UpdateHandler.consumeOneRecord(BaseMergeHelper.java:112) at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37) at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:135) at java.util.concurrent.FutureTask.run(FutureTask.java:266) ... 3 more

Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long at java.lang.Long.compareTo(Long.java:54) at org.apache.hudi.common.model.DefaultHoodieRecordPayload.needUpdatingPersistedRecord(DefaultHoodieRecordPayload.java:139) at org.apache.hudi.common.model.DefaultHoodieRecordPayload.combineAndGetUpdateValue(DefaultHoodieRecordPayload.java:62) at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:332) ... 8 more

3071575 [task-result-getter-2] ERROR org.apache.spark.scheduler.TaskSetManager - Task 0 in stage 996.0 failed 4 times; aborting job

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 996.0 failed 4 times, most recent failure: Lost task 0.3 in stage 996.0 (TID 27262) (szzb-bg-uat-sdp-hadoop-05 executor 8): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0

[stack trace and cause chain identical to the task failure above, ending in java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long]

Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2454) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2403) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2402) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2402) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1160) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1160) at scala.Option.foreach(Option.scala:407) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1160) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2642) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2584) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2573) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:938) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2214) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2235) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2254) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2279) at org.apache.spark.rdd.RDD.count(RDD.scala:1253) at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:643) at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:315) at org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.executeUpsert(MergeIntoHoodieTableCommand.scala:290) at org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.run(MergeIntoHoodieTableCommand.scala:154) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457) at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:93) at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:80) at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:78) at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219) at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96) at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613) at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:67) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:287) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Caused by: org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0

[executor-side stack trace and cause chain identical to the task failure above, ending in java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long]

fujianhua168 avatar Aug 10 '22 02:08 fujianhua168

@fujianhua168 try:

merge into hudi_cow_pk_cbfield_tbl as target
using (select 1 as id, 'a1_merge' as name, cast(2000 as BIGINT) as ts) as source
on target.id = source.id
when matched then update set *
when not matched then insert *;

I guess this is a bug: merge into does not cast the source columns' types to the target's types.
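If the cast works, an equivalent form (an untested sketch) uses Spark SQL's long-literal suffix so that both id and ts arrive as BIGINT:

merge into hudi_cow_pk_cbfield_tbl as target
using (select 1L as id, 'a1_merge' as name, 2000L as ts) as source
on target.id = source.id
when matched then update set *
when not matched then insert *;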

fengjian428 avatar Aug 10 '22 08:08 fengjian428

@fujianhua168: did you try the fix suggested by @fengjian428?

nsivabalan avatar Aug 27 '22 20:08 nsivabalan

This will be addressed by https://github.com/apache/hudi/pull/6361

@xushiyan you can assign this one to me

alexeykudinkin avatar Sep 09 '22 21:09 alexeykudinkin

Created HUDI-4879

alexeykudinkin avatar Sep 19 '22 20:09 alexeykudinkin

There is a PR fixing the issue. Closing this ticket.

yihua avatar Sep 26 '22 18:09 yihua