
[HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table

Open xiarixiaoyao opened this issue 3 years ago • 1 comments

Change Logs

  1. Presto/Hive now respect the payload class when merging the Parquet base file with log files while reading a MOR table.
  2. Presto/Hive now support reading timestamp-typed columns from MOR tables.

Impact

Risk level: high

Fixes the TODO at line 115 of RealtimeCompactedRecordReader:

// TODO(NA): Invoke preCombine here by converting arrayWritable to Avro. This is required since the
// deltaRecord may not be a full record and needs values of columns from the parquet

Steps to reproduce

spark.sql(
  """create table tx_null
    |(id int, comb int, col0 int, col1 bigint, col2 float, col3 double, col4 decimal(10,4),
    | col5 string, col6 date, col7 timestamp, col8 boolean, col9 binary, par date)
    | using hudi
    | partitioned by (par)
    | options(
    | type='mor', primaryKey='id', preCombineField='comb',
    | 'hoodie.index.type' = 'BLOOM', 'hoodie.compaction.payload.class'='org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload')""".stripMargin)


spark.sql(
  s"""
     | insert into tx_null values
     | (1,1,99,1111111,101.01,1001.0001,100001.0001,'x000001','2021-12-25','2021-12-25 12:01:01',true,'a01','2021-12-25'),
     | (2,2,99,1111111,102.02,1002.0002,100002.0002,'x000002','2021-12-25','2021-12-25 12:02:02',true,'a02','2021-12-25'),
     | (3,3,99,1111111,103.03,1003.0003,100003.0003,'x000003','2021-12-25','2021-12-25 12:03:03',false,'a03','2021-12-25'),
     | (4,4,99,1111111,104.04,1004.0004,100004.0004,'x000004','2021-12-26','2021-12-26 12:04:04',true,'a04','2021-12-26'),
     | (5,5,99,1111111,105.05,1005.0005,100005.0005,'x000005','2021-12-26','2021-12-26 12:05:05',false,'a05','2021-12-26')
     |""".stripMargin)


spark.sql(
  s"""
     | insert into tx_null values
     | (1,0,null,100002,101.01,1001.0001,100001.0001,'x000001','2021-12-25','2021-12-25 12:01:01',true,'a01','2021-12-25'),
     | (2,1,null,100003,102.02,1002.0002,100002.0002,'x000002','2021-12-25','2021-12-25 12:02:02',true,'a02','2021-12-25'),
     | (3,2,null,100004,103.03,1003.0003,100003.0003,'x000003','2021-12-25','2021-12-25 12:03:03',false,'a03','2021-12-25'),
     | (4,3,null,100005,104.04,1004.0004,100004.0004,'x000004','2021-12-26','2021-12-26 12:04:04',true,'a04','2021-12-26'),
     | (5,4,null,100006,105.05,1005.0005,100005.0005,'x000005','2021-12-26','2021-12-26 12:05:05',false,'a05','2021-12-26')
     |""".stripMargin)

Result of select col0, col1 from tx_null when using spark-sql/Flink:

99  100002
99  100003
99  100001
99  100005
99  100004

When using Presto/Hive, the query result is:

+-------+-------+
| 99    | NULL  |
| 99    | NULL  |
| 99    | NULL  |
| 99    | NULL  |
| 99    | NULL  |
+-------+-------+

Other payload classes are not supported either.
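
For readers unfamiliar with the payload configured in the repro, the following is a minimal sketch of the merge behaviour the Spark/Flink result is consistent with: the base-file value is kept wherever the newer log record carries a default value. It is not Hudi's implementation. The class name PayloadMergeSketch and the method mergeNonDefaults are hypothetical, records are modelled as plain maps rather than Avro records, and every column's default is assumed to be null.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for the merge semantics of
// OverwriteNonDefaultsWithLatestAvroPayload. Assumptions: every column's
// default value is null, and records are plain maps instead of Avro records.
class PayloadMergeSketch {

    // Starts from the base-file record and overwrites a column only when the
    // log-file record carries a non-default (here: non-null) value for it.
    static Map<String, Object> mergeNonDefaults(Map<String, Object> baseRecord,
                                                Map<String, Object> logRecord) {
        Map<String, Object> merged = new LinkedHashMap<>(baseRecord);
        for (Map.Entry<String, Object> e : logRecord.entrySet()) {
            if (e.getValue() != null) {
                merged.put(e.getKey(), e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Row id=1 from the first insert (stored in the Parquet base file).
        Map<String, Object> base = new LinkedHashMap<>();
        base.put("id", 1);
        base.put("comb", 1);
        base.put("col0", 99);
        base.put("col1", 1111111L);

        // Row id=1 from the second insert (stored in the log file).
        Map<String, Object> log = new LinkedHashMap<>();
        log.put("id", 1);
        log.put("comb", 0);
        log.put("col0", null);
        log.put("col1", 100002L);

        Map<String, Object> merged = mergeNonDefaults(base, log);
        System.out.println(merged.get("col0") + " " + merged.get("col1")); // prints "99 100002"
    }
}
```

In this sketch col0 keeps 99 from the base record because the log record holds null, while col1 takes the newer value 100002, matching the first row of the spark-sql/Flink output above.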

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

xiarixiaoyao, Sep 22 '22 08:09

Cancelling all Azure CI runs for now to investigate CI flakiness. Will retrigger the build once we are in a stable state. Sorry about the inconvenience.

nsivabalan, Sep 23 '22 16:09

@codope @danny0405 @xushiyan @XuQianJin-Stars could you please help review this PR? Thanks. The failing UT has nothing to do with this PR.

xiarixiaoyao, Sep 27 '22 07:09

@xiarixiaoyao There are some CI failures. Can you please fix them and rebase?

codope, Sep 27 '22 15:09

Will fix the CI, thanks.

xiarixiaoyao, Sep 28 '22 07:09

@hudi-bot run azure

xiarixiaoyao, Sep 29 '22 01:09

Canceling the CI run to prioritize release blocker PRs. Apologies. I will re-trigger once the blockers have finished.

codope, Sep 29 '22 09:09

@hudi-bot run azure

xiarixiaoyao, Sep 30 '22 01:09

@hudi-bot run azure

xiarixiaoyao, Oct 09 '22 01:10

@codope could you please review again? All comments have been addressed, thanks.

xiarixiaoyao, Oct 09 '22 07:10

@hudi-bot run azure

xiarixiaoyao, Oct 19 '22 06:10

@hudi-bot run azure

xiarixiaoyao, Nov 03 '22 11:11

CI report:

  • bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN
  • 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN
  • b4d5c37b1c3121646fa1dcf2a3363228dc045933 Azure: FAILURE Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

  • @hudi-bot run azure: re-run the last Azure build

hudi-bot, Nov 03 '22 13:11