
[HUDI-4237] should not sync partition parameters when create non-partition table in spark

Open dujl opened this issue 1 year ago • 13 comments

issue description

Creating a non-partitioned Hudi table in Spark stores spark.sql.sources.schema.partCol.0 with an empty value in the Hive Metastore. This is unexpected: for a non-partitioned table, spark.sql.sources.schema.partCol.0 should not be stored in the Hive Metastore at all.

Steps to reproduce the behavior:

  1. Create a non-partitioned Hudi table in Spark:
create table hudi_mor_tbl (
id int,
name string,
price double,
ts bigint
) using hudi
tblproperties (
type = 'mor',
primaryKey = 'id',
preCombineField = 'ts'
) 
  2. Insert one row into it:
insert into hudi_mor_tbl select 1, 'a1', 20, 1000;
  3. cat hoodie.properties in the table's base path; it includes the partition.fields key with an empty value:
hoodie.table.partition.fields=
  4. Check the spark.sql.sources.schema.partCol.0 entry stored in the TABLE_PARAMS table of the Hive Metastore:
|50|spark.sql.sources.schema.partCol.0|

Its value is the empty string "".
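
The check in step 3 can also be scripted. The following is a minimal sketch, not part of the PR: the helper name keys_with_empty_values and the sample text are made up for illustration, and it assumes hoodie.properties is plain key=value text with # comment lines.

```python
from typing import List


def keys_with_empty_values(properties_text: str) -> List[str]:
    """Hypothetical helper: return keys whose value is empty in key=value text."""
    empty = []
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comment lines, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value.strip() == "":
            empty.append(key.strip())
    return empty


# Illustrative content only; a real hoodie.properties has more keys.
sample = "hoodie.table.name=hudi_mor_tbl\nhoodie.table.partition.fields=\n"
print(keys_with_empty_values(sample))  # ['hoodie.table.partition.fields']
```

For the table created above, the key reported in step 3 is the one such a check would flag.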

Change Logs

When initializing a non-partitioned Hudi table, PartitionFields should be set to null instead of the empty string "". Then, after the table metadata is synced to the Hive Metastore, no spark.sql.sources.schema.partCol entry is stored.
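
To illustrate why null and "" can behave differently, here is a small sketch in Python. It is not Hudi's code: the helper partition_params is hypothetical, and it assumes (this thread does not confirm it) that the sync step derives the column list by splitting the configured value on commas.

```python
from typing import Dict, Optional

# Key prefix taken from the issue description.
PART_COL_PREFIX = "spark.sql.sources.schema.partCol."


def partition_params(partition_fields: Optional[str]) -> Dict[str, str]:
    """Hypothetical helper: build one table parameter per partition column."""
    if partition_fields is None:
        # No partition fields configured: emit no partition parameters.
        return {}
    # Splitting an empty string on "," yields [""], i.e. one empty column name.
    columns = partition_fields.split(",")
    return {f"{PART_COL_PREFIX}{i}": col for i, col in enumerate(columns)}


print(partition_params(""))    # {'spark.sql.sources.schema.partCol.0': ''}
print(partition_params(None))  # {}
```

Under that assumption, an empty string produces exactly the partCol.0 entry with an empty value seen in TABLE_PARAMS, while a null value produces no entry.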

Impact

Fixes the bug that occurs when creating a non-partitioned table in Spark. For more detail, see the Jira issue: https://issues.apache.org/jira/browse/HUDI-4237

Risk level: none | low | medium | high

low

Contributor's checklist

  • [x] Read through contributor's guide
  • [x] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

dujl avatar Aug 29 '22 08:08 dujl

@XuQianJin-Stars @alexeykudinkin could you check if this is needed? Functionality-wise, is the fix necessary?

yihua avatar Sep 05 '22 23:09 yihua

@dujl can you please update the PR description w/ the crux of the issue?

The one in Jira is very detailed (thanks for providing it!), but it's important to make sure PRs also have detailed descriptions as well.

alexeykudinkin avatar Sep 06 '22 19:09 alexeykudinkin

done

dujl avatar Sep 07 '22 06:09 dujl

@alexeykudinkin please help to review and approve

dujl avatar Sep 07 '22 06:09 dujl

Approved already.

@nsivabalan can you please help landing this one?

alexeykudinkin avatar Sep 08 '22 05:09 alexeykudinkin

Hi @XuQianJin-Stars, can you land this bugfix?

minihippo avatar Sep 16 '22 09:09 minihippo

Hi @dujl, the CI failed.

XuQianJin-Stars avatar Sep 17 '22 02:09 XuQianJin-Stars

OK, I will fix it.



dujl avatar Sep 17 '22 03:09 dujl

@dujl It's likely due to CI flakiness. Could you rebase this PR on the latest master?

yihua avatar Sep 17 '22 03:09 yihua

@hudi-bot run azure

yihua avatar Sep 17 '22 06:09 yihua

CI report:

  • 75068620642f5b97754f43bc312944e376f2f399 Azure: FAILURE

Bot commands
@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

hudi-bot avatar Sep 17 '22 09:09 hudi-bot

@dujl The failed tests in GH action are reproducible. Could you look into those?

yihua avatar Sep 17 '22 17:09 yihua

OK, I will check it.

dujl avatar Sep 19 '22 02:09 dujl

Closing in favor of #6821.

xushiyan avatar Sep 29 '22 05:09 xushiyan