hudi [HUDI-4237] should not sync partition parameters when create non-partition table in spark

dujl opened this pull request (#6525) • 13 comments

Issue description

Creating a non-partitioned Hudi table in Spark stores spark.sql.sources.schema.partCol.0 with an empty value in the Hive Metastore. This is unexpected: for a non-partitioned table, spark.sql.sources.schema.partCol.0 should not be stored in the Hive Metastore at all.

Steps to reproduce the behavior:

Create a non-partitioned Hudi table in Spark:

create table hudi_mor_tbl (
  id int,
  name string,
  price double,
  ts bigint
) using hudi
tblproperties (
  type = 'mor',
  primaryKey = 'id',
  preCombineField = 'ts'
);

Insert one row of data:

insert into hudi_mor_tbl select 1, 'a1', 20, 1000;

Inspect hoodie.properties in the table's base path (for example with cat); it contains the hoodie.table.partition.fields key with an empty value:

hoodie.table.partition.fields=

Check the spark.sql.sources.schema.partCol.0 entry stored in the TABLE_PARAMS table of the Hive Metastore:

|50|spark.sql.sources.schema.partCol.0|

Its value is an empty string ("").
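One plausible way an empty property turns into a partCol.0 entry is a naive comma split: splitting an empty string yields a one-element list containing "" (Python's str.split and Java's String.split both behave this way), so one "partition column" with an empty name gets emitted. The sketch below models that path in Python for illustration only; the helper names are hypothetical and it is not Hudi's actual sync code.

```python
# Illustrative model only: hypothetical helpers, not Hudi's implementation.
PART_COL_PREFIX = "spark.sql.sources.schema.partCol."

def parse_partition_fields(raw: str) -> list:
    # Naive parsing: split the stored property value on commas.
    # For an empty string this yields [""], i.e. one "column" with an empty name.
    return raw.split(",")

def partition_table_params(fields: list) -> dict:
    # Emit one spark.sql.sources.schema.partCol.N entry per partition field.
    return {f"{PART_COL_PREFIX}{i}": name for i, name in enumerate(fields)}

# hoodie.table.partition.fields= (empty value) for the non-partitioned table
fields = parse_partition_fields("")
params = partition_table_params(fields)
print(fields)  # ['']
print(params)  # {'spark.sql.sources.schema.partCol.0': ''}
```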

Change Logs

When initializing a non-partitioned Hudi table, set the partition fields to null instead of an empty string (""). As a result, syncing the table metadata to the Hive Metastore no longer stores spark.sql.sources.schema.partCol entries.
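The change can be modeled with a minimal sketch, again in Python for illustration: normalize a missing or blank partition-field value to None, and emit partCol.N entries only when real partition columns exist. The helper names are hypothetical; the PR applies the change when the Hudi table is initialized, and this sketch does not reproduce that code.

```python
from typing import Dict, List, Optional

# Illustrative model only: hypothetical helpers, not Hudi's implementation.
PART_COL_PREFIX = "spark.sql.sources.schema.partCol."

def normalize_partition_fields(raw: Optional[str]) -> Optional[List[str]]:
    # Treat a missing or blank value as "no partition fields" (None) rather than [""].
    if raw is None or raw.strip() == "":
        return None
    return [name.strip() for name in raw.split(",") if name.strip()]

def partition_table_params(fields: Optional[List[str]]) -> Dict[str, str]:
    # A non-partitioned table (None or empty list) gets no partCol entries at all.
    if not fields:
        return {}
    return {f"{PART_COL_PREFIX}{i}": name for i, name in enumerate(fields)}

print(partition_table_params(normalize_partition_fields("")))  # {}
print(partition_table_params(normalize_partition_fields("dt")))  # {'spark.sql.sources.schema.partCol.0': 'dt'}
```

Normalizing once, at initialization time, means later consumers such as the meta sync see a single "no partitions" state instead of each having to special-case [""].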

Impact

Fixes the bug that occurs when creating a non-partitioned table in Spark. For more detail, see the Jira ticket: https://issues.apache.org/jira/browse/HUDI-4237

Risk level (none | low | medium | high): low

Contributor's checklist

[x] Read through contributor's guide
[x] Change Logs and Impact were stated clearly
[ ] Adequate tests were added if applicable
[ ] CI passed

Aug 29 '22 08:08 dujl

@XuQianJin-Stars @alexeykudinkin could you check if this is needed? Functionality-wise, is the fix necessary?

Sep 05 '22 23:09 yihua

@dujl can you please update the PR description w/ the crux of the issue?

The one in Jira is very detailed (thanks for providing it!), but it's important to make sure PRs also have detailed descriptions as well.

Sep 06 '22 19:09 alexeykudinkin

done

Sep 07 '22 06:09 dujl

@alexeykudinkin please help to review and approve

Sep 07 '22 06:09 dujl

Approved already.

@nsivabalan can you please help landing this one?

Sep 08 '22 05:09 alexeykudinkin

Hi @XuQianJin-Stars, can you land this bugfix?

Sep 16 '22 09:09 minihippo

Hi @dujl, the CI failed.

Sep 17 '22 02:09 XuQianJin-Stars

OK, I will fix it.

Sep 17 '22 03:09 dujl

@dujl It's likely due to CI flakiness. Could you rebase this PR on the latest master?

Sep 17 '22 03:09 yihua

@hudi-bot run azure

Sep 17 '22 06:09 yihua

CI report:

75068620642f5b97754f43bc312944e376f2f399 Azure: FAILURE

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

Sep 17 '22 09:09 hudi-bot

@dujl The failed tests in GH action are reproducible. Could you look into those?

Sep 17 '22 17:09 yihua

OK, I will check it.

Sep 19 '22 02:09 dujl

Closing in favor of #6821.

Sep 29 '22 05:09 xushiyan
