hudi
[HUDI-4237] should not sync partition parameters when create non-partition table in spark
issue description
When a non-partitioned Hudi table is created in Spark, spark.sql.sources.schema.partCol.0 is stored in the Hive Metastore with an empty value. This is unexpected: for a non-partitioned table, spark.sql.sources.schema.partCol.0 should not be stored in the Hive Metastore at all.
Steps to reproduce the behavior:
- Create a non-partition hudi table in Spark
create table hudi_mor_tbl (
id int,
name string,
price double,
ts bigint
) using hudi
tblproperties (
type = 'mor',
primaryKey = 'id',
preCombineField = 'ts'
)
- Insert one row into it.
insert into hudi_mor_tbl select 1, 'a1', 20, 1000;
- cat hoodie.properties in the table's base path. It includes the partition.fields key with an empty value:
hoodie.table.partition.fields=
- Check the spark.sql.sources.schema.partCol.0 entry stored in the TABLE_PARAMS table of the Hive Metastore.
|50|spark.sql.sources.schema.partCol.0|
Its value is the empty string "".
Change Logs
When initializing a non-partitioned Hudi table, PartitionFields should be set to null instead of the empty string "". Then, after the table metadata is synced to the Hive Metastore, spark.sql.sources.schema.partCol is no longer stored.
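The idea behind this change can be sketched as follows. This is only an illustration in Python, not code from Hudi: the helper names `parse_partition_fields` and `spark_partition_params` are hypothetical, and the assumption that the empty entry comes from splitting an empty string on commas is a plausible mechanism rather than something confirmed from the source. The two `spark.sql.sources.schema.*` keys are the ones Spark uses for data source tables.

```python
from typing import Dict, List, Optional

# Hypothetical helpers for illustration only; these are not Hudi's actual API.


def parse_partition_fields(raw: Optional[str]) -> List[str]:
    """Return the partition columns, treating None and "" as 'no partitions'."""
    if raw is None or raw.strip() == "":
        return []
    return [field.strip() for field in raw.split(",")]


def spark_partition_params(partition_fields: List[str]) -> Dict[str, str]:
    """Build partition-related table parameters.

    Nothing is emitted when the list of partition columns is empty.
    """
    params: Dict[str, str] = {}
    if not partition_fields:
        return params
    params["spark.sql.sources.schema.numPartCols"] = str(len(partition_fields))
    for index, name in enumerate(partition_fields):
        params[f"spark.sql.sources.schema.partCol.{index}"] = name
    return params


# Splitting the empty string yields a list with one empty column name,
# which is one way an entry like partCol.0 = "" can arise (assumed mechanism).
naive_fields = "".split(",")
naive_params = spark_partition_params(naive_fields)

# Normalizing the value first produces no partition parameters at all.
fixed_params = spark_partition_params(parse_partition_fields(""))
```

With an empty `hoodie.table.partition.fields`, the naive path produces a `partCol.0` key with an empty value, while the normalized path produces an empty parameter map, which matches the behavior this PR aims for.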
Impact
Fixes the bug that occurs when creating a non-partitioned table in Spark. For more detail, see the Jira issue: https://issues.apache.org/jira/browse/HUDI-4237
Risk level: low
Contributor's checklist
- [x] Read through contributor's guide
- [x] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
@XuQianJin-Stars @alexeykudinkin could you check if this is needed? Functionality-wise, is the fix necessary?
@dujl can you please update the PR description w/ the crux of the issue?
The one in Jira is very detailed (thanks for providing it!), but it's important to make sure PRs also have detailed descriptions as well.
done
@alexeykudinkin please help to review and approve
Approved already.
@nsivabalan can you please help landing this one?
Hi @XuQianJin-Stars , Can you land this bugfix?
Hi @dujl, the CI has failed.
OK, I will fix it.
@dujl It's likely due to CI flakiness. Could you rebase this PR on the latest master?
@hudi-bot run azure
CI report:
- 75068620642f5b97754f43bc312944e376f2f399 Azure: FAILURE
Bot commands
@hudi-bot supports the following commands:-
@hudi-bot run azure
re-run the last Azure build
@dujl The failed tests in GH action are reproducible. Could you look into those?
OK, I will check it.
Closing in favor of #6821.