hudi
[HUDI-4433] Hudi-CLI repair deduplicate not working with non-partitioned dataset
Change Logs
The hudi-cli `repair deduplicate` command currently cannot be run on a non-partitioned dataset. This change modifies the CLI parameter so that it can.
Impact
Describe any public API or user-facing feature change or any performance impact.
Risk level: none | low | medium | high
Choose one. If medium or high, explain what verification was done to mitigate the risks.
Contributor's checklist
- [ ] Read through contributor's guide
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
@pratyakshsharma Could you review this short code?
@yihua I added integration tests by referencing existing ones. Could you review this?
@brightwon Can you please rebase?
@yihua @codope I have found the cause of the CI failure above.
If an empty string is passed to the `duplicatedPartitionPath` parameter of the repair deduplicate command, the checkNotNull function sees a null value when processing the arguments of the sparkLauncher.addAppArgs function, and the Spark job does not run properly.
Once this part is fixed, I think CI will pass.
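To illustrate the failure described above, here is a minimal sketch in plain Java. It assumes that the launcher rejects any null argument, and that the CLI layer hands over null where an empty string was typed. `AppArgsSketch` and `orEmpty` are hypothetical names used only for illustration; this is not Spark's actual class, and the guard shown is one possible workaround, not necessarily the fix adopted in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class AppArgsSketch {
    // Hypothetical stand-in for a launcher's application argument list.
    private final List<String> appArgs = new ArrayList<>();

    // Mimics a null-checking addAppArgs: any null argument is rejected
    // with a NullPointerException before it is stored.
    public AppArgsSketch addAppArgs(String... args) {
        for (String arg : args) {
            Objects.requireNonNull(arg, "arg");
            appArgs.add(arg);
        }
        return this;
    }

    public List<String> getAppArgs() {
        return appArgs;
    }

    // Hypothetical guard: replace a null value with an empty string
    // so that the null check above does not fail.
    public static String orEmpty(String value) {
        return value == null ? "" : value;
    }

    public static void main(String[] args) {
        // Assumption: this is what the CLI layer passes for an empty partition path.
        String duplicatedPartitionPath = null;

        try {
            new AppArgsSketch().addAppArgs("DEDUPLICATE", duplicatedPartitionPath);
        } catch (NullPointerException e) {
            System.out.println("Rejected null argument: " + e.getMessage());
        }

        // With the guard, the argument list is built without an exception.
        AppArgsSketch guarded = new AppArgsSketch()
                .addAppArgs("DEDUPLICATE", orEmpty(duplicatedPartitionPath));
        System.out.println("Argument count: " + guarded.getAppArgs().size());
    }
}
```

Whether an empty string survives the rest of the job submission path is a separate question that this sketch does not cover.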
@brightwon Is this behavior related to what #6489 is trying to fix?
It could be a Spring Shell issue! To understand the problem clearly, I need to log and check all the parameters passed from the CLI.
If I could test on my local machine, it would be easy to check, but I'm having trouble setting up a local Docker environment. Is there any other way I can run the integration tests, check the logs I added, and debug?
@brightwon You may use the mvn command to run the specific IT.
@codope The rebase is done, and I separated the table path for the non-partitioned dataset tests.
CI report:
- 0c5c799bac6f3416da7ca3be724005c4b583c9e7 UNKNOWN
- feb7bcf7b71ecc65d796a460c433a431357abd9d Azure: SUCCESS
Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build