
[HUDI-4433] Hudi-CLI repair deduplicate not working with non-partitioned dataset

Open • brightwon opened this issue 3 years ago • 8 comments

Change Logs

When using the repair deduplicate command in hudi-cli, there is no way to run it on a non-partitioned dataset, so this change modifies the CLI parameter.

Impact

Describe any public API or user-facing feature change or any performance impact.

Risk level: none | low | medium | high

Choose one. If medium or high, explain what verification was done to mitigate the risks.

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

brightwon, Aug 09 '22

@pratyakshsharma Could you review this short change?

brightwon, Aug 28 '22

@yihua I added integration tests based on the existing ones. Could you review this?

brightwon, Sep 10 '22

@brightwon Can you please rebase?

codope, Sep 16 '22

@yihua @codope I have found the cause of the CI failure above.

If an empty string is passed to the duplicatedPartitionPath parameter of the repair deduplicate command, the checkNotNull call made while sparkLauncher.addAppArgs processes its arguments sees a null value, so the Spark job does not run properly.

Once this part is fixed, I think CI will pass.
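For illustration, here is a minimal Java sketch of the failure mode described above. It assumes, as the comment suggests, that SparkLauncher.addAppArgs null-checks each argument individually, so a single null partition path (for example, if the shell turns an empty option value into null) aborts the launch. The class AppArgsCheck and its methods are hypothetical and are not part of Hudi or Spark. Mapping null to an empty string is also an assumption: it only helps if the launched job reads an empty partition path as "non-partitioned dataset".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hudi or Spark.
final class AppArgsCheck {

    private AppArgsCheck() {}

    // Returns the index of the first null argument, or -1 if every argument is non-null.
    // A null anywhere in this array is what makes a per-argument null check fail.
    static int firstNullIndex(String... args) {
        for (int i = 0; i < args.length; i++) {
            if (args[i] == null) {
                return i;
            }
        }
        return -1;
    }

    // Copies the arguments, replacing each null with "" so the positional
    // argument order is preserved. Assumption: the launched job interprets an
    // empty partition path as "non-partitioned dataset".
    static List<String> nullToEmpty(String... args) {
        List<String> out = new ArrayList<>(args.length);
        for (String arg : args) {
            out.add(arg == null ? "" : arg);
        }
        return out;
    }
}
```

Keeping a placeholder instead of dropping the null argument matters here because the Spark job reads its arguments by position. Omitting one would shift every later argument.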

brightwon, Sep 17 '22

@brightwon Is this behavior related to what #6489 is trying to fix?

yihua, Sep 17 '22

It could be a Spring Shell issue. To understand this problem clearly, I need to log and check all the parameters passed from the CLI.

It would be easy to check if I could test on my local machine, but I'm having trouble setting up the local Docker environment. Is there another way to run the integration tests, check the logs I added, and debug?

brightwon, Sep 17 '22

@brightwon You can use the mvn command to run a specific IT.

yihua, Sep 17 '22

@codope The rebase is done, and I separated the table path for the non-partitioned dataset tests.

brightwon, Sep 24 '22

CI report:

  • 0c5c799bac6f3416da7ca3be724005c4b583c9e7 UNKNOWN
  • feb7bcf7b71ecc65d796a460c433a431357abd9d Azure: SUCCESS

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

hudi-bot, Sep 24 '22