[HUDI-4880] Fix corrupted parquet file issue left over by cancelled compaction task
Change Logs
- Remove the marker-delete code in `CompactionPlanOperator`, which could leave corrupted parquet files behind if compaction tasks were cancelled
- Fix HUDI-4108 in another way: ignore the marker file if it already exists when creating it
More background detail in https://issues.apache.org/jira/browse/HUDI-4880
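For illustration only, the "ignore the marker file if it already exists" behaviour from the second item can be sketched with plain `java.nio`. This is not Hudi's marker API: `MarkerSketch`, `createMarkerIfAbsent`, and the marker file name are hypothetical, and a local temp directory stands in for the table's marker directory.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MarkerSketch {
    // Hypothetical helper (not part of Hudi): returns true if this call created
    // the marker, false if a marker was already there and was left untouched.
    public static boolean createMarkerIfAbsent(Path marker) throws IOException {
        try {
            // Files.createFile checks for existence and creates atomically.
            Files.createFile(marker);
            return true;
        } catch (FileAlreadyExistsException e) {
            // A retried task attempt may hit this path; treat it as a no-op.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("markers");
        // Illustrative marker name only.
        Path marker = dir.resolve("file1.parquet.marker.CREATE");
        System.out.println(createMarkerIfAbsent(marker)); // true
        System.out.println(createMarkerIfAbsent(marker)); // false
    }
}
```

The point of the sketch is that a second create attempt for the same marker is tolerated instead of failing the task.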
Impact
No API changes; this is a minor bug fix.
Risk level: none
Contributor's checklist
- [x] Read through contributor's guide
- [x] Change Logs and Impact were stated clearly
- [x] Adequate tests were added if applicable
- [ ] CI passed
@hudi-bot run azure
The CI pipeline failed because of a `Connection refused` issue; let me re-run it.
@hudi-bot run azure
@TengHuo please rebase master; there were some flaky test fixes
@xushiyan sure, np, just rebased it onto the latest master
Just reverted the code that ignored the duplicate-marker error. The code now throws an error if a duplicate marker file already exists.
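As a rough sketch of the strict behaviour described in this comment (fail when a duplicate marker exists), assuming plain `java.nio` rather than Hudi's actual marker classes; `StrictMarkerSketch`, `createMarkerStrict`, and the file name are hypothetical:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StrictMarkerSketch {
    // Hypothetical helper (not part of Hudi): creating a marker that already
    // exists is treated as an error and surfaced to the caller.
    public static void createMarkerStrict(Path marker) throws IOException {
        try {
            Files.createFile(marker);
        } catch (FileAlreadyExistsException e) {
            throw new IllegalStateException("Duplicate marker file: " + marker, e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("markers");
        Path marker = dir.resolve("file1.parquet.marker.CREATE"); // illustrative name
        createMarkerStrict(marker); // first creation succeeds
        try {
            createMarkerStrict(marker); // second creation is rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected duplicate marker");
        }
    }
}
```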
Something went wrong in the Maven build; it is not related to this PR.
Error: Failed to execute goal on project hudi-utilities_2.12: Could not resolve dependencies for project org.apache.hudi:hudi-utilities_2.12:jar:0.13.0-SNAPSHOT: Failed to collect dependencies at io.confluent:kafka-avro-serializer:jar:5.3.4: Failed to read artifact descriptor for io.confluent:kafka-avro-serializer:jar:5.3.4: Could not transfer artifact io.confluent:kafka-avro-serializer:pom:5.3.4 from/to confluent (https://packages.confluent.io/maven/): transfer failed for https://packages.confluent.io/maven/io/confluent/kafka-avro-serializer/5.3.4/kafka-avro-serializer-5.3.4.pom: Connection reset -> [Help 1]
CI report:
- 861db5109feea40129392a38d17c10f84397d258 UNKNOWN
- d3d5a30845177e6a0fe981e2fee5b6600556da76 Azure: FAILURE
Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run azure`: re-run the last Azure build