hudi icon indicating copy to clipboard operation
hudi copied to clipboard

[HUDI-2780] Fix the issue of Mor log skipping complete blocks when reading data

Open hj2016 opened this issue 3 years ago • 9 comments

Tips

  • Thank you very much for contributing to Apache Hudi.
  • Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.

What is the purpose of the pull request

Fix Mor reads the log file and skips the complete block as a bad block, resulting in data loss

jira issues https://issues.apache.org/jira/browse/HUDI-2780

Committer checklist

  • [ ] Has a corresponding JIRA in PR title & commit

  • [ ] Commit message is descriptive of the change

  • [ ] CI is green

  • [ ] Necessary doc changes done or have another open PR

  • [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

hj2016 avatar Nov 17 '21 08:11 hj2016

cc @alexeykudinkin can you take a pass at this please?

vinothchandar avatar Dec 15 '21 02:12 vinothchandar

I will let @alexeykudinkin take care of this patch going forward.

nsivabalan avatar Dec 20 '21 22:12 nsivabalan

@alexeykudinkin : Is there any follow up on this patch. Let's sync up sometime to see if we know what's the right fix and if we can squeeze it in for 0.10.1.

nsivabalan avatar Jan 09 '22 19:01 nsivabalan

@hj2016 : can you please help clarify things. we want to see if there is a real bug, we want to fix it before 0.11.

nsivabalan avatar Feb 25 '22 04:02 nsivabalan

@nsivabalan image

The hudi log file consists of blocks. A log may contain several blocks information written by deltacommits. Each block is divided by MAGIC. The content of each block is shown in the figure above. The Block Header will record the deltacommit instantTime of each block. It will scan and read the blocks that need to be merged during the compact process. HoodieLogFileReader is used to read log files and convert the log files into blocks. First, it will try to read the MAGIC separation. If find the separation, it will read the size of next position and check if it exceed the total size of the file to recognize bad blocks. If a bad block is recognized, it will skip the bad block content and create a bad block object HoodieCorruptBlock. Skipping he bad block means to retrieve the MAGIC (#HUDI#) after the current block total size position, read 1 megabyte each time until the next MAGIC separation. After finding the MAGIC, jump to the current MAGIC position, and then you can continue to read complete block information.

The position of the bad block mentioned above is retrieved after the block total size position. If the file only contains MAGIC content without task content, it will lead to error. At this time, the HoodieLogFileReader will write MAGIC continuously, and consider MAGIC the Block Total size, and the next normal block will be skipped during the retrieval process and returned as a bad block. This part of the block data is lost during the merging process. In this situation, the skipped bad block is retrieved after the MAGIC instead of after the block total size, and the normal block will not be skipped when continuous MAGIC occurs during reading.

This problem was discovered when we do random online kill flink task. The data will not be lost by repairing and merging data in this way.

hj2016 avatar Apr 15 '22 03:04 hj2016

@hj2016 : thanks for the elaborate explanation. the fix makes sense. only comment I have is, within isBlockCorrupted(), we return in 2 to 3 places incase of corrupt. so, may be we could avoid seeking within isBlockCorrupt incase of corrupt block and reset it from the callers side.

int blockSize = inputStream.readLong();

.
.
boolean isBlockCorrupt = isBlockCorrupted(blockSize);
if (isBlockCorrupt) {
  inputStream.seek(blockStartPos);
  return createCorruptBlock();
}

within isBlockCorrupted(blockSize): we should not do any seek incase of corrupt block.

let me know what do you think.

nsivabalan avatar Sep 20 '22 03:09 nsivabalan

Also, do you think you can write up a unit test to cover the scenario. We have some tests around corrupt blocks already here.

btw, @hj2016 : we are looking to land this by this weekend as we have code freeze coming up for 0.12.1. just fyi.

nsivabalan avatar Sep 20 '22 03:09 nsivabalan

have pushed out a commit by myself to address feedback. yet to see if we can cover the fix w/ a test.

nsivabalan avatar Sep 21 '22 22:09 nsivabalan

CI report:

  • b235e45d75236f396f42f512774ede313de4f487 Azure: SUCCESS
Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

hudi-bot avatar Sep 28 '22 11:09 hudi-bot