
[HUDI-9482] Fix the issue where the file slice was wrongly filtered out

Open TheR1sing3un opened this issue 6 months ago • 5 comments

Please consider the following case:


  /**
   * case: query with pending instant
   *
   * 1. insert-txn1 start, will create a log file
   * 2. insert-txn2 start, will create a log file
   * 3. insert-txn2 commit
   * 4. query-txn3 execute
   * 5. insert-txn1 commit
   *
   *  |-------------------------------- txn1: insert --------------------------------------|
   *               |------ txn2: insert ------|
   *                                            |---txn3: query---|
   *
   *  we expected txn3's query should see the data of txn2
   */

However, I found that the query result here was empty; the data written by txn2 was not read as I expected.

The reason is that, when we construct the file slice as of the current moment, we pick the smallest instant as the base instant of the slice. In the case above, that means txn1 becomes the base instant. When the query runs, AbstractTableFileSystemView#getLatestMergedFileSlicesBeforeOrOn is called to fetch the file slices to read.
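To make the behavior concrete, here is a minimal, self-contained sketch (simplified stand-in classes, not Hudi's actual ones) of how the base instant of a log-only slice ends up being the smallest log instant, i.e. the still-pending txn1:

```java
import java.util.Comparator;
import java.util.List;

// Simplified sketch, not Hudi's real file-system view: a file group that has
// only log files, where the slice's base instant is taken as the smallest
// (lexicographically earliest) instant among the log files.
public class BaseInstantSketch {

  // Hypothetical stand-in for a log file: it only carries the instant time
  // of the delta commit that produced it.
  record LogFile(String instantTime) {}

  // Current behavior as described above: with no base file, the base instant
  // of the slice is simply the minimum log instant. In the example, txn1
  // ("001", still pending) wins over txn2 ("002", committed).
  static String baseInstantOf(List<LogFile> logFiles) {
    return logFiles.stream()
        .map(LogFile::instantTime)
        .min(Comparator.naturalOrder())
        .orElseThrow();
  }

  public static void main(String[] args) {
    List<LogFile> logs = List.of(new LogFile("001"), new LogFile("002"));
    // Prints "001": the pending txn1 becomes the slice's base instant.
    System.out.println(baseInstantOf(logs));
  }
}
```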


Eventually we reach the judgment logic shown above, which decides whether a slice is committed by checking whether the slice's base instant exists on the timeline. Per the logic described above, the base instant of this slice is txn1, and that instant is not on the timeline, because the timeline here consists of completed writes plus pending compactions and therefore does not include the still-pending txn1. As a result, the entire slice is filtered out. What we actually want is for the log files written by txn1 to be filtered out while the log files written by txn2 are still read.
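The effect of that check can be illustrated with another small sketch (hypothetical names, not the real AbstractTableFileSystemView code): because the slice's base instant is the pending txn1, the committed check fails and the whole slice, including txn2's committed log file, is dropped:

```java
import java.util.List;
import java.util.Set;

// Simplified sketch of the filtering described above (hypothetical names):
// a slice is treated as committed only if its base instant appears on the
// timeline of completed writes plus pending compactions.
public class SliceFilterSketch {

  // Hypothetical slice: a base instant plus the instants of its log files.
  record FileSlice(String baseInstantTime, List<String> logInstantTimes) {}

  // Timeline visible to the query: completed writes and pending compactions.
  // txn2 ("002") has committed; txn1 ("001") is still in flight, so it is absent.
  static final Set<String> TIMELINE = Set.of("002");

  static boolean isSliceCommitted(FileSlice slice) {
    return TIMELINE.contains(slice.baseInstantTime());
  }

  public static void main(String[] args) {
    FileSlice slice = new FileSlice("001", List.of("001", "002"));
    // Prints "false": the whole slice is filtered out, even though the
    // log file written by txn2 is already committed and should be readable.
    System.out.println(isSliceCommitted(slice));
  }
}
```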

Change Logs

  1. Fix the issue where the file slice was wrongly filtered out when there are both committed and pending log files

Describe context and summary for this change. Highlight if any code was copied.

Impact

bug fix

Describe any public API or user-facing feature change or any performance impact.

Risk level (write none, low, medium or high below)

medium

If medium or high, explain what verification was done to mitigate the risks.

Documentation Update

none

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the instruction to make changes to the website.

Contributor's checklist

  • [x] Read through contributor's guide
  • [x] Change Logs and Impact were stated clearly
  • [x] Adequate tests were added if applicable
  • [x] CI passed

TheR1sing3un avatar May 22 '25 07:05 TheR1sing3un

The essence of the problem is: for a file group that has only log files, how do we determine the base instant of its slice? The current logic simply takes the lexicographically smallest instant as the base instant, which leads to the problem described above. I think we can instead pretend that every file group initially has a version at HoodieActiveTimeline.INIT_INSTANT_TS with no actual data written; then we no longer need special handling for slices that consist only of log files.
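A minimal sketch of that idea, assuming a hypothetical helper and a placeholder value for HoodieActiveTimeline.INIT_INSTANT_TS (this is not the actual patch):

```java
import java.util.List;
import java.util.Optional;

// Sketch of the fallback described above (hypothetical helper, not the real
// change): if a file group has no base file, use a synthetic "initial"
// instant instead of the smallest log instant, so the slice is never treated
// as uncommitted just because its earliest log write is still pending.
public class InitInstantSketch {

  // Placeholder standing in for HoodieActiveTimeline.INIT_INSTANT_TS; the
  // point is only that it sorts before any real commit instant.
  static final String INIT_INSTANT_TS = "00000000000000";

  record BaseFile(String instantTime) {}

  static String baseInstantOf(Optional<BaseFile> baseFile, List<String> logInstants) {
    // With a base file, keep the existing behavior.
    if (baseFile.isPresent()) {
      return baseFile.get().instantTime();
    }
    // Log-only file group: the log instants no longer influence the base
    // instant; committed vs. pending log files can be filtered individually.
    return INIT_INSTANT_TS;
  }

  public static void main(String[] args) {
    // Prints the initial instant rather than "001" (the pending txn1).
    System.out.println(baseInstantOf(Optional.empty(), List.of("001", "002")));
  }
}
```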

My current code changes are based on the above logic. I'm not sure whether there is a better approach. Looking forward to everyone's thoughts.

TheR1sing3un avatar May 22 '25 11:05 TheR1sing3un

@hudi-bot run azure

TheR1sing3un avatar May 26 '25 13:05 TheR1sing3un

@hudi-bot run azure

TheR1sing3un avatar May 27 '25 12:05 TheR1sing3un

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

hudi-bot avatar May 27 '25 14:05 hudi-bot

@danny0405 Hi Danny, this is a data-quality bug that has already occurred in our production environment. Let's solve this problem together!

TheR1sing3un avatar Jun 03 '25 02:06 TheR1sing3un

@danny0405 Hi, Danny, any ideas?

TheR1sing3un avatar Jun 27 '25 04:06 TheR1sing3un