
[SUPPORT] No results are returned from incremental queries within the archived range

Open 1032851561 opened this issue 2 years ago • 2 comments


Describe the problem you faced

When using Flink SQL to incrementally query a MOR table over a commit range that has already been archived (the range is set with read.start-commit and read.end-commit), no results are returned and "No new instant found for the table under path xxx" is printed in the log.

To Reproduce

  1. Create mytable with the options below:
'read.start-commit' = '20220721150000'
'read.end-commit' = '20220721151500'
'read.streaming.enabled' = 'true'

The instants between 20220721150000 and 20220721151500 have already been archived in the .hoodie/archive directory.

  2. Select from the table and print the result:
tEnv.sqlQuery("select * from mytable").execute().print();
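
For anyone trying to reproduce this, the table definition from step 1 can be sketched roughly as below. This is only an illustration: the column list and the table path are hypothetical placeholders (the original report does not show them), and the class name ReproDdl is invented for the example. The read.* options are the ones quoted above; 'connector', 'path' and 'table.type' are the usual Hudi Flink connector options.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class ReproDdl {

    // Builds a CREATE TABLE statement carrying the read options from the report.
    // The schema (id, name, ts) and the path argument are assumptions made for
    // illustration; replace them with the real table definition.
    static String buildDdl(String path) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("connector", "hudi");
        options.put("path", path);
        options.put("table.type", "MERGE_ON_READ");
        // Options quoted in the issue:
        options.put("read.start-commit", "20220721150000");
        options.put("read.end-commit", "20220721151500");
        options.put("read.streaming.enabled", "true");

        // Render each option as  'key' = 'value'  inside the WITH clause.
        String with = options.entrySet().stream()
            .map(e -> "  '" + e.getKey() + "' = '" + e.getValue() + "'")
            .collect(Collectors.joining(",\n"));

        return "CREATE TABLE mytable (\n"
            + "  id STRING PRIMARY KEY NOT ENFORCED,\n"
            + "  name STRING,\n"
            + "  ts TIMESTAMP(3)\n"
            + ") WITH (\n"
            + with + "\n"
            + ")";
    }

    public static void main(String[] args) {
        // In a Flink job this string would be passed to tEnv.executeSql(...)
        // before running the query from step 2.
        System.out.println(buildDdl("file:///tmp/hudi/mytable"));
    }
}
```

The helper only assembles the DDL text, so it can be checked without a Flink cluster; running the actual reproduction still requires a table whose commits in that range have been archived.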

Expected behavior

org.apache.hudi.source.IncrementalInputSplits#inputSplits returns an empty result immediately if there are no active instants in the query range. In this case, why not merge the archived instants before returning?
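
The behaviour described here, and the proposed change, can be illustrated with a small standalone sketch. It is not Hudi's actual implementation: the class InstantRangeSketch, the method instantsInRange and the includeArchived flag are hypothetical names, and instants are modelled as plain timestamp strings that sort lexicographically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class InstantRangeSketch {

    // Returns the instants whose timestamps fall in [start, end].
    // With includeArchived == false only the active timeline is consulted,
    // which mirrors the behaviour reported in this issue. With
    // includeArchived == true the archived instants are merged in first,
    // which is the change the question proposes.
    static List<String> instantsInRange(List<String> active,
                                        List<String> archived,
                                        String start,
                                        String end,
                                        boolean includeArchived) {
        List<String> candidates = new ArrayList<>(active);
        if (includeArchived) {
            candidates.addAll(archived);
        }
        // Instant timestamps have a fixed width (yyyyMMddHHmmss), so plain
        // string comparison orders them chronologically.
        return candidates.stream()
            .filter(ts -> ts.compareTo(start) >= 0 && ts.compareTo(end) <= 0)
            .distinct()
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up timelines: the queried range lies entirely in the archive.
        List<String> active = Arrays.asList("20220721160000", "20220721161000");
        List<String> archived = Arrays.asList("20220721150500", "20220721151000");

        System.out.println(instantsInRange(
            active, archived, "20220721150000", "20220721151500", false));
        System.out.println(instantsInRange(
            active, archived, "20220721150000", "20220721151500", true));
    }
}
```

With the made-up timelines in main, the first call finds nothing because the range contains no active instant, while the second call picks up the two archived instants. The sketch leaves out the cost of reading the archive, which is the concern raised in the replies.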


Environment Description

  • Hudi version : 0.11.0

  • flink version : 1.13.6

1032851561 avatar Jul 21 '22 09:07 1032851561

@danny0405 could you please take a look or re-assign it.

rmahindra123 avatar Jul 21 '22 19:07 rmahindra123

In this case, why not merge archived instants before return?

@1032851561 I don't think it is expected to return incremental results for archived commits. One design consideration is that we don't want to spend extra computation deserializing archived commits to find the incrementally changed files. You can configure archival so that it retains active commits longer for your use case.

xushiyan avatar Aug 10 '22 14:08 xushiyan
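
The suggestion to retain active commits longer could look roughly like the sketch below. The option keys archive.min_commits, archive.max_commits and clean.retain_commits are, to the best of my knowledge, the Hudi Flink connector's archival and cleaning settings, but please verify them against the configuration reference for your Hudi version. The numeric values, the class name ArchivalOptions and the ordering check are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ArchivalOptions {

    // Collects archival-related table options into a map that could be merged
    // into the WITH clause of the table definition.
    // The ordering check (cleanRetain < archiveMin < archiveMax) is an assumed
    // sanity rule for this sketch, not a statement of what Hudi enforces.
    static Map<String, String> archivalOptions(int cleanRetain,
                                               int archiveMin,
                                               int archiveMax) {
        if (!(cleanRetain < archiveMin && archiveMin < archiveMax)) {
            throw new IllegalArgumentException(
                "expected cleanRetain < archiveMin < archiveMax");
        }
        Map<String, String> options = new LinkedHashMap<>();
        options.put("clean.retain_commits", String.valueOf(cleanRetain));
        options.put("archive.min_commits", String.valueOf(archiveMin));
        options.put("archive.max_commits", String.valueOf(archiveMax));
        return options;
    }

    public static void main(String[] args) {
        // Illustrative values only: pick them so that the commits you need
        // to query incrementally are still on the active timeline.
        archivalOptions(200, 300, 400)
            .forEach((k, v) -> System.out.println("'" + k + "' = '" + v + "'"));
    }
}
```

Keeping more commits active makes older ranges queryable incrementally, at the price of a longer active timeline and more retained file versions.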

Yes, @xushiyan is right. Let us know if you need any more assistance. If not, could you please close this GitHub issue?

nsivabalan avatar Aug 16 '22 07:08 nsivabalan

This is supported in https://github.com/apache/hudi/pull/6096. Feel free to reopen the issue if you still have questions.

danny0405 avatar Aug 16 '22 08:08 danny0405

In this case, why not merge archived instants before return?

@1032851561 I don't think it is expected to return incremental results for archived commits. One design consideration is that we don't want to spend extra computation deserializing archived commits to find the incrementally changed files. You can configure archival so that it retains active commits longer for your use case.

Yes, we should tune our archival settings to achieve our goal.

1032851561 avatar Aug 17 '22 03:08 1032851561