hudi [SUPPORT] No results are returned from incremental queries within the archived range

Open 1032851561 opened this issue 2 years ago • 2 comments

Describe the problem you faced

When using Flink SQL to run an incremental query on a MOR table over a commit range (read.start-commit, read.end-commit) that has already been archived, no results are returned and "No new instant found for the table under path xxx" is printed in the log.

To Reproduce

Create mytable with the options below:

'read.start-commit' = '20220721150000'
'read.end-commit' = '20220721151500'
'read.streaming.enabled' = 'true'

Instants between 20220721150000 and 20220721151500 have been archived in the .hoodie/archive dir.

Query the table and print the result:

tEnv.sqlQuery("select * from mytable").execute().print();
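For reference, the reproduction steps above can be assembled into a single DDL string. This is only a minimal sketch: the class name, column list, and table path are hypothetical placeholders, and only the three read.* options come from this report. 'connector' and 'table.type' are the usual Hudi Flink options for a MOR table.

```java
public class IncrementalQueryDdl {

    /** Builds a Flink SQL DDL for a MOR table read over a fixed commit range. */
    public static String buildDdl(String path, String startCommit, String endCommit) {
        return String.join("\n",
            "CREATE TABLE mytable (",
            "  id STRING,",                        // hypothetical columns
            "  name STRING,",
            "  ts TIMESTAMP(3),",
            "  PRIMARY KEY (id) NOT ENFORCED",
            ") WITH (",
            "  'connector' = 'hudi',",
            "  'path' = '" + path + "',",          // hypothetical table path
            "  'table.type' = 'MERGE_ON_READ',",
            "  'read.start-commit' = '" + startCommit + "',",
            "  'read.end-commit' = '" + endCommit + "',",
            "  'read.streaming.enabled' = 'true'",
            ")");
    }

    public static void main(String[] args) {
        String ddl = buildDdl("file:///tmp/hudi/mytable", "20220721150000", "20220721151500");
        System.out.println(ddl);
        // With a TableEnvironment at hand, the DDL would be registered before querying:
        // tEnv.executeSql(ddl);
        // tEnv.sqlQuery("select * from mytable").execute().print();
    }
}
```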

Expected behavior

org.apache.hudi.source.IncrementalInputSplits#inputSplits returns an empty result immediately if there are no active instants in the query range. In this case, why not merge in the archived instants before returning?
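To illustrate the situation, here is a simplified sketch of the range check. It is not Hudi's actual code: the class and method names are hypothetical, the active timeline is modelled as a sorted list of instant-time strings, and the bounds are assumed to be inclusive. It relies on instant times of equal width comparing correctly as plain strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ArchivedRangeCheck {

    /** Returns the active instants that fall inside [start, end] (inclusive, by assumption). */
    public static List<String> instantsInRange(List<String> activeInstants, String start, String end) {
        return activeInstants.stream()
            .filter(t -> t.compareTo(start) >= 0 && t.compareTo(end) <= 0)
            .collect(Collectors.toList());
    }

    /** True if the range ends before the earliest active instant, i.e. it lies entirely in the archive. */
    public static boolean isFullyArchived(List<String> sortedActiveInstants, String end) {
        return !sortedActiveInstants.isEmpty()
            && end.compareTo(sortedActiveInstants.get(0)) < 0;
    }

    public static void main(String[] args) {
        // Hypothetical active timeline: everything before 16:00 has been archived.
        List<String> active = Arrays.asList("20220721160000", "20220721161500");

        // The range from this report matches no active instant.
        System.out.println(instantsInRange(active, "20220721150000", "20220721151500")); // prints []
        System.out.println(isFullyArchived(active, "20220721151500"));                   // prints true
    }
}
```

An empty match on the active timeline is what leads to the "No new instant found" log line described above. Serving the range would need an extra step that loads the archived instants.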

Environment Description

Hudi version : 0.11.0
Flink version : 1.13.6

Jul 21 '22 09:07 1032851561

@danny0405 could you please take a look or re-assign it.

Jul 21 '22 19:07 rmahindra123

In this case, why not merge archived instants before return?

@1032851561 I don't think it's expected to return incremental results for archived commits. A design consideration is that we don't want to spend extra computation deserializing archived commits to find the incrementally changed files. You can configure archival so that it retains active commits longer for your use case.
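As a rough sketch of that suggestion, the helper below collects the Flink table options that, to my understanding, control how many commits stay on the active timeline: clean.retain_commits, archive.min_commits, and archive.max_commits. Please verify the option names and defaults against your Hudi version. The class name and the values used are hypothetical, and the ordering check reflects the assumption that the cleaner's retention must stay below the archival minimum.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ArchivalOptions {

    /** Builds table options that keep more commits active; throws if the values are not strictly ordered. */
    public static Map<String, String> retainLonger(int cleanRetain, int archiveMin, int archiveMax) {
        if (!(cleanRetain < archiveMin && archiveMin < archiveMax)) {
            throw new IllegalArgumentException(
                "expected clean.retain_commits < archive.min_commits < archive.max_commits");
        }
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("clean.retain_commits", String.valueOf(cleanRetain));
        opts.put("archive.min_commits", String.valueOf(archiveMin));
        opts.put("archive.max_commits", String.valueOf(archiveMax));
        return opts;
    }

    public static void main(String[] args) {
        // Hypothetical values; size them to cover the oldest range you need to query incrementally.
        Map<String, String> opts = retainLonger(150, 200, 250);
        // Print each entry in the form used inside a Flink SQL WITH (...) clause.
        opts.forEach((k, v) -> System.out.println("'" + k + "' = '" + v + "',"));
    }
}
```

Larger values keep more instant files under .hoodie, so the window should be no wider than the incremental queries actually need.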

Aug 10 '22 14:08 xushiyan

Yes, @xushiyan is right. Let us know if you need any more assistance. If not, could you please close this GitHub issue?

Aug 16 '22 07:08 nsivabalan

Supported in https://github.com/apache/hudi/pull/6096. Feel free to re-open this issue if you still have questions.

Aug 16 '22 08:08 danny0405

In this case, why not merge archived instants before return?

@1032851561 I don't think it's expected to return incremental results for archived commits. A design consideration is that we don't want to spend extra computation deserializing archived commits to find the incrementally changed files. You can configure archival so that it retains active commits longer for your use case.

Yes, we should tune our archival settings to achieve our goal.

Aug 17 '22 03:08 1032851561
