[SUPPORT] No results are returned from incremental queries within the archived range
Describe the problem you faced
When using Flink SQL to run an incremental query on a MOR table whose range (read.start-commit, read.end-commit) falls within the archived part of the timeline, no results are returned and "No new instant found for the table under path xxx" is printed in the log.
To Reproduce
- Create mytable with the following options:
'read.start-commit' = '20220721150000'
'read.end-commit' = '20220721151500'
'read.streaming.enabled' = 'true'
The instants between 20220721150000 and 20220721151500 have already been archived in the .hoodie/archive dir.
- Query the table and print the results:
tEnv.sqlQuery("select * from mytable").execute().print();
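For anyone trying to reproduce this, here is a minimal sketch of how the table definition might be assembled. The three read.* options are the ones from the report; the column list, the table path, and the class and method names are placeholders I made up for illustration, and 'connector', 'path' and 'table.type' are the usual Hudi Flink connector options rather than values taken from the reporter's job.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds the CREATE TABLE statement used in the reproduction.
final class IncrementalReadDdl {

    // WITH (...) options; only the three read.* entries come from the issue report.
    static Map<String, String> options(String path, String startCommit, String endCommit) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("connector", "hudi");
        opts.put("path", path);                      // placeholder path
        opts.put("table.type", "MERGE_ON_READ");
        opts.put("read.streaming.enabled", "true");
        opts.put("read.start-commit", startCommit);
        opts.put("read.end-commit", endCommit);
        return opts;
    }

    // Renders the DDL; the schema below is an assumed example, not the reporter's.
    static String createTable(String name, Map<String, String> opts) {
        String with = opts.entrySet().stream()
                .map(e -> "  '" + e.getKey() + "' = '" + e.getValue() + "'")
                .collect(Collectors.joining(",\n"));
        return "CREATE TABLE " + name + " (\n"
                + "  id STRING,\n"
                + "  ts TIMESTAMP(3),\n"
                + "  PRIMARY KEY (id) NOT ENFORCED\n"
                + ") WITH (\n" + with + "\n)";
    }

    public static void main(String[] args) {
        String ddl = createTable("mytable",
                options("file:///tmp/hudi/mytable", "20220721150000", "20220721151500"));
        System.out.println(ddl);
        // In a Flink job the statement would be passed to tEnv.executeSql(ddl)
        // before running the select shown in the report.
    }
}
```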
Expected behavior
org.apache.hudi.source.IncrementalInputSplits#inputSplits
returns an empty result directly when there are no active instants in the query range. In this case, why not merge the archived instants before returning?
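To make the question concrete, here is a toy sketch of the difference between the two behaviours. It is not Hudi's code: instants are modelled as plain timestamp strings, and the class and method names are hypothetical. It only shows why a range that lies entirely in the archive yields nothing when the active timeline alone is consulted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Toy model of the timeline lookup; names are illustrative, not Hudi APIs.
final class InstantRangeSketch {

    // Keeps instants in [start, end]. Fixed-width yyyyMMddHHmmss strings
    // sort lexicographically in time order, so compareTo is sufficient here.
    static List<String> inRange(List<String> instants, String start, String end) {
        return instants.stream()
                .filter(ts -> ts.compareTo(start) >= 0 && ts.compareTo(end) <= 0)
                .sorted()
                .collect(Collectors.toList());
    }

    // Behaviour described in the issue: only active instants are considered.
    static List<String> activeOnly(List<String> active, String start, String end) {
        return inRange(active, start, end);
    }

    // Behaviour asked for: merge archived instants in before filtering.
    static List<String> withArchived(List<String> active, List<String> archived,
                                     String start, String end) {
        List<String> merged = new ArrayList<>(archived);
        merged.addAll(active);
        return inRange(merged, start, end);
    }
}
```

With archived instants 20220721150100 and 20220721151000 and a single active instant 20220721160000, activeOnly returns an empty list for the range in this report, while withArchived returns the two archived instants.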
Environment Description
- Hudi version : 0.11.0
- Flink version : 1.13.6
@danny0405 could you please take a look or re-assign it?
In this case, why not merge archived instants before return?
@1032851561 I don't think it's expected to return incremental results for archived commits. A design consideration is that we don't want to spend extra computation power to deserialize archived commits and find the incrementally changed files. You can configure archival so that it retains active commits longer for your use case.
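As a rough sketch of what "retain active commits longer" could look like on the Flink side: the option keys below (clean.retain_commits, archive.min_commits, archive.max_commits) are the Flink connector names as I recall them, so please verify them against your Hudi version. The ordering check reflects my understanding that the archival minimum has to exceed the cleaner retention and the maximum has to exceed the minimum. The numbers in the usage note are illustrative only, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that assembles retention options for the table's WITH clause.
final class ArchivalRetentionSketch {

    static Map<String, String> retentionOptions(int cleanRetain, int archiveMin, int archiveMax) {
        // Assumed constraint: cleaner retention < archive min < archive max.
        if (archiveMin <= cleanRetain) {
            throw new IllegalArgumentException("archive min must be greater than cleaner retention");
        }
        if (archiveMax <= archiveMin) {
            throw new IllegalArgumentException("archive max must be greater than archive min");
        }
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("clean.retain_commits", Integer.toString(cleanRetain)); // assumed key
        opts.put("archive.min_commits", Integer.toString(archiveMin));   // assumed key
        opts.put("archive.max_commits", Integer.toString(archiveMax));   // assumed key
        return opts;
    }
}
```

For example, retentionOptions(200, 240, 300) would keep a few hundred commits on the active timeline, at the cost of a larger .hoodie directory and more timeline metadata to load.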
Yes, @xushiyan is right. Let us know if you are looking for any more assistance. If not, could you please close this GitHub issue?
Supported in https://github.com/apache/hudi/pull/6096. Feel free to re-open this issue if you still have questions.
In this case, why not merge archived instants before return?
@1032851561 I don't think it's expected to return incremental results for archived commits. A design consideration is that we don't want to spend extra computation power to deserialize archived commits and find the incrementally changed files. You can configure archival so that it retains active commits longer for your use case.
Yes, we should optimize archival to achieve our goal.