hudi-rs

Support CoW incremental query

Open xushiyan opened this issue 1 year ago • 1 comments

xushiyan avatar May 04 '24 06:05 xushiyan

From the official docs, there are two ways to implement incremental queries.

  1. Configuration passed through read options; for details, see Spark Incremental Query For Hudi-0.13.0
  2. The hudi_table_changes TVF (table-valued function); for details, see Spark Incremental Query For Hudi-0.14.1

Which method do you suggest using?
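For reference, here is a rough sketch of what each approach asks the caller to supply on the Spark side. It only builds the option map and the SQL text and does not talk to Spark or hudi-rs. The option keys and the TVF argument order are taken from my reading of the Hudi docs linked above, so treat them as assumptions to verify; the helper names `incremental_read_options` and `table_changes_sql` are hypothetical.

```rust
use std::collections::HashMap;

/// Approach 1: read options for an incremental query.
/// The keys are assumed from the Hudi Spark datasource docs.
fn incremental_read_options(begin_instant: &str) -> HashMap<&'static str, String> {
    HashMap::from([
        ("hoodie.datasource.query.type", "incremental".to_string()),
        (
            "hoodie.datasource.read.begin.instanttime",
            begin_instant.to_string(),
        ),
    ])
}

/// Approach 2: SQL text that calls the hudi_table_changes TVF.
/// The argument order (table, query type, begin time) is assumed from the 0.14.x docs.
fn table_changes_sql(table: &str, begin_instant: &str) -> String {
    format!("SELECT * FROM hudi_table_changes('{table}', 'latest_state', '{begin_instant}')")
}

fn main() {
    let options = incremental_read_options("20240504060500");
    println!("{options:?}");
    println!("{}", table_changes_sql("db.trips", "20240504060500"));
}
```

Either way the caller ends up supplying a begin instant; the difference is whether it travels as a read option or as a SQL argument.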

gohalo avatar Sep 02 '24 10:09 gohalo

@xushiyan Hullo, I would like to work on this. The high-level implementation would be to:

  • Use the timeline to retrieve the latest commit, or a specific commit, to use as a checkpoint
  • Get the metadata for the commits
  • Query the data changed since that checkpoint. If there are any more specifics, let me know!
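To make these steps concrete, here is a minimal in-memory sketch. It does not use the hudi-rs API: `Commit`, `sample_timeline`, `latest_checkpoint`, and `incremental_read` are hypothetical names, and each commit's metadata and file contents are collapsed into a plain list of upserted key/value pairs. It also assumes instant times are fixed-width strings that sort lexicographically.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one completed commit on the timeline.
#[derive(Debug, Clone)]
struct Commit {
    /// Commit instant time; assumed fixed-width so string order is time order.
    instant: String,
    /// Records written by this commit (stand-in for commit metadata plus file data).
    upserts: Vec<(String, String)>,
}

/// Builds a small example timeline with three commits.
fn sample_timeline() -> Vec<Commit> {
    let commit = |instant: &str, upserts: &[(&str, &str)]| Commit {
        instant: instant.to_string(),
        upserts: upserts
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    };
    vec![
        commit("20240101000000", &[("k1", "v1"), ("k2", "v2")]),
        commit("20240102000000", &[("k1", "v1-updated")]),
        commit("20240103000000", &[("k3", "v3")]),
    ]
}

/// Step 1: the latest instant on the timeline, usable as a checkpoint.
fn latest_checkpoint(timeline: &[Commit]) -> Option<&str> {
    timeline.iter().map(|c| c.instant.as_str()).max()
}

/// Steps 2 and 3: walk the commits after `checkpoint` (exclusive) in time
/// order and keep the latest value written for each changed record key.
fn incremental_read(timeline: &[Commit], checkpoint: &str) -> BTreeMap<String, String> {
    let mut commits: Vec<&Commit> = timeline
        .iter()
        .filter(|c| c.instant.as_str() > checkpoint)
        .collect();
    commits.sort_by(|a, b| a.instant.cmp(&b.instant));

    let mut changed = BTreeMap::new();
    for commit in commits {
        for (key, value) in &commit.upserts {
            // Later commits overwrite earlier values for the same key.
            changed.insert(key.clone(), value.clone());
        }
    }
    changed
}

fn main() {
    let timeline = sample_timeline();
    // Pretend the consumer last synced at the first commit.
    let changed = incremental_read(&timeline, "20240101000000");
    println!("changed since checkpoint: {changed:?}");
    println!("next checkpoint: {:?}", latest_checkpoint(&timeline));
}
```

In a real CoW table the third step would read the base files referenced by the commit metadata and filter rows by commit time, rather than replaying in-memory upserts.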

jonathanc-n avatar Nov 30 '24 04:11 jonathanc-n

thank you both for the interest! we will do table api support first for incremental query, and then move on to sql support using datafusion. i'll lay out some groundwork first before splitting out more follow-up tasks.

xushiyan avatar Nov 30 '24 20:11 xushiyan

@xushiyan Is there anything I can help with for providing table API support?

jonathanc-n avatar Dec 02 '24 01:12 jonathanc-n