Support structured streaming read for Iceberg

Open XuQianJin-Stars opened this issue 4 years ago • 9 comments

An implementation of Spark Structured Streaming read that tracks which files of an Iceberg table have already been processed. This PR is split out from PR-796, Structured streaming read for Iceberg.

XuQianJin-Stars avatar Feb 25 '21 05:02 XuQianJin-Stars

Hi @XuQianJin-Stars - is there anything pending in this PR? Please let me know if you need any help to push this. Happy to collaborate & contribute.

SreeramGarlapati avatar Apr 16 '21 04:04 SreeramGarlapati

Hi @XuQianJin-Stars - is there anything pending in this PR? Please let me know if you need any help to push this. Happy to collaborate & contribute.

Yes, thank you very much. This feature is already in use in our internal work, and I want to improve it in the community.

XuQianJin-Stars avatar Apr 18 '21 09:04 XuQianJin-Stars

@XuQianJin-Stars - this is great. Is there anything pending in this PR? Or are you waiting on any inputs? Thanks a lot for your contribution.

@rdblue @RussellSpitzer @aokolnychyi - can you folks please add your review/inputs? We are in need of this change - truly appreciate your help.

SreeramGarlapati avatar Apr 19 '21 21:04 SreeramGarlapati

I really appreciate the time given to the test case :)

holdenk avatar May 11 '21 19:05 holdenk

Hi @holdenk, thank you very much for your review. I will reply to your comments later.

XuQianJin-Stars avatar May 12 '21 02:05 XuQianJin-Stars

Hi @SreeramGarlapati @holdenk @jackye1995 @rdblue @RussellSpitzer, sorry it took so long to fix the problem. Do you have time to continue reviewing this PR?

XuQianJin-Stars avatar Jul 15 '21 13:07 XuQianJin-Stars

Hey folks (incl. @XuQianJin-Stars & @RussellSpitzer) -- is this something that people are still open to working on? We're running into a situation with the current limited streaming support where the lack of maxFilesPerTrigger (or its equivalent), which is included in this PR, keeps us from being able to do combined historical + streaming reads from Iceberg tables.

holdenk avatar Sep 13 '22 17:09 holdenk

& @flyrain - what are your thoughts?

holdenk avatar Sep 14 '22 17:09 holdenk

is this something that people are still open to working on?

+1, I have a PR out for supporting rate limiting in Spark 3:

  • https://github.com/apache/iceberg/issues/2789
  • https://github.com/apache/iceberg/pull/4479

cc @holdenk

singhpk234 avatar Sep 14 '22 18:09 singhpk234
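
For readers unfamiliar with the rate limiting discussed above, here is a minimal sketch of the idea behind a maxFilesPerTrigger-style cap: a backlog of historical files is consumed in micro-batches of bounded size instead of all at once. This is plain Python for illustration only, not the Iceberg or Spark implementation; the function name `plan_micro_batches`, its `max_files_per_trigger` parameter, and the file names are hypothetical.

```python
from typing import Iterator, List, Sequence


def plan_micro_batches(
    files: Sequence[str], max_files_per_trigger: int
) -> Iterator[List[str]]:
    """Yield successive micro-batches holding at most max_files_per_trigger files.

    Hypothetical helper: it only illustrates capping the number of files
    per trigger, and does not reflect how Iceberg plans its scans.
    """
    if max_files_per_trigger <= 0:
        raise ValueError("max_files_per_trigger must be positive")
    for start in range(0, len(files), max_files_per_trigger):
        yield list(files[start:start + max_files_per_trigger])


# Hypothetical backlog of five already-committed data files.
backlog = [f"data-{i}.parquet" for i in range(5)]

# With a cap of 2, the backlog is drained over three triggers
# rather than in a single oversized batch.
batches = list(plan_micro_batches(backlog, max_files_per_trigger=2))
for number, batch in enumerate(batches):
    print(number, batch)
```

Without such a cap, the first trigger of a combined historical + streaming read would have to process the entire backlog in one batch, which is the situation described earlier in the thread.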