Support structured streaming read for Iceberg
An implementation of Spark Structured Streaming read that tracks the files of an Iceberg table that have already been processed. This PR is split out of PR-796, Structured Streaming read for Iceberg.
Hi @XuQianJin-Stars - is there anything pending in this PR? Please let me know if you need any help to push this. Happy to collaborate and contribute.
Yes, thank you very much. This functionality is already in use in our internal work, and I want to improve it in the community.
@XuQianJin-Stars - this is great. Is there anything pending in this PR, or are you waiting on any inputs? Thanks a lot for your contribution.
@rdblue @RussellSpitzer @aokolnychyi - can you folks please add your review/inputs? We are in need of this change - truly appreciate your help.
I really appreciate the time given to the test case :)
Hi @holdenk, thank you very much for your review. I will reply to your comments later.
Hi @SreeramGarlapati @holdenk @jackye1995 @rdblue @RussellSpitzer - sorry it took so long to fix the problem. Do you have time to continue reviewing this PR?
Hey folks (incl. @XuQianJin-Stars & @RussellSpitzer) -- is this something that people are still open to working on? We're running into a situation where the current limited streaming support lacks maxFilesPerTrigger (or its equivalent), which is included in this PR, and that keeps us from being able to do combined historical + streaming reads from Iceberg tables.
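For context, here is a minimal sketch of the kind of usage this would enable. The option name `maxFilesPerTrigger` follows the discussion above, and the table identifier `db.events` is a hypothetical example; neither is a confirmed public API of this PR.

```scala
// Sketch only: assumes a "maxFilesPerTrigger" read option as discussed in
// this PR, and a hypothetical Iceberg table "db.events". Requires a Spark
// session with the Iceberg runtime on the classpath.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder()
  .appName("iceberg-streaming-read")
  .getOrCreate()

// Read the Iceberg table as a micro-batch stream, capping how many data
// files each trigger consumes (assumed option name).
val streamDF = spark.readStream
  .format("iceberg")
  .option("maxFilesPerTrigger", "500")
  .load("db.events")

// Write each micro-batch to the console, triggering every 30 seconds.
val query = streamDF.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("30 seconds"))
  .start()
```

Capping files per trigger is what makes the combined historical + streaming pattern practical: the initial backfill over table history is consumed in bounded micro-batches instead of one unbounded first batch.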
& @flyrain - what are your thoughts?
is this something that people are still open to working on?
+1, I have a PR out for supporting rate limiting in Spark 3:
- https://github.com/apache/iceberg/issues/2789
- https://github.com/apache/iceberg/pull/4479
cc @holdenk