
Implement Initial Spark Structured Streaming Source

Open kbendick opened this issue 3 years ago • 6 comments

We currently cannot use Spark Structured Streaming for reading from an Iceberg table. This seems like a very common need.

We discussed this at length back in February, but the discussion has since slowed down.

Some of the code that was discussed has since been merged, and I'd like to revisit this.

The original PR which has become inactive: https://github.com/apache/iceberg/pull/796

My first PR fixes some documentation in the MicroBatch builder class and adds tests, both to start verifying that the merged functionality works and to start surfacing corner cases: https://github.com/apache/iceberg/pull/1627

I will work from the conversation in https://github.com/apache/iceberg/pull/796, as well as what I can find in Slack, and I hope we can revisit issues as they arise. A decent amount of the previously proposed code has been merged, so I'd like to take a stab at piecing the rest together from that earlier discussion, along with the changes I think will be needed to support various scenarios, such as deletes, different triggers, the global watermark, and the declared per-stream watermark.

kbendick, Oct 19 '20 06:10