
Start setting up new StreamTable config

Open matthewmturner opened this issue 1 year ago • 3 comments

Which issue does this PR close?

Closes #10599

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Not yet, but I will add tests if there is agreement that the API is going in the right direction.

Are there any user-facing changes?

Yes, a new interface for StreamTable.

matthewmturner · May 21 '24 14:05

@metesynnada @mustafasrepo I believe you were both involved in the StreamTable implementation, so I'm interested in your views on whether this is going in the right direction toward a more generic StreamTable. (There's plenty of cleanup still to do, but the general API is there.)

matthewmturner · May 21 '24 14:05

@alamb I would be happy to add an example. For this PR it would likely just be copied from https://github.com/apache/datafusion/blob/main/datafusion/core/tests/fifo.rs. I will get to that shortly.

Ultimately, the motivation here is to start working toward a more generic interface where StreamTable can be used for multiple stream types (such as websockets, Kafka, etc.) rather than just files. Or maybe that's a non-goal, in which case I can just close this. I've seen that implementing streaming tables can require a lot of custom code (formats and providers), and I'm hoping to simplify that.

matthewmturner · May 21 '24 21:05

I'm hoping to get to an API similar to ListingTable's.

ListingTable => ListingTableConfig => FileFormat

Here ListingTable and ListingTableConfig are provided by DataFusion, and to extend them you only need to implement a FileFormat; you then benefit from the existing ListingTable and ListingTableConfig machinery.
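To make the extension pattern concrete, here is a std-only sketch of "generic table + pluggable format". The names `FormatSketch`, `CommaFormat` and `ListingTableSketch` are hypothetical and deliberately simplified; this is not DataFusion's `FileFormat` or `ListingTable` API (which works with Arrow schemas, object stores and async execution plans). It assumes in-memory `(path, contents)` pairs stand in for object storage and rows of strings stand in for record batches.

```rust
/// Hypothetical, simplified analogue of a file-format extension point:
/// a format only knows its file extension and how to decode one file's contents.
trait FormatSketch {
    fn extension(&self) -> &str;
    fn decode(&self, contents: &str) -> Vec<Vec<String>>;
}

/// A CSV-like format: one row per non-empty line, comma-separated fields.
struct CommaFormat;

impl FormatSketch for CommaFormat {
    fn extension(&self) -> &str {
        "csv"
    }

    fn decode(&self, contents: &str) -> Vec<Vec<String>> {
        contents
            .lines()
            .filter(|line| !line.is_empty())
            .map(|line| line.split(',').map(str::to_string).collect())
            .collect()
    }
}

/// Generic table machinery (hypothetical): it selects the files whose
/// extension matches the format and delegates decoding to that format.
struct ListingTableSketch<F: FormatSketch> {
    /// In-memory stand-in for listed files: (path, contents).
    files: Vec<(String, String)>,
    format: F,
}

impl<F: FormatSketch> ListingTableSketch<F> {
    /// Returns all rows from all matching files, in listing order.
    fn scan(&self) -> Vec<Vec<String>> {
        let suffix = format!(".{}", self.format.extension());
        self.files
            .iter()
            .filter(|(path, _)| path.ends_with(&suffix))
            .flat_map(|(_, contents)| self.format.decode(contents))
            .collect()
    }
}
```

The point of the design is that supporting a new format only requires a new `FormatSketch` impl; listing and filtering stay in the shared table type.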

For streams, I had in mind:

StreamTable => StreamTableConfig => StreamProvider (maybe I should rename it to StreamFormat for consistency?)

Here there would be different StreamProvider implementations, such as websocket, Kafka, pcap, etc.
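A rough std-only sketch of the proposed StreamTable => StreamTableConfig => StreamProvider layering might look like the following. Everything here is an assumption for illustration: the method names (`open`, `scan`, `name`), the `max_batches` option, and the two providers (`LinesProvider` replaying fixed lines as a stand-in for a FIFO/file source, `CounterProvider` as a stand-in for an unbounded socket-like source) are hypothetical and not part of the PR or of DataFusion. Batches are modeled as `Vec<String>` and streams as synchronous iterators rather than Arrow batches and async streams.

```rust
/// A stream of batches; each batch is a list of string values (simplified).
type BatchStream = Box<dyn Iterator<Item = Vec<String>>>;

/// Hypothetical provider trait: each implementation knows how to open its own source.
trait StreamProvider {
    fn name(&self) -> &str;
    fn open(&self) -> BatchStream;
}

/// Stand-in for a FIFO/file provider: replays fixed lines in fixed-size batches.
struct LinesProvider {
    lines: Vec<String>,
    /// Must be non-zero: `chunks` panics on a chunk size of 0.
    batch_size: usize,
}

impl StreamProvider for LinesProvider {
    fn name(&self) -> &str {
        "lines"
    }

    fn open(&self) -> BatchStream {
        let batches: Vec<Vec<String>> = self
            .lines
            .chunks(self.batch_size)
            .map(|chunk| chunk.to_vec())
            .collect();
        Box::new(batches.into_iter())
    }
}

/// Stand-in for a socket-like provider: emits one single-value batch per tick.
struct CounterProvider {
    batches: usize,
}

impl StreamProvider for CounterProvider {
    fn name(&self) -> &str {
        "counter"
    }

    fn open(&self) -> BatchStream {
        Box::new((0..self.batches).map(|i| vec![i.to_string()]))
    }
}

/// Hypothetical config: the provider plus generic, provider-independent options.
struct StreamTableConfig {
    provider: Box<dyn StreamProvider>,
    /// Optional cap on how many batches a scan yields.
    max_batches: Option<usize>,
}

/// Generic table that only talks to the provider through the trait.
struct StreamTable {
    config: StreamTableConfig,
}

impl StreamTable {
    fn new(config: StreamTableConfig) -> Self {
        Self { config }
    }

    /// Opens the provider and applies the generic options from the config.
    fn scan(&self) -> BatchStream {
        let stream = self.config.provider.open();
        match self.config.max_batches {
            Some(n) => Box::new(stream.take(n)),
            None => stream,
        }
    }
}
```

With this shape, adding a Kafka or pcap source would mean writing one more `StreamProvider` impl, while options such as `max_batches` live once in the shared config.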

matthewmturner · May 22 '24 01:05

@alamb I have a working example now. I have an idea to update it to show more of the streaming nature (i.e. write to the FIFO and get batches multiple times), but I won't have time for that today. Do you have general thoughts on whether this type of API could be supported?

matthewmturner · May 24 '24 12:05

@berkaysynnada PTAL

ozankabak · May 29 '24 11:05

I will review it in detail tomorrow.

berkaysynnada · May 29 '24 15:05

Thank you @alamb @berkaysynnada @ozankabak

matthewmturner · Jun 04 '24 22:06

Thanks again everyone

alamb · Jun 06 '24 12:06