
Type Spark’s Structured Streaming

Open OlivierBlanvillain opened this issue 6 years ago • 2 comments

We are currently missing these two Dataset methods:

  • DataStreamWriter writeStream()
  • Dataset withWatermark(String eventTime, String delayThreshold)

Both require some understanding of Spark Structured Streaming to be properly typed and tested. Here is the relevant documentation for anyone interested in getting started on this:

https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
https://databricks.com/blog/2017/05/08/event-time-aggregation-watermarking-apache-sparks-structured-streaming.html
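
For context, here is a minimal sketch of how the two untyped methods are used in plain Spark today, which is roughly what a typed wrapper would need to cover. It is not Frameless code. The object name, the `rate` source, the window sizes, and the console sink are assumptions chosen only to keep the example self-contained.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, window}

// Hypothetical example object; plain Spark, no Frameless involved.
object UntypedStreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("untyped-streaming-sketch")
      .getOrCreate()

    // The built-in "rate" test source emits rows with columns
    // `timestamp` (event time) and `value` (a counter).
    val events: DataFrame = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()

    // withWatermark takes the event-time column name and the delay
    // threshold as plain strings, so a misspelled column name is only
    // caught when the query is analyzed, not by the compiler.
    val counts = events
      .withWatermark("timestamp", "10 seconds")
      .groupBy(window(col("timestamp"), "5 seconds"))
      .count()

    // writeStream returns a DataStreamWriter; start() launches the query.
    val query = counts.writeStream
      .format("console")
      .outputMode("update")
      .start()

    // Let the query run for about 20 seconds, then shut down.
    query.awaitTermination(20000L)
    query.stop()
    spark.stop()
  }
}
```

The stringly-typed `eventTime` argument is the obvious candidate for a typed column reference, while `writeStream` mostly needs the element type carried through to the sink.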

OlivierBlanvillain avatar Jan 20 '18 14:01 OlivierBlanvillain

+1 - This was a big blocker for us adopting Frameless, as most of our jobs are structured streaming jobs.

etspaceman avatar Jul 30 '19 15:07 etspaceman

I'm curious why this never took off. My guess is that most Typelevel people are using fs2 instead of Spark streaming, but fs2 is still limited in that it can't do distributed streaming out of the box. Maybe Typelevel people are using Flink instead, but that seems doubtful given how Flink is engineered.

This article is interesting. Has anyone tried to extend this approach into the fs2/Frameless world?

http://mandubian.com/2014/02/13/zpark/

kyprifog avatar Sep 18 '19 19:09 kyprifog