pranadb icon indicating copy to clipboard operation
pranadb copied to clipboard

Generation of watermarks

Open purplefox opened this issue 3 years ago • 0 comments

Watermarks are inserted into the event stream at ingestion from source.

A watermark contains a time, and says “there will be no more events with an event time before this time”, this allows relation operators in the stream to emit results.

The watermark for a stream can be generated in different ways, eg.

  1. Based on event time of stream elements. Here we might allow for some degree of known latenesss. So we take the max event time seen so far and subtract a fixed amount.
  2. Based on ingestion time. We might want to generate watermark based on difference between event time and ingestion time.

When are watermarks generated?

  1. punctuated. For every event that arrives from a source we compute the watermark and if it has advanced we generate the watermark to the stream. The watermark won’t advance for each element if elements are out of order for example.
  2. Periodic- generate a watermark every x ms.

Watermarks by themselves don’t drop late events!!! It’s possible they may still arrive. So need a strategy for dealing with late events in an operator - e.g. drop the event.

purplefox avatar Sep 30 '21 16:09 purplefox