
Reorder messages by timestamp

Open ybyzek opened this issue 3 years ago • 3 comments

Is your feature request related to a problem? Please describe.

Use case 1: We receive stock market data from an external source that sometimes delivers records out of order. We need to re-order the records by timestamp within a certain time window (say, 30 seconds) in ksqlDB so that the downstream topics receive the results in the right order.

Use case 2: Game provider clients can go offline and accumulate messages, which are (sometimes) delivered only once the clients come back online. The messages then need to be re-ordered for proper processing.

Describe the solution you'd like

Built-in function that re-orders records within a given window.

Describe alternatives you've considered

Kafka Streams example: https://github.com/confluentinc/kafka-streams-examples/pull/411


ybyzek avatar Dec 02 '21 18:12 ybyzek

At a high level, this seems related to ORDER BY (https://github.com/confluentinc/ksql/issues/1572)

ybyzek avatar Dec 07 '21 15:12 ybyzek

Just to dump a few thoughts:

  • we should look into the SQL OVER clause (maybe we could leverage it for this case?)
  • if the OVER clause does not fit, we might need to consider adding a new operator (or maybe allow using a SLIDING WINDOW [which we haven't added yet] without a GROUP BY clause). Not sure whether we should reuse ORDER BY as the keyword or not
  • it might also be possible to just add a completely new operator reorder(stream, grace) (i.e., a table-valued function) that does the reordering
  • adding a "re-order" operator to Kafka Streams might be beneficial for KS users as well (as an alternative, we could use a custom transformValues to implement it)
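
To make the reorder(stream, grace) idea concrete, here is a minimal sketch in plain Python, outside of ksqlDB and Kafka Streams. It is not an existing API in either project. The names `reorder`, `grace_ms`, and `sample` are hypothetical, and the choice to drop records that arrive later than the grace period is an assumption; a real operator might route them elsewhere instead.

```python
import heapq
from typing import Iterable, Iterator, Tuple

Record = Tuple[int, str]  # (event timestamp in ms, payload)


def reorder(stream: Iterable[Record], grace_ms: int) -> Iterator[Record]:
    """Emit records in timestamp order, tolerating up to grace_ms of lateness."""
    buffer = []      # min-heap of (timestamp, arrival index, payload)
    max_seen = None  # highest timestamp observed so far ("stream time")
    for seq, (ts, payload) in enumerate(stream):
        if max_seen is not None and ts < max_seen - grace_ms:
            # Assumption: records older than the grace period are dropped.
            continue
        heapq.heappush(buffer, (ts, seq, payload))
        max_seen = ts if max_seen is None else max(max_seen, ts)
        # Anything at or below (stream time - grace) can no longer be
        # overtaken by an accepted record, so it is safe to emit.
        while buffer and buffer[0][0] <= max_seen - grace_ms:
            t, _, p = heapq.heappop(buffer)
            yield (t, p)
    # End of input: flush whatever is still buffered, in timestamp order.
    while buffer:
        t, _, p = heapq.heappop(buffer)
        yield (t, p)


# Hypothetical input with a 30-second grace period, as in use case 1.
sample = [(1000, "a"), (3000, "c"), (2000, "b"),
          (40000, "d"), (35000, "e"), (5000, "late")]
ordered = list(reorder(sample, grace_ms=30_000))
```

In this sketch the buffer only advances when a newer timestamp arrives, so output latency is bounded by the grace period plus the gap until the next record; the record stamped 5000 is discarded because it arrives after stream time has already reached 40000.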

mjsax avatar Dec 07 '21 17:12 mjsax

It's a problem that we are currently facing too. E.g., the Salesforce source connector for Kafka Connect produces messages without a key. If you use a topic with multiple partitions to store those messages, they will end up in random partitions and you'll possibly process them out of order.
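
The effect described above can be illustrated with a toy model in plain Python. This is not Kafka's real partitioner: the round-robin spreading of keyless messages, the CRC32 hash, the two-partition topic, and the key "account-42" are all assumptions chosen only to show why ordering is lost across partitions but kept within one.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 2  # hypothetical topic with two partitions


def partition_for(key, seq):
    """Toy partitioner: spread keyless messages, hash keyed ones."""
    if key is None:
        return seq % NUM_PARTITIONS  # stand-in for keyless spreading
    return zlib.crc32(key.encode()) % NUM_PARTITIONS


def produce(messages):
    """Append each (key, timestamp) message to its partition's log."""
    partitions = defaultdict(list)
    for seq, (key, ts) in enumerate(messages):
        partitions[partition_for(key, seq)].append((key, ts))
    return partitions


def consume(partitions):
    """Read one partition after another; order holds only within a partition."""
    return [msg for p in sorted(partitions) for msg in partitions[p]]


keyless = consume(produce([(None, ts) for ts in range(1, 7)]))
keyed = consume(produce([("account-42", ts) for ts in range(1, 7)]))
```

Here `keyless` comes back interleaved by partition rather than by timestamp, while `keyed` keeps its original order because every message with the same key lands in the same partition.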

gphilipp avatar Aug 02 '22 09:08 gphilipp