ksql
Reorder messages by timestamp
Is your feature request related to a problem? Please describe.
Use case 1: We receive stock market data from an external source that sometimes delivers records out of order. We need to re-order the records by timestamp within a certain time window (say, 30 seconds) in ksqlDB, so that the downstream topics receive the results in the correct order.
Use case 2: Game provider clients can be offline and accumulate messages, then when they come online the messages are (sometimes) delivered. Need to re-order the messages for proper processing.
Describe the solution you'd like
Built-in function that re-orders records within a given window.
Describe alternatives you've considered
Kafka Streams example: https://github.com/confluentinc/kafka-streams-examples/pull/411
At a high level, this seems related to `ORDER BY` (https://github.com/confluentinc/ksql/issues/1572).
Just to dump a few thoughts:
- We should look into the SQL `OVER` clause (maybe we could leverage it for this case?).
- If the `OVER` clause does not fit, we might need to consider adding a new operator (or maybe allow using a `SLIDING WINDOW` [that we didn't add yet] without a `GROUP BY` clause). Not sure if we should reuse `ORDER BY` as a keyword or not.
- It might also be possible to just add a completely new operator `reorder(stream, grace)` (i.e., a table-valued function) that does the reordering.
- Adding a "re-order" operator to Kafka Streams might be beneficial for KS users as well (as an alternative, we could use a custom `transformValues` to implement it).
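To make the `reorder(stream, grace)` idea more concrete, here is a minimal sketch of the buffering logic in plain Python. It is not ksqlDB or Kafka Streams code, and the names `ReorderBuffer`, `reorder`, and `grace_ms` are hypothetical. It assumes that "stream time" is the highest timestamp seen so far, and that a record may be emitted once it is at least `grace_ms` older than stream time.

```python
import heapq
from typing import Any, Iterable, List, Optional, Tuple

Record = Tuple[int, Any]  # (timestamp in ms, value)


class ReorderBuffer:
    """Hypothetical buffer: holds records and emits them sorted by timestamp."""

    def __init__(self, grace_ms: int) -> None:
        self.grace_ms = grace_ms
        self._heap: List[Tuple[int, int, Any]] = []  # (timestamp, arrival seq, value)
        self._seq = 0  # tie-breaker: equal timestamps keep arrival order
        self._stream_time: Optional[int] = None  # highest timestamp seen so far

    def add(self, timestamp_ms: int, value: Any) -> List[Record]:
        """Buffer one record and return every record that is now safe to emit."""
        if self._stream_time is None or timestamp_ms > self._stream_time:
            self._stream_time = timestamp_ms
        heapq.heappush(self._heap, (timestamp_ms, self._seq, value))
        self._seq += 1

        # A record is safe once stream time has advanced grace_ms past it.
        # Note: a record arriving later than the grace period is emitted
        # immediately here; a real operator might drop or reroute it instead.
        threshold = self._stream_time - self.grace_ms
        ready: List[Record] = []
        while self._heap and self._heap[0][0] <= threshold:
            ts, _, v = heapq.heappop(self._heap)
            ready.append((ts, v))
        return ready

    def flush(self) -> List[Record]:
        """Emit everything still buffered, in timestamp order (end of input)."""
        ready: List[Record] = []
        while self._heap:
            ts, _, v = heapq.heappop(self._heap)
            ready.append((ts, v))
        return ready


def reorder(records: Iterable[Record], grace_ms: int) -> List[Record]:
    """Run a finite sequence of records through the buffer."""
    buf = ReorderBuffer(grace_ms)
    out: List[Record] = []
    for ts, value in records:
        out.extend(buf.add(ts, value))
    out.extend(buf.flush())
    return out


if __name__ == "__main__":
    arrivals = [(1000, "a"), (3000, "c"), (2000, "b"), (35000, "d"), (34000, "e")]
    for record in reorder(arrivals, grace_ms=30_000):
        print(record)
```

The trade-off in this sketch is latency: nothing is emitted until stream time has moved `grace_ms` past a record, so a larger grace period tolerates more disorder at the cost of delaying every downstream result by that amount.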
It's a problem that we are currently facing too. For example, the Salesforce source Kafka Connect connector produces messages without a key. If you use a topic with multiple partitions to store those messages, they end up in random partitions and you may process them out of order.