quix-streams
Tumbling time-based window triggered by time, not by new incoming messages
Is your feature request related to a problem? Please describe. Currently, with a time-based window, events are grouped either by key or by partition for the period set when the window is created. The window tumbles only when a new event arrives and falls into a new window, according to the partition/key rules.
Describe the solution you'd like The time-based window should tumble independently of new messages, as messages with the same key or within the same partition may not arrive for a long time. The window should tumble solely based on the elapsed time it covers.
Solution: The tumbling_window function (and all related functions) on a StreamingDataFrame should accept an additional parameter to control its behavior—whether it tumbles at regular intervals or only when new events arrive.
Describe alternatives you've considered Currently, I have implemented a memory/database-based solution for resilience to restarts, where one process collects new messages and another process validates the accumulated events within the window's time frame.
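A minimal in-memory sketch of that collector/checker split, for illustration only. The class `WallClockBatcher` and its methods are hypothetical names, not part of quix-streams. The database persistence needed to survive restarts is left out, and each batch starts at the first message for a key rather than on an aligned window boundary.

```python
import time


class WallClockBatcher:
    """Collects values per key and releases a batch once its deadline has passed."""

    def __init__(self, duration_s, clock=time.monotonic):
        self.duration_s = duration_s
        self.clock = clock  # injectable so the sketch can be tested without sleeping
        self.batches = {}  # key -> (deadline, [values])

    def add(self, key, value):
        """Collector side: called for every incoming message."""
        if key not in self.batches:
            # The first message for a key opens a batch and fixes its deadline.
            self.batches[key] = (self.clock() + self.duration_s, [])
        self.batches[key][1].append(value)

    def flush_due(self):
        """Checker side: called periodically, independent of incoming messages."""
        now = self.clock()
        due = {
            key: values
            for key, (deadline, values) in self.batches.items()
            if deadline <= now
        }
        for key in due:
            del self.batches[key]
        return due


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
batcher = WallClockBatcher(duration_s=10, clock=lambda: fake_now[0])
batcher.add("sensor-1", 1)
fake_now[0] = 5.0
batcher.add("sensor-1", 2)
early = batcher.flush_due()  # nothing is due yet
fake_now[0] = 10.0
ready = batcher.flush_due()  # the batch is released without any new message
```

The point of the split is that `flush_due` runs on a timer, so a batch is released even if no further message arrives for that key.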
Another (less ideal / dirty) workaround is to post dummy messages to the topic at regular intervals to ensure the window tumbles, discarding these messages based on their content.
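The content-based discarding in this workaround could look roughly like the sketch below. The marker field `__heartbeat__` and the helper names are assumptions made for illustration. The periodic publishing of the dummy messages to the topic (one per key or partition that needs to tumble) is not shown.

```python
HEARTBEAT_FIELD = "__heartbeat__"  # hypothetical marker, chosen for this sketch


def make_heartbeat(timestamp_ms):
    """Build a dummy message whose only purpose is to advance the window."""
    return {HEARTBEAT_FIELD: True, "timestamp_ms": timestamp_ms}


def is_heartbeat(value):
    """Recognise a dummy message by its content."""
    return isinstance(value, dict) and value.get(HEARTBEAT_FIELD) is True


def strip_heartbeats(values):
    """Drop dummy messages from a collected batch before processing it."""
    return [value for value in values if not is_heartbeat(value)]


# Usage: a batch that contains one real event and one dummy message.
batch = [{"order_id": 1}, make_heartbeat(60_000)]
real_events = strip_heartbeats(batch)
```

Filtering on the collected batch rather than before the window keeps the dummy message visible to the window, which is what makes it tumble.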
Additional context Use case for this feature: grouping messages to start batch processing for all accumulated messages at once. Once the time frame has elapsed, the batch must be triggered, as there may be no further events for a while. Waiting for the next message could introduce unnecessary idle time, impacting performance.
Hi @NicholasJallan, thanks for creating the issue and explaining the use case in detail 👍
Due to how the whole framework is implemented, windows cannot currently be rolled based solely on wall-clock time. It is reactive by nature, and to trigger any computation, the system needs to receive a message. Therefore, I can't recommend any other workarounds except those you've posted, like sending dummy messages.
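To illustrate the reactive behaviour described above, here is a toy model in plain Python. It is not the quix-streams implementation. The class name is hypothetical, windows are tracked per key for simplicity, and late messages are simply dropped.

```python
class EventDrivenTumblingWindow:
    """Toy model: a window closes only when a later message for the same key arrives."""

    def __init__(self, duration_ms):
        self.duration_ms = duration_ms
        self.open_windows = {}  # key -> (window_start_ms, [values])

    def process(self, key, timestamp_ms, value):
        """Add a message; return the closed window as (start, end, values), or None."""
        start = timestamp_ms - (timestamp_ms % self.duration_ms)
        current = self.open_windows.get(key)

        if current is not None and start < current[0]:
            return None  # late message for an already closed window: dropped here

        closed = None
        if current is not None and current[0] < start:
            # Only the arrival of a message in a later window closes the old one.
            closed = (current[0], current[0] + self.duration_ms, current[1])
            current = None

        if current is None:
            current = (start, [])
            self.open_windows[key] = current
        current[1].append(value)
        return closed


window = EventDrivenTumblingWindow(duration_ms=1000)
first = window.process("a", 100, 1)
second = window.process("a", 900, 2)
# Window [0, 1000) is over in event time, but nothing has been emitted yet.
third = window.process("a", 5000, 3)  # this arrival finally closes [0, 1000)
```

In this model the first window's result only becomes available at the third call, however much wall-clock time passes in between, which is the idle time the issue describes.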
I'll keep the issue open so it's on our radar.
Hi Daniil,
Thanks for your reply. I understand your response; it is what I expected given the implementation.
IMHO, the external code I'm going to write for these auto-discarded dummy messages is a solution that deserves to be included / wrapped / hidden inside the StreamingDataFrame class, as it is otherwise likely to be duplicated across many projects.
Until then, I'll implement and use this workaround (publishing a dummy message each period). I'll try to publish the implemented solution here in case it helps others.
Sure, feel free to share or contribute what you have 👍