flink-siddhi
Dynamic Logical Partitioning on event and control stream
Hi @haoch
How does operator state behave when the operator runs with a parallelism greater than 1?
As far as I can see, the class org.apache.flink.streaming.siddhi.operator.AbstractSiddhiOperator uses managed operator state.
Is there a way to partition the rules and events by some key and use managed keyed state instead? I am curious about parallelism and performance. Are there any performance measurements?
For example, I have a Siddhi query that expects 3 FAILURE events followed by 1 SUCCESS event:
    from every e1=firewallStream[name == 'FAILURE']<3> -> e2=firewallStream[name == 'SUCCESS'] within 1 min
    select 'AAAAAAAA-AAAA-AAAA-BBBB-AAAAAAAAAAAA' as ruleId, e1.externalId as eventIds
    insert into outputStream
If I set the parallelism of the Siddhi CEP operator to 1, the query above produces the correct result, because all events go to a single operator instance. If I increase the operator's parallelism above 1, the events are distributed across multiple instances of the operator, and the query fails to produce a result because state is not shared between the parallel instances.
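To make the failure mode concrete, here is a minimal plain-Java simulation (no Flink or Siddhi involved). It assumes events are spread round-robin over the instances, and it uses a deliberately simplified matcher (count FAILURE events, match on a SUCCESS once 3 have been seen) as a stand-in for the real Siddhi pattern semantics. The names `RoundRobinDemo`, `Instance`, and `countMatches` are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.List;

public class RoundRobinDemo {

    /** Simplified stand-in for one parallel operator instance with its own private state. */
    static final class Instance {
        private int failures = 0;

        /** Returns true when a SUCCESS arrives after at least 3 FAILUREs seen by THIS instance. */
        boolean onEvent(String name) {
            if ("FAILURE".equals(name)) {
                failures++;
                return false;
            }
            if ("SUCCESS".equals(name)) {
                boolean matched = failures >= 3;
                failures = 0; // start over after a SUCCESS
                return matched;
            }
            return false; // other event names are ignored
        }
    }

    /** Distributes events round-robin over `parallelism` instances and counts matches. */
    static int countMatches(List<String> events, int parallelism) {
        Instance[] instances = new Instance[parallelism];
        for (int i = 0; i < parallelism; i++) {
            instances[i] = new Instance();
        }
        int matches = 0;
        for (int i = 0; i < events.size(); i++) {
            // Assumption: round-robin distribution, no key involved.
            if (instances[i % parallelism].onEvent(events.get(i))) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("FAILURE", "FAILURE", "FAILURE", "SUCCESS");
        System.out.println("parallelism 1: " + countMatches(events, 1)); // 1 match
        System.out.println("parallelism 2: " + countMatches(events, 2)); // 0 matches
    }
}
```

With parallelism 2, the first instance sees two FAILURE events and the second sees one FAILURE plus the SUCCESS, so neither instance ever observes the full sequence.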
Have you tried using Siddhi partitions? Siddhi has its own concept of partitions, and maybe that can be applied here: https://siddhi.io/en/v5.1/docs/query-guide/#partition
Alternatively, you could use Flink-based data partitioning, which routes data to the respective operator task according to the partitioning scheme when parallelism is enabled for an operator.
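The idea behind key-based partitioning can be sketched in plain Java, without Flink or Siddhi. This is an illustration under assumptions, not how flink-siddhi works today: the partition key (here a hypothetical `key` field, for example a user or source address) is not named anywhere in this thread, the routing uses a plain `hashCode` modulo rather than Flink's actual key-group assignment, and the matcher is the same simplified "3 FAILURE then SUCCESS" counter rather than real Siddhi pattern semantics. All class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyedRoutingDemo {

    /** Hypothetical event shape: a partition key plus the event name. */
    static final class Event {
        final String key;
        final String name;

        Event(String key, String name) {
            this.key = key;
            this.name = name;
        }
    }

    /** Simplified routing: the same key always maps to the same instance index. */
    static int route(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    /** Routes events by key and keeps a per-key failure counter on each instance. */
    static int countMatches(List<Event> events, int parallelism) {
        List<Map<String, Integer>> stateByInstance = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            stateByInstance.add(new HashMap<>());
        }
        int matches = 0;
        for (Event e : events) {
            Map<String, Integer> state = stateByInstance.get(route(e.key, parallelism));
            if ("FAILURE".equals(e.name)) {
                state.merge(e.key, 1, Integer::sum);
            } else if ("SUCCESS".equals(e.name)) {
                if (state.getOrDefault(e.key, 0) >= 3) {
                    matches++;
                }
                state.remove(e.key); // start over for this key
            }
        }
        return matches;
    }

    /** Interleaved events for two keys; each key has 3 FAILUREs followed by a SUCCESS. */
    static List<Event> sampleEvents() {
        return Arrays.asList(
            new Event("alice", "FAILURE"), new Event("bob", "FAILURE"),
            new Event("alice", "FAILURE"), new Event("bob", "FAILURE"),
            new Event("alice", "FAILURE"), new Event("bob", "FAILURE"),
            new Event("alice", "SUCCESS"), new Event("bob", "SUCCESS"));
    }

    public static void main(String[] args) {
        for (int p : new int[] {1, 2, 4}) {
            System.out.println("parallelism " + p + ": " + countMatches(sampleEvents(), p));
        }
    }
}
```

Because every event for a given key lands on the same instance, the match count stays at 2 for any parallelism. The trade-off is that a pattern can then only correlate events that share a key, so the key has to be chosen to fit the rule.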