aecor

Provide a kafka streams/event stores based event log

Open SemanticBeeng opened this issue 7 years ago • 6 comments

Feature request as per

https://blog.softwaremill.com/event-sourcing-using-kafka-53dfd72ad45d

"Hence, we need a better way. That’s where kafka-streams and state stores come into play. Kafka-streams applications run across a cluster of nodes, which jointly consume some topics."

This way, the concepts of "entity" from akka persistence and "runtime state" from event stores would complement each other. Ideally, the akka persistence runtime would not be necessary to run aecor.

SemanticBeeng avatar May 24 '18 18:05 SemanticBeeng

@SemanticBeeng that's a topic I've thought a lot about. When you use Kafka as an event log, you lose the ability to query events by entity key. Imagine how long it would take to recover the whole system if you need to change how you fold an entity state. Also, runtime state backed by a KTable or similar is eventually consistent, which is acceptable for the read side, but not for the write side.
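To make the recovery concern concrete: in event sourcing, an entity's state is a fold over its events, so changing the fold forces a full replay of that entity's events. A minimal sketch with a hypothetical `Account` entity (illustrative only, not aecor's API):

```scala
// Hypothetical event-sourced entity: state is a left fold over its events.
sealed trait AccountEvent
final case class Deposited(amount: Long) extends AccountEvent
final case class Withdrawn(amount: Long) extends AccountEvent

final case class AccountState(balance: Long)

// Changing this fold means replaying the entity's events from scratch,
// which is cheap only if the journal can serve "all events for key X".
def fold(state: AccountState, event: AccountEvent): AccountState =
  event match {
    case Deposited(a) => AccountState(state.balance + a)
    case Withdrawn(a) => AccountState(state.balance - a)
  }

def recover(events: List[AccountEvent]): AccountState =
  events.foldLeft(AccountState(0L))(fold)
```

With a key-indexed journal (e.g. Cassandra) this replay touches only one entity's events; with a plain Kafka topic it implies re-reading the whole log.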

notxcain avatar Oct 02 '18 14:10 notxcain

Please help me understand your points better, especially "acceptable for read side, but not for write side".

But for the "ability to query events by entity key", we could use a separate topic per entity. And then there are Kafka Streams interactive queries: https://kafka.apache.org/10/documentation/streams/developer-guide/interactive-queries.html
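To illustrate why querying by entity key is awkward over a shared topic: a Kafka partition interleaves the records of every entity routed to it, so replaying one entity degenerates into a scan of the whole partition. A sketch using plain collections (no Kafka client; `Record` and the payloads are made up for illustration):

```scala
// A partition interleaves records of all entities hashed to it.
final case class Record(key: String, payload: String)

val partition: List[Record] = List(
  Record("acct-1", "Deposited(100)"),
  Record("acct-2", "Deposited(50)"),
  Record("acct-1", "Withdrawn(30)")
)

// "Events for entity X" over Kafka is a full partition scan plus a filter;
// a journal indexed by key can serve the same query directly.
def eventsFor(key: String): List[String] =
  partition.filter(_.key == key).map(_.payload)
```

Per-entity order is preserved (Kafka guarantees order within a partition), but the cost of the lookup grows with the partition, not with the entity.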

Again, please explain a bit more deeply so I can see what I'm missing, since I haven't thought about this as much as you have.

SemanticBeeng avatar Oct 02 '18 15:10 SemanticBeeng

I think there are limitations to a separate topic per entity: Kafka (at least until the previous version) would only scale to around ~10k topics. I think this has since been increased to the order of ~100k, but it's still not the most efficient way of using Kafka. Since topics are durable, Kafka is typically consumed as a consumer group (multiple nodes connected to the brokers), and messages are guaranteed to be ordered only within a partition, so partitions and group membership are the dimensions you can play with to increase parallel processing. If you partition by entity ID, you know the messages for a given entity are consistently sent to the same host, which I think is what you want. In the Lagom world, if you have, say, a 3-host cluster, you can process all the messages as a consumer group, and if you're not partitioning by key, or don't have a good partition key, the message is routed via akka remoting to the cluster node that hosts the persistent entity (I'm not sure of the exact mechanism).
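The routing idea above can be sketched in a few lines: key-based partitioning just hashes the entity ID into a partition index, so the same entity always lands on the same partition (and hence the same consumer). Kafka's default partitioner uses murmur2 over the key bytes; this sketch uses plain `hashCode` for brevity:

```scala
// Sketch of key-based routing: a given entity ID deterministically maps
// to one partition, so all its messages go to the same consumer.
// (Kafka's real partitioner uses murmur2; hashCode stands in here.)
def partitionFor(entityId: String, numPartitions: Int): Int =
  math.abs(entityId.hashCode % numPartitions)
```

Combined with per-partition ordering, this gives per-entity ordering without needing a topic per entity.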

Apache Pulsar is also an interesting middleware to explore as a target, it supports both a topic/pub-sub interface as well as a queue interface (suitable for work scheduling for example) and it scales to millions of topics, therefore it's conceivable you could have a topic for each host in your akka cluster processing and therefore if you're doing something similar to an Ask pattern you might be able to implement a direct (and durable) return to sender reply without finishing to be routed via akka remoting (but then you need to figure out what happens if the host which Asked something dies).

Apologies if this explains something obvious.

I also didn't understand what was meant by "acceptable for read side, but not for write side". In Lagom, a Read Side Processor designates a role that processes the Events (in the Command (input) -> State (hidden) -> Event (output) sense) and then prepares some sort of view that is queryable, or optimised for a particular read pattern. Are you talking about the problem of re-processing Commands, which requires some sort of strategy to de-dupe the ones already processed?
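The de-dupe strategy alluded to here is usually offset tracking: the projection remembers the last offset it applied, so redelivered events are skipped and the view stays correct under at-least-once delivery. A minimal sketch (hypothetical `View`, not Lagom's API):

```scala
// Sketch of an idempotent read-side projection: remember the last applied
// offset so redelivered events do not corrupt the view.
final case class View(total: Long, lastOffset: Long)

def project(view: View, offset: Long, amount: Long): View =
  if (offset <= view.lastOffset) view            // already applied: skip
  else View(view.total + amount, offset)

// Offset 2 is delivered twice (at-least-once), but counted once.
val deliveries = List((1L, 10L), (2L, 5L), (2L, 5L))
val result = deliveries.foldLeft(View(0L, 0L)) {
  case (v, (off, amt)) => project(v, off, amt)
}
```

In practice the `lastOffset` is persisted atomically with the view update, so the projection survives restarts without double-counting.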

schrepfler avatar Dec 05 '18 03:12 schrepfler

@schrepfler what I mean is that you can't validate commands against an eventually consistent KTable.
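The problem can be shown with a toy race: an eventually consistent view lags behind the log, so two commands can both be validated against the same stale state and both be accepted. A sketch (hypothetical account with balance 100; the "store" stands in for a lagging KTable):

```scala
// Sketch: command validation against a stale, eventually consistent view.
var withdrawals: List[Long] = Nil  // authoritative log of accepted commands
var storeBalance: Long = 100L      // the view; updated asynchronously, lags

def validate(amount: Long): Boolean = storeBalance >= amount

// Two withdrawals of 100 arrive before the view catches up: both pass.
val ok1 = validate(100L); if (ok1) withdrawals = 100L :: withdrawals
val ok2 = validate(100L); if (ok2) withdrawals = 100L :: withdrawals

// The true balance, folded from the log, is overdrawn.
val actualBalance = 100L - withdrawals.sum
```

A write side needs to read its own writes (consistent, per-key state) to reject the second command, which is exactly what the eventually consistent KTable cannot guarantee.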

notxcain avatar Dec 05 '18 14:12 notxcain

From the latest post by Vladimir: https://pavkin.ru/aecor-part-4a/

It highlights the similarity between the event journal in the context of event sourcing and Kafka's "log".

[image: event-sourcing journal compared to the Kafka log]

It also mentions kafka-journal. I'm not sure how much of it could be reused for the purpose of this issue.

SemanticBeeng avatar Jan 03 '19 08:01 SemanticBeeng

There is a promising Akka Persistence journal implementation based on both Cassandra and Kafka. You may want to take a look at the trade-offs made.

notxcain avatar Jun 06 '19 12:06 notxcain