akka-persistence-cassandra

Maintain a time index to support an Akka read journal

Open · jypma opened this issue 8 years ago • 7 comments

This adds an index table to Cassandra so that events can be queried "roughly" by time. The Akka read journal (query plugin) implementation is in a separate library.

For every time window (say, 1 minute), a persistenceId is added to the index table once if it changed during that window. Because only the first change to a persistenceId within a window is indexed, the index grows with the number of (window, persistenceId) pairs rather than with the number of events, which keeps its size somewhat limited.
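A minimal in-memory sketch of the write-side rule described above, assuming a fixed window aligned to the Unix epoch. `TimeWindowIndexer`, `IndexEntry` and `onEvent` are hypothetical names, not part of the plugin, and the Cassandra insert is replaced by returning the row that would be written.

```scala
import java.time.{Duration, Instant}
import scala.collection.mutable

// Hypothetical index row: "persistenceId changed at least once during this window".
final case class IndexEntry(windowStart: Instant, persistenceId: String)

// Hypothetical indexer; a real one would write rows to a Cassandra table instead of returning them.
final class TimeWindowIndexer(window: Duration) {
  private val windowMillis = window.toMillis
  require(windowMillis > 0, "window must be positive")

  // Rows already emitted; only the first change per (window, persistenceId) produces a row.
  private val seen = mutable.Set.empty[IndexEntry]

  /** Start of the epoch-aligned window that contains `ts`. */
  def windowStartOf(ts: Instant): Instant =
    Instant.ofEpochMilli(Math.floorDiv(ts.toEpochMilli, windowMillis) * windowMillis)

  /** Some(row) for the first change to `persistenceId` in its window, None afterwards. */
  def onEvent(persistenceId: String, ts: Instant): Option[IndexEntry] = {
    val entry = IndexEntry(windowStartOf(ts), persistenceId)
    if (seen.add(entry)) Some(entry) else None
  }
}
```

With a one-minute window, events for `user-1` at 12:10:05 and 12:10:35 yield a single row for the 12:10 window, and an event at 12:11:05 yields a new row for 12:11. The `seen` set grows without bound here; a real indexer would presumably drop entries for windows that have closed.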

The query API can then find which persistenceIds changed when, accurate to one time window. This allows remote or distributed views to resume reading the event stream without having to restart from 0.
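On the read side, one-window accuracy means a view resuming from a timestamp has to include the whole window containing that timestamp and then discard events older than the resume point itself. The sketch below assumes the index rows are already loaded into memory; `TimeIndexQuery` and `changedSince` are hypothetical names.

```scala
import java.time.{Duration, Instant}

// Hypothetical index row, as written once per (window, persistenceId).
final case class IndexEntry(windowStart: Instant, persistenceId: String)

object TimeIndexQuery {
  /** Start of the epoch-aligned window that contains `ts`. */
  def windowStartOf(ts: Instant, window: Duration): Instant = {
    val w = window.toMillis
    Instant.ofEpochMilli(Math.floorDiv(ts.toEpochMilli, w) * w)
  }

  /**
   * persistenceIds that may have changed at or after `resumeFrom`, ordered by the first
   * window they appear in. The window containing `resumeFrom` is included in full, because
   * the index cannot tell whether a change happened before or after `resumeFrom` within it.
   */
  def changedSince(index: Seq[IndexEntry], resumeFrom: Instant, window: Duration): Seq[String] = {
    val firstWindow = windowStartOf(resumeFrom, window)
    index
      .filter(e => !e.windowStart.isBefore(firstWindow))
      .sortBy(_.windowStart.toEpochMilli)
      .map(_.persistenceId)
      .distinct
  }
}
```

Including the partial first window means some events reach the query layer twice; the view can drop them by comparing sequence numbers or timestamps with what it has already processed.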

There are working integration tests in the query implementation repository.

jypma · Oct 15 '15 12:10

@jypma thanks for your pull request. I'll review and comment in the next couple of days. Cheers, Martin

krasserm · Oct 15 '15 15:10

Thanks a lot. Initial feedback could cover questions such as:

  • Is this feature general enough to warrant the index table always being created?
  • Is the configuration mechanism to specify what to index generic enough?
  • Is there a smarter table structure to store this lookup information in Cassandra?

jypma · Oct 15 '15 15:10

@jypma we designed writes to Cassandra in a way that they always go to a single partition in order to avoid issues discussed in #48. With your addition, writes may again go to different partitions, resulting in a logged batch which suffers from the problems described in #48.
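To make the partition argument concrete, the sketch below counts the partitions a single write would touch. It assumes, for illustration only, that events are partitioned by persistenceId plus a partition number derived from the sequence number, and that the index is partitioned by time window; the names and the partitioning scheme are assumptions, not the plugin's actual schema.

```scala
// Hypothetical partition keys, assumed for illustration only.
sealed trait PartitionKey
final case class EventPartition(persistenceId: String, partitionNr: Long) extends PartitionKey
final case class IndexPartition(windowStartMillis: Long) extends PartitionKey

object WritePartitions {
  // Assumed scheme: consecutive sequence numbers share a partition until targetPartitionSize is reached.
  def eventPartition(persistenceId: String, seqNr: Long, targetPartitionSize: Long): EventPartition =
    EventPartition(persistenceId, (seqNr - 1) / targetPartitionSize)

  /** Distinct partitions one atomic write touches; more than one implies a multi-partition logged batch. */
  def touched(
      persistenceId: String,
      seqNrs: Seq[Long],
      timestampMillis: Long,
      targetPartitionSize: Long,
      windowMillis: Option[Long]
  ): Set[PartitionKey] = {
    val events = seqNrs.map(n => eventPartition(persistenceId, n, targetPartitionSize)).toSet[PartitionKey]
    // When a time index is maintained synchronously, its row lands in a window-keyed partition.
    val index = windowMillis.toList
      .map(w => IndexPartition(Math.floorDiv(timestampMillis, w) * w))
      .toSet[PartitionKey]
    events ++ index
  }
}
```

Without the index, a batch of events for one persistenceId stays in one partition; adding the index row makes the same write span two partitions, which is the situation #48 describes.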

We are currently discussing a general architecture for creating indices and supporting akka-persistence-query in #77 (/cc @zapletal-martin). The index is created asynchronously so that writing additional tables is not on the fast write path of akka-persistence. It would be great to additionally implement a time index based on this architecture. WDYT?

krasserm · Oct 18 '15 09:10

Just commented over there. I fully agree this should go in the same direction. However, our project timelines might require us to continue with this forked branch for the moment. I'll at least add some test cases for the query side seeing index entries and main-table entries out of order.

Second, I'll experiment with extracting the time/window from the main event itself (as an offline indexer would have to do). That would make a later upgrade or transition easier.
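One way to keep the window derivable from the event itself is a small type class that reads the timestamp out of the payload, so that an online and an offline indexer compute the same window for the same event regardless of when they run. `EventTime`, `OrderPlaced` and `WindowOf` are hypothetical names used only for this sketch.

```scala
import java.time.Instant

// Hypothetical type class: how to read an event's own timestamp from its payload.
trait EventTime[E] {
  def timestampOf(event: E): Instant
}

object EventTime {
  def apply[E](implicit et: EventTime[E]): EventTime[E] = et
}

// Hypothetical domain event that records when it happened.
final case class OrderPlaced(orderId: String, at: Instant)

object OrderPlaced {
  implicit val eventTime: EventTime[OrderPlaced] = new EventTime[OrderPlaced] {
    def timestampOf(e: OrderPlaced): Instant = e.at
  }
}

object WindowOf {
  /** Window number for an event, derived only from the event (never from the wall clock). */
  def apply[E: EventTime](event: E, windowMillis: Long): Long =
    Math.floorDiv(EventTime[E].timestampOf(event).toEpochMilli, windowMillis)
}
```

Because the window depends only on the stored event, re-running the indexer later (or replaying from scratch) assigns every event to the same window it got the first time.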

jypma · Oct 19 '15 08:10

@jypma I fully understand that waiting for #77 to be ready is in conflict with your project timelines. I'm willing to merge your contribution as a temporary solution to support your query plugin but it needs modification so that writes go to a single partition. Can you imagine creating the time index with a background indexer running concurrently to the journal actor? Or do you plan to continue with the current implementation on your fork?
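A background indexer along the lines suggested here could track, per persistenceId, the highest sequence number it has indexed, and treat index writes as idempotent inserts so that a restart or a full replay yields the same index. The sketch below is in-memory and single-threaded; in practice it would presumably run as a separate actor or stream reading from Cassandra. `StoredEvent` and `BackgroundTimeIndexer` are hypothetical names.

```scala
import java.time.Instant
import scala.collection.mutable

// Hypothetical view of an event row as the indexer reads it from the journal.
final case class StoredEvent(persistenceId: String, seqNr: Long, timestamp: Instant)

/** Hypothetical indexer that runs alongside the journal, off the write path. */
final class BackgroundTimeIndexer(windowMillis: Long) {
  // Resumable progress: highest sequence number indexed so far, per persistenceId.
  val progress = mutable.Map.empty[String, Long]
  // The time index as (windowStartMillis, persistenceId); re-inserting is harmless, like a CQL upsert.
  val index = mutable.Set.empty[(Long, String)]

  /** Index every journal event not yet covered by `progress`; returns how many were processed. */
  def catchUp(journal: Seq[StoredEvent]): Int = {
    val pending = journal.filter(e => e.seqNr > progress.getOrElse(e.persistenceId, 0L))
    pending.foreach { e =>
      val windowStart = Math.floorDiv(e.timestamp.toEpochMilli, windowMillis) * windowMillis
      index += ((windowStart, e.persistenceId))
      progress(e.persistenceId) = math.max(progress.getOrElse(e.persistenceId, 0L), e.seqNr)
    }
    pending.size
  }
}
```

Calling `catchUp` periodically keeps the index a short interval behind the journal, which is the latency trade-off discussed in the next comment.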

krasserm · Oct 20 '15 03:10

@krasserm I understand your concerns. I've changed the PR itself to at least not touch the main events table, and to derive an event's timestamp from the event itself (which would be needed anyway to allow async, replayable indexing). This way, upgrading should be a little easier.

I'm undecided whether to make the actual indexing async at this point. I expect I'd run into some of the same challenges that https://github.com/krasserm/akka-persistence-cassandra/issues/77 tries to address. Plus, our particular application is somewhat latency-sensitive (the time from an event being emitted to any real-time source picking it up shouldn't be more than a second or so).

Let's just keep the PR open for now, for reference, and come back to it when an async indexer is in play.

By the way, since Akka 2.4 targets Java 8+, can this plugin target it as well? I'd prefer to use java.time.Instant over an untyped Long.
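The difference in question is roughly this: with a bare `Long`, the unit (milliseconds, seconds, nanoseconds) lives only in documentation, while `java.time.Instant` and `java.time.Duration` carry it in the type. `TimeWindow` is a hypothetical helper for illustration, not an existing API.

```scala
import java.time.{Duration, Instant}

// Hypothetical typed window; compare with a signature like windowOf(ts: Long, size: Long): Long,
// where nothing stops a caller from mixing seconds and milliseconds.
final case class TimeWindow(start: Instant, length: Duration) {
  def end: Instant = start.plus(length)
  /** Half-open interval [start, end). */
  def contains(ts: Instant): Boolean = !ts.isBefore(start) && ts.isBefore(end)
}

object TimeWindow {
  /** The epoch-aligned window of the given length that contains `ts`. */
  def containing(ts: Instant, length: Duration): TimeWindow = {
    val w = length.toMillis
    TimeWindow(Instant.ofEpochMilli(Math.floorDiv(ts.toEpochMilli, w) * w), length)
  }
}
```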

jypma · Oct 20 '15 10:10

@jypma ok, let's keep the PR open for now. Thanks anyway for your contribution. Regarding the Java version, we should of course also target Java 8+.

krasserm · Oct 21 '15 08:10