node-eventstore icon indicating copy to clipboard operation
node-eventstore copied to clipboard

Redis indexation

Open rehia opened this issue 8 years ago • 9 comments

In #94, I was talking about some ways to improve redis implementation performance, mostly when there are a large number of events (or snapshots). The problem with actual the implementation is the scan which usually need to scan every events (or snapshots) of an aggregate, and then make the necessary calculations to only take needed events.

I have made a quick test with my production data. I have an aggregate with +46K events with a total amount of +117K I tried to scan all the keys (only scanning, no get or mget) using 2 strategies:

  • I made a common scan on all the keys. It took 6181ms.
  • I put all the aggregate keys in a sorted set (members score was the event version), and made a zrange on all the keys. It took 415ms

I think that sorted sets could be a good option to index aggregates keys, at least for 2 reasons:

  • It seems more efficient to scan all the keys at once, for an aggregate
  • It will be more efficient to get keys between 2 revisions, for an aggregate (like you need to do when you reload state from a snapshot and an event)

This means that the database need to be indexed at first, and the index maintained. So whenever a client connects, we need to check if the store has been indexed, and if it's not the case, scan all the aggregates, and scan for their events and snapshots keys, and finally index the keys (events and snapshots separated obviously). Once done, each time an event is added, its key should be added to the corresponding sorted set.

What do you think about this solution? I'll try to work on an implementation. But if you have any remarks, objections or suggestions, don't hesitate!

rehia avatar Jan 28 '17 23:01 rehia

Sounds like a breaking change for the redis implementation... (have to write this at least in the release notes) As far as I know your production data is the biggest for redis... so you're the expert ;-)

adrai avatar Jan 29 '17 07:01 adrai

I managed to make something work, but I need to do more tests. I was also wondering if I should modify the existing redis implementation, or if I should apply Open Closed Principle and extend the existing implementation, and add a new database type (called indexedRedis ?), compatible with the existing implementation. What do you think about it ?

rehia avatar Feb 10 '17 21:02 rehia

I think it's ok to replace the current implementation when we can proof the new one to be better in form of performance etc...

adrai avatar Feb 10 '17 21:02 adrai

Sure, but the store needs to be indexed, before any usage. It is done on init. And it can take quite a long time (I've indexed my 122K+ events store in 170 sec on my Mac Book Pro 😅, because of the full scan made on all events), and might be done outside of the application (using a dedicated node task), depending on the size of the db. So I was thinking about the best option not to break compatibility (or at least, here, not to cause an overhead when the application starts the first time with this option). It could be done using a dedicated option, or using another implementation.

rehia avatar Feb 10 '17 22:02 rehia

But is the already stored data compatible with the new implementation? So really only this indexing task is what it's needed? So perhaps the indexing in the init is ok... but we can offer a script to execute to do this indexing on an other process?

adrai avatar Feb 11 '17 07:02 adrai

Yes sure, it's fully compatible. The events are not updated. Only some keys are added to better index events and snapshots. But once you switch to an indexed redis store, the indexation is done. You can go back to a not indexed redis store, but you will have to do it manually (by removing a dedicated key).

What I can do, when you want to use an indexed redis store, is:

  • checking if the database is already indexed (using a dedicated key). if not, you will have to do it manually using a call to an index() method on the store (maybe in a script).
  • if you don't do the indexation, the common scan() method will be used, using SCAN redis command
  • once the indexation is done, the scan() method will use indexes

Why not offering a script for this, but I was wondering how to discover all the options needed to connect to the redis instance. Maybe giving a common example in the documentation would be a first step forward.

You're OK with that?

rehia avatar Feb 11 '17 16:02 rehia

Yes, let's try :-)

Il giorno 11-feb-2017, alle ore 17:16, J?r?me Avoustin <[email protected]mailto:[email protected]> ha scritto:

Yes sure, it's fully compatible. The events are not updated. Only some keys are added to better index events and snapshots. But once you switch to an indexed redis store, the indexation is done. You can go back to a not indexed redis store, but you will have to do it manually (by removing a dedicated key).

What I can do, when you want to use an indexed redis store, is:

  • checking if the database is already indexed (using a dedicated key). if not, you will have to do it manually using a call to an index() method on the store (maybe in a script).
  • if you don't do the indexation, the common scan() method will be used, using SCAN redis command
  • once the indexation is done, the scan() method will use indexes

Why not offering a script for this, but I was wondering how to discover all the options needed to connect to the redis instance. Maybe giving a common example in the documentation would be a first step forward.

You're OK with that?

You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/adrai/node-eventstore/issues/102#issuecomment-279157201, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ABCS8rR1gGd40Bis1LZ5XpTEv9uzNq4rks5rbeFAgaJpZM4Lwpi2.

adrai avatar Feb 11 '17 17:02 adrai

@rehia do you still have interest in this, or plans to execute on it?

We are sitting with around 30k events in redis, and noticing it slowing down as well. I think this solution (indexing on aggregateId and using sorted sets) would be awesome!

If you don't have plans to continue it or work on it, would you be willing to share a WIP branch with your implementation? If/when I find some time, I could pick up where you left off, and carry this forward.

TyGuy avatar Sep 05 '18 18:09 TyGuy

@TyGuy hi!

Sorry, but I couldn't take some time to start something on this. You can do it, if you have time for it.

The solution I suggested could be a good one, but as usual, it must first be validated on the field.

I hop you'll find more time than me 😅 Good luck !

rehia avatar Sep 05 '18 19:09 rehia