orleans How to invoke a grain only if active?


Open shlomiw opened this issue 5 years ago • 34 comments

Hi,

I've run into this situation many times before. I have grains whose state I sometimes need to update, but only if they're active.

These grains are expensive to activate because they run SQL queries in their OnActivateAsync() (I use my own state management), so I'd rather avoid activating them at all when it's not necessary.

Some real life examples:

Scenario 1

Main grains: Player, Game, Gang (gang of players).

A Player shares a Game in a Gang; the Gang holds this share information in its state, including the Game's Views Count.
When another Player views the Game, its Views Count increases, and this should update all the Gangs that shared the Game. Some of them might not be active, and activating a Gang is quite heavy. I'd like to avoid that.

Scenario 2

I need to bulk update the state of many Players online (100k's of them) for a new business requirement or a maintenance operation.

The simplest solution would be to just invoke their grains and update them, but that is very, very heavy, both on the DB and on everything else.
Alternatively, script the update directly against the DB, then sync only the active players (only a few 1000's). Note that I'm not afraid of race conditions in this case, but that's for another discussion, because sometimes I am, and then I need some kind of lock mechanism when updating.

I'm well aware I can have other grains or some other app mechanism to keep track of which grains are alive, e.g. by using timers as @ashkan-saeedi-mazdeh suggested in Gitter. I could also keep a flag in storage indicating whether the grain is active. But then I'd have to keep timers, or actually even reminders, for every grain to make it robust enough. That is very heavy and cumbersome to write and maintain.

I know I can also check it via the ManagementGrain, but it's a singleton grain and would become a bottleneck if used frequently.

A note about scenario 1: I've also thought about using Streams (the classic pub/sub solution), but I don't want to keep a stream per Game, and unfortunately, in my experience, streams are just not reliable enough (a matter for yet another discussion). As for scenario 2, streams could have been a good fit. Also, in this rare case it could be OK to use the ManagementGrain.

So I wish the Orleans framework would expose a built-in mechanism for this purpose. I'm not sure about the internals, but when you use the GrainFactory to get a grain, it probably checks the directory to see whether it's active, and if not, activates it? If so, could that process expose just the first part, whether the grain is already active?

Many thanks if you read this far :)

Dec 28 '18 18:12 shlomiw

The Indexing project by Research is intended to provide a clean and scalable way to achieve what you are looking for as one of the supported features.

In the meantime, using a custom registration mechanism is the only scalable way to do that.

Dec 28 '18 19:12 sergeybykov

What about cases where I'd also like to access the non-active grains (e.g. to clear their state from the backing store)? In the indexing world that would be something like: from myGrain in GrainClient.GrainFactory.GetActiveGrains<IMyGrain, MyGrainProperties>() where myGrain.LastTimeDidFoo < DateTime.UtcNow.AddDays(-60) select myGrain;

Can that be achieved today other than by accessing the storage layer directly (which I'd prefer to keep flexible, without assuming a particular storage provider)? Perhaps I could move away from the storage provider and use my own interface to read/write/update the state, and then use the same interface for the query above (just checking whether there are other ways to do this today).

Dec 28 '18 19:12 nir-schleyen

@shlomiw As @sergeybykov says, the (indexing project) does what you need. It runs on Orleans V2 and @TedHartMS is done replacing the inheritance-based API with dependency injection (not sure if he merged it in yet). It has undergone some testing, but not nearly enough for production use. Depending on the difficulty of the alternative solutions you're considering, it might be easier to add sufficient tests for the indexing features you need than to code it from scratch.

@nir-schleyen The indexing project supports indexes over all grains, not just active ones. However, it only supports equality predicates (i.e., hash maps), not less-than (i.e., B-trees).

Dec 29 '18 19:12 philbe

@sergeybykov @philbe - thank you very much for pointing out the indexing project. I've been keeping track of it since it was introduced, though I haven't tried it yet. I hope it'll be production grade soon.

Since all my grains' state is in SQL tables (Spanner, actually), it's easy for me to 'query' them directly in the DB and then perform actions on them. The only problem is that I don't know whether the grains are active (sometimes I'd prefer to just update the storage directly if the grain is not active).

So for me it seems a bit of overkill to use the indexing project just to check if a grain is active.

I'm not familiar with the internals, but how hard is it to check whether a grain is active? It's done when invoking a grain to decide if and where to activate it, no? Wouldn't it be a reasonable future feature to have it as part of the Orleans grain interfaces, maybe in IGrainFactory? e.g. GrainFactory.IsActive<TGrainInterface>(id).

Another question related to 'Scenario 2' above: how would you prevent a grain from being activated, e.g. while updating its state outside of it? I can set a flag in storage and then check it inside OnActivateAsync(), but that doesn't seem like a good design... and it even throws an exception when trying to deactivate a grain inside OnActivateAsync() (a reasonable decision).

Thanks!

Jan 01 '19 20:01 shlomiw

I have to say that the approach of checking if a grain is currently activated or not goes against the very idea of virtual actors that are always available.

I'm not familiar with the internals, but how hard is it to check whether a grain is active? It's done when invoking a grain to decide if and where to activate it, no? Wouldn't it be a reasonable future feature to have it as part of the Orleans grain interfaces, maybe in IGrainFactory? e.g. GrainFactory.IsActive<TGrainInterface>(id).

There will always be races: you receive a response that a given grain is not activated, and it gets activated a millisecond later, before you execute your action that assumes the grain isn't active. So what good is IsActive() if you can never rely on the value it returns?
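The check-then-act race described above can be reproduced in a few lines. This is a minimal sketch in Python using a hypothetical in-memory directory (`ToyDirectory`), not the Orleans API; the `interleaving` callback stands in for a player logging in between the check and the direct DB write.

```python
class ToyDirectory:
    """Hypothetical stand-in for a grain directory (not the Orleans API)."""

    def __init__(self):
        self.active = set()

    def is_active(self, grain_id):
        return grain_id in self.active

    def activate(self, grain_id):
        self.active.add(grain_id)


def update_db_if_inactive(directory, storage, grain_id, interleaving=None):
    """Check-then-act: write straight to storage when the grain looks inactive.

    `interleaving` simulates something happening between the check and the
    write, such as the player logging in and activating the grain.
    """
    if directory.is_active(grain_id):
        return False
    if interleaving is not None:
        interleaving()
    storage[grain_id] = "updated-in-db"
    return True


directory = ToyDirectory()
storage = {}
wrote = update_db_if_inactive(
    directory, storage, "player-1",
    interleaving=lambda: directory.activate("player-1"))

# The DB write went ahead, yet the grain is now active and may hold
# state it loaded before the write.
print(wrote, directory.is_active("player-1"))  # True True
```

Any answer from an IsActive-style check is only a snapshot; the sketch just makes the window explicit.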

Note that it's subtly but IMO materially different from querying for all activated grains that satisfy a given criterion. In the latter case, I would argue, it is clearer that the returned list might be stale by the time it is used: it will miss any newly activated grains and might include some that got deactivated since. A DB query always returns data that is potentially stale by the time it is received. I think it is less obvious that IsActive invoked on an object returns an equally stale result.

The same rationale applies to updating grain state in the DB directly. This can easily lead to inconsistencies between the in-memory and persistent copies of the state.

I get that sometimes you have to cut corners to build a practical solution that isn't conceptually pure. I'm just not comfortable supporting potentially treacherous patterns that lead to questions like:

how would you prevent a grain from being activated, e.g. while updating its state outside of it? I can set a flag in storage and then check it inside OnActivateAsync(), but that doesn't seem like a good design... and it even throws an exception when trying to deactivate a grain inside OnActivateAsync() (a reasonable decision).

I think it might be cleaner in this case to split the grain state into two parts: one that can be loaded quickly and a heavier one that you would load only when absolutely necessary. That way you could perform those update operations via the grain, without loading the expensive part of its state, instead of updating the DB directly. We are moving toward this kind of storage model with the notion of facets, first used in transactions, where each facet is a piece of grain state that can be loaded/updated independently of other facets and potentially even via a different storage API.
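A minimal sketch of the two-part state idea, in Python with hypothetical classes (`PlayerGrain`, `FakeDb`), not Orleans facets: the light part is loaded on activation, the heavy part only on first access, so an update that touches only the light part never pays for the heavy load.

```python
class FakeDb:
    """In-memory stand-in for the SQL tables; counts expensive loads."""

    def __init__(self):
        self.light_rows = {"p1": {"score": 10}}
        self.heavy_rows = {"p1": {"history": ["g1", "g2"]}}
        self.heavy_loads = 0

    def load_light(self, pid):
        return dict(self.light_rows[pid])

    def save_light(self, pid, row):
        self.light_rows[pid] = dict(row)

    def load_heavy(self, pid):
        self.heavy_loads += 1
        return self.heavy_rows[pid]


class PlayerGrain:
    """Hypothetical grain whose state is split into a light and a heavy part."""

    def __init__(self, pid, db):
        self.pid, self.db = pid, db
        self.light = db.load_light(pid)  # cheap, loaded on activation
        self._heavy = None               # expensive, loaded on first use

    @property
    def heavy(self):
        if self._heavy is None:
            self._heavy = self.db.load_heavy(self.pid)
        return self._heavy

    def add_score(self, delta):
        # Only touches the light part, so the heavy part is never loaded.
        self.light["score"] += delta
        self.db.save_light(self.pid, self.light)

    def game_history(self):
        return self.heavy["history"]


db = FakeDb()
grain = PlayerGrain("p1", db)
grain.add_score(5)
loads_after_update = db.heavy_loads  # still 0
grain.game_history()
grain.game_history()                 # second call reuses the loaded heavy part
```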

Jan 02 '19 17:01 sergeybykov

@sergeybykov - I genuinely appreciate the time you took for your thorough reply!

There will always be races: you receive a response that a given grain is not activated, and it gets activated a millisecond later, before you execute your action that assumes the grain isn't active. So what good is IsActive() if you can never rely on the value it returns?

You are correct; I tried to simplify it. What I actually wished for was a way to invoke the grain only if it's active. But any way you look at it, there are races.

I think it might be cleaner in this case to split the grain state into two parts

I also thought about this solution which would ease the pain for sure.

As you can see, I've been struggling with this issue quite a bit.

The problem is that I can have 100k's, or millions of non active players, and only 1000s active. So to be practical, I need to update their state in a batch process, e.g. 10k per batch. I don't want to update each one independently; even with a lighter OnActivateAsync as you suggested, it's just not scalable enough. It can put a heavy load on the system and will take a long time. I just don't want a round trip to the DB per player.

Of course, the easiest solution would be during a maintenance break, bring the whole system down, run a DB script, then restart the silos.. but I do want to get to an online process solution.

So my current thinking:

Have additional coordinator grains, each responsible for a batch of players.
Very important: make sure the state update logic is idempotent, so I can retry it.
Use the ManagementGrain to fetch who's online... I really don't like this part, but if there are 1000s, or even 100k's, active, it will probably be OK.
Invoke the active players, tell them to update their state, and have them let the coordinator grain know.
Run a batch DB operation on the offline ones.
Check again who's active (via the ManagementGrain) and validate that the newly activated ones have their state updated.
Monitor the process with reminders.
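The coordinator steps above can be sketched as a single batch routine. This is a toy Python model under stated assumptions: `get_active`, `notify_active` and `bulk_db_update` are hypothetical callbacks standing in for the ManagementGrain query, the grain call and the batched DB write, and `migrate` is an example of an idempotent update. The demo activates `p4` in the middle of its batch to show why the re-check and idempotency matter.

```python
from typing import Callable, Iterator, List, Set, Tuple


def batches(ids: List[str], size: int) -> Iterator[List[str]]:
    """Split player IDs into fixed-size batches (e.g. 10k in production)."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]


def migrate(state: dict) -> dict:
    """Idempotent update: applying it twice gives the same result as once."""
    if state["version"] < 2:
        state["score"] *= 10
        state["version"] = 2
    return state


def run_batch(ids: List[str],
              get_active: Callable[[], Set[str]],
              notify_active: Callable[[str], None],
              bulk_db_update: Callable[[List[str]], None]) -> Tuple[Set[str], Set[str]]:
    active = get_active() & set(ids)            # snapshot, stale once it returns
    for pid in sorted(active):
        notify_active(pid)                      # active grains update themselves
    offline = [pid for pid in ids if pid not in active]
    bulk_db_update(offline)                     # one batched DB operation
    newly_active = get_active() & set(offline)  # grains that activated meanwhile
    for pid in sorted(newly_active):
        notify_active(pid)                      # safe to repeat: migrate is idempotent
    return active, newly_active


db = {f"p{i}": {"score": 1, "version": 1} for i in range(1, 6)}
memory = {"p2": dict(db["p2"])}                 # in-memory state of active grains


def get_active() -> Set[str]:
    return set(memory)


def notify_active(pid: str) -> None:
    db[pid] = dict(migrate(memory[pid]))        # the grain persists its own state


def bulk_db_update(pids: List[str]) -> None:
    if "p4" in pids:                            # simulate p4 activating mid-batch
        memory["p4"] = dict(db["p4"])           # ...and loading the old state
    for pid in pids:
        migrate(db[pid])


results = [run_batch(b, get_active, notify_active, bulk_db_update)
           for b in batches(sorted(db), 2)]
```

The re-check only narrows the window: a grain that activates after it and writes its own stale state (e.g. at the end of a game) can still overwrite the bulk update, which is the inconsistency risk raised later in this thread.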

Once again, thanks for reading this far :)

While writing this I thought again about what you suggested. If I can load the player grains without any DB query at all (loading only later if necessary), I can also invoke them with a method that carries all the info needed for the state update. And instead of writing to the DB with one round trip per player, I can use another grain to batch the inserts together. That might be a scalable solution! The only problem is refactoring the grain everywhere for this lazy state load, because it currently assumes everywhere that the state is there.
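The "another grain to batch inserts together" idea boils down to a write buffer. A minimal Python sketch, assuming a hypothetical `BatchWriter` with a `flush_fn` callback that performs one batched DB call; in a real system the flush would also be triggered by a timer.

```python
from typing import Callable, Dict, List


class BatchWriter:
    """Hypothetical helper that collects per-player writes and flushes them
    in one batched DB call instead of one round trip per player."""

    def __init__(self, flush_fn: Callable[[Dict[str, dict]], None], max_batch: int = 1000):
        self.flush_fn = flush_fn
        self.max_batch = max_batch
        self.pending: Dict[str, dict] = {}

    def enqueue(self, player_id: str, state: dict) -> None:
        # A later write for the same player replaces the earlier one.
        self.pending[player_id] = state
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            batch, self.pending = self.pending, {}
            self.flush_fn(batch)


round_trips: List[Dict[str, dict]] = []
writer = BatchWriter(round_trips.append, max_batch=3)
for i in range(7):
    writer.enqueue(f"p{i}", {"score": i})
writer.flush()  # e.g. also called periodically from a timer
```

Seven player writes cost three DB round trips here instead of seven.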

I love my job :)

Jan 02 '19 20:01 shlomiw

I also think it would be a helpful feature. Besides Orleans, I also use Orbit (Java) for an IoT application. There is an actor for each gateway and apartment, and they are very expensive to create: we use a rule engine to analyze sensor measurements, and creating it can take up to a second (at 90% CPU). But fortunately there is an annotation for that: https://github.com/orbit/orbit/wiki/Concepts%3A-Useful-Annotations

Jan 02 '19 21:01 SebastianStehle

I also think that it would be a helpful feature.

By feature do you mean an attribute on a method (or some other way) that specifies that a call to it will be sent to the target grain only if it is already activated? And if it's not, the call would blindly succeed?

Jan 02 '19 22:01 sergeybykov

And if it's not, the call would blindly succeed?

I think that the proposed feature should throw a strongly-typed exception if the target grain is not activated. Maybe [NoActivate] describes that behavior accurately enough. There would be a race condition (which is fine) between looking up an activation for the grain and the activation receiving and processing the call.

That could be combined with [OneWay] if the developer doesn't care about the result (which would also imply no throwing).
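The semantics proposed here can be modelled without touching Orleans. This Python sketch uses a hypothetical `ToyRuntime` dispatcher and a `GrainNotActiveError` exception (neither is an Orleans type) to show the three outcomes: a normal call activates on demand, a no-activate call to an inactive grain throws a strongly-typed error, and a no-activate one-way call is silently dropped.

```python
class GrainNotActiveError(Exception):
    """Strongly-typed error for a no-activate call that finds no activation."""


class ToyRuntime:
    """Toy call dispatcher (not Orleans) modelling the proposed call policies."""

    def __init__(self, grain_class):
        self.grain_class = grain_class
        self.activations = {}

    def call(self, grain_id, method, *args, no_activate=False, one_way=False):
        grain = self.activations.get(grain_id)
        if grain is None:
            if no_activate and one_way:
                return None                      # dropped silently; caller learns nothing
            if no_activate:
                raise GrainNotActiveError(grain_id)
            grain = self.grain_class(grain_id)   # normal virtual-actor path
            self.activations[grain_id] = grain
        result = getattr(grain, method)(*args)
        return None if one_way else result


class Gang:
    def __init__(self, gang_id):
        self.gang_id = gang_id
        self.views = 0

    def set_views(self, views):
        self.views = views
        return views


runtime = ToyRuntime(Gang)
runtime.call("g1", "set_views", 1)               # activates g1
updated = runtime.call("g1", "set_views", 7, no_activate=True)
dropped = runtime.call("g2", "set_views", 7, no_activate=True, one_way=True)
try:
    runtime.call("g3", "set_views", 7, no_activate=True)
    raised = False
except GrainNotActiveError:
    raised = True
```

As noted in the thread, the lookup and the delivery are not atomic in a real distributed runtime; this single-threaded model hides that race.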

We could have an API to check whether a grain has an activation registered in the directory, which could be a kind of alternative to this sort of feature.

Clearly these things are outside the abstraction of a virtual actor since they deal with lifecycle of a foreign object, but sometimes it's useful to carefully pierce abstractions.

Jan 02 '19 22:01 ReubenBond

Per @shlomiw:

The problem is that I can have 100k's, or millions of non active players, and only 1000s active.

Is the proposed pattern here to invoke millions of grains with only 1000s of the calls succeeding and the rest throwing the exception? That doesn't seem right to me.

We've discussed the idea of broadcast calls that would get invoked on every currently activated grain. I still have scalability reservations about it, but at least it doesn't involve making calls to inactive grains that would be ignored or throw an exception.

Jan 02 '19 23:01 sergeybykov

@SebastianStehle - thank you for showing these orbit's annotations!

@sergeybykov - an [OnlyIfActivated] attribute would certainly help with my "Scenario 1" above, where I only wish to update a grain's state if it's alive. @ReubenBond's suggestion to throw a strongly-typed exception sounds good to me; though it's not really an exceptional case, it simplifies things a lot. The combination with [OneWay] is also a great idea. About "Scenario 2", I agree that it's not the best solution, but it'll do the job!

Think also how useful it could be for monitoring purposes, just to check whether a grain is alive.

Can you guys consider implementing it?

We've discussed the idea of broadcast calls that would get invoked on every currently activated grain. I still have scalability reservations about it, but at least it doesn't involve making calls to inactive grains that would be ignored or throw an exception.

Broadcasting would also be great! I needed this many times. I tried to use streams for that purpose, but unfortunately it wasn't reliable enough, so I've implemented my own events subscription per silo.

btw, there are other useful annotations in Orbit. Most of them are already implemented or can be done in code. One in particular that caught my eye is [SkipUpdateLastAccess]: I had a situation where maintenance timers self-invoked the grain to avoid reentrancy, but then I found that this kept the grain alive.

Jan 03 '19 08:01 shlomiw

@sergeybykov

I think it might be cleaner in this case to split the grain state into two parts: one that can be loaded quickly and a heavier one that you would load only when absolutely necessary

I just realized (and verified) that I can pass parameters via RequestContext to OnActivateAsync, so I could defer the heavy state loading until it's necessary. Combined with an IGrainCallFilter interceptor, I can then load the heavy state for the normal methods just before invoking them. It's hacky, I know.
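The shape of this deferred-load trick can be sketched in plain Python. Everything here is hypothetical: the dict passed to `on_activate` plays the role of RequestContext, `invoke()` plays the role of a call filter, and the `@light` marker is an invented way to exempt update-only methods from the heavy load. It is not the Orleans RequestContext or IGrainCallFilter API.

```python
def light(method):
    """Marks a method that must not trigger the heavy state load (hypothetical)."""
    method.is_light = True
    return method


class LazyStateGrain:
    """Toy base class: activation can defer loading; invoke() acts like a call filter."""

    def __init__(self, db):
        self.db = db
        self.state = None

    def on_activate(self, request_context):
        # Mirrors passing a flag through RequestContext into OnActivateAsync.
        if not request_context.get("defer_heavy_load"):
            self.state = self.db.load()

    def invoke(self, name, *args):
        method = getattr(self, name)
        # Like a call filter: load the heavy state just before normal methods.
        if self.state is None and not getattr(method, "is_light", False):
            self.state = self.db.load()
        return method(*args)


class CountingDb:
    def __init__(self):
        self.row = {"score": 1}
        self.loads = 0

    def load(self):
        self.loads += 1
        return dict(self.row)

    def write_partial(self, update):
        self.row.update(update)


class Player(LazyStateGrain):
    @light
    def apply_update(self, update):
        self.db.write_partial(update)  # needs no loaded state (could go via a batch writer)

    def score(self):
        return self.state["score"]


db = CountingDb()
player = Player(db)
player.on_activate({"defer_heavy_load": True})
player.invoke("apply_update", {"score": 10})
loads_after_update = db.loads  # 0: the update never loaded the state
current = player.invoke("score")
```

One caveat the sketch does not handle: if the state was already loaded when a light update arrives, the in-memory copy must be patched too.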

I'd still need to activate 100k's of grains, which can be heavy on Orleans, but I can do it gradually, in a moderated way. The best part is that the state update happens only within the grain context, so there's no danger of race conditions, etc. I will use a helper grain to batch the updates to the DB.

Jan 03 '19 08:01 shlomiw

@shlomiw Maybe the active grains could explicitly check in the database whether the value has been updated (a version field) before an operation that requires the correct version, and reload the state if the version has changed. This way you could batch update the storage even if some grains are active, I think. You could perhaps have a separate "BatchGrain" that provides the most recent batch update ID, giving further assurance about whether there has been a batch update since the data was last loaded. You could even store the batch ID explicitly with the data.
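A minimal sketch of the version-field approach, in Python with a hypothetical in-memory `Storage`. The grain probes the stored version before a version-sensitive operation and reloads if an offline batch update bumped it. The sketch adds a compare-and-swap write (the in-memory equivalent of an `UPDATE ... WHERE version = expected`), an assumption beyond the suggestion above, to cover the gap between the probe and the write.

```python
class Storage:
    """Toy table where every row carries a version (hypothetical schema)."""

    def __init__(self):
        self.rows = {}

    def read(self, gid):
        return dict(self.rows[gid])

    def version_of(self, gid):
        return self.rows[gid]["version"]

    def write_if_version(self, gid, row, expected):
        """Compare-and-swap: succeeds only if nobody wrote since `expected`."""
        if self.rows[gid]["version"] != expected:
            return False
        self.rows[gid] = {**row, "version": expected + 1}
        return True


class PlayerGrain:
    def __init__(self, gid, storage):
        self.gid, self.storage = gid, storage
        self.row = storage.read(gid)

    def add_score(self, delta):
        while True:
            # Cheap probe: has a batch update (or anyone else) written since we loaded?
            if self.storage.version_of(self.gid) != self.row["version"]:
                self.row = self.storage.read(self.gid)
            new_row = {**self.row, "score": self.row["score"] + delta}
            # The conditional write closes the gap between the probe and the write.
            if self.storage.write_if_version(self.gid, new_row, self.row["version"]):
                self.row = self.storage.read(self.gid)
                return self.row["score"]


storage = Storage()
storage.rows["p1"] = {"version": 1, "score": 5}
grain = PlayerGrain("p1", storage)
storage.rows["p1"] = {"version": 2, "score": 50}  # offline batch update bumps the version
new_score = grain.add_score(1)                    # reloads, then writes 51
```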

Jan 03 '19 08:01 veikkoeeva

@shlomiw

I'd still need to activate 100k's of grains, which can be heavy on Orleans, but I can do it gradually, in a moderated way.

That's related to my question above:

Is the proposed pattern here to invoke millions of grains with only 1000s of the calls succeeding and the rest throwing the exception?

Checking whether a particular grain is activated would require a remote call to the grain directory. Hence, performing this check over a million grains would require a million calls to the directory, unless those calls are grouped into large enough batches. I think it would be much cheaper to get a list of all activated grains, e.g. via the ManagementGrain, than to perform the check for an orders-of-magnitude larger number of grains, batched or not.

Jan 04 '19 00:01 sergeybykov

In my scenario I use it for analyzing grain state. I have an endpoint that gives insights about a specific actor and the wrapped rule engine. As I said, creating an instance of the rule engine is very expensive, and I can avoid it if the actor is not active.

Jan 04 '19 10:01 SebastianStehle

@sergeybykov You're right, the cost seems to be very heavy. So I'm still not sure how to avoid race conditions: even if I have the list of active grains, as you said before, it will be stale by the time I'm doing the batch updates on the offline players. During that time some 'offline' grains might be activated, and then I could have DB inconsistency issues. Maybe @veikkoeeva's suggestion can help, but it won't be easy with my model.

To sum up: Alternative 1 - invoke all the grains; if a grain is not active, avoid loading all its state, and do the updates with a batch helper grain. Pros:

100% guarantee of correctness.
Minimal DB load.
Easy to understand and maintain.

Cons:

Heavy on Orleans: each grain must be activated, maybe millions of them.

Alternative 2 - invoke only the online grains (via the ManagementGrain) and batch update the offline grains directly in the DB, maybe using coordinator helper grains to avoid inconsistencies. Pros:

Minimum load on Orleans and the DB.

Cons:

Dangers of race conditions and DB inconsistencies.
Harder to understand and maintain.

Alternative 2 is more scalable, but a bit dangerous if not done right.

I should do the math on how many grains would be involved in practice, and maybe do some stress testing on Orleans: how heavy is it really to invoke a million grains with limited parallelism, and how much time does it take?
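For that kind of stress test, the client side needs a way to invoke a large number of grains with limited parallelism. A minimal Python asyncio sketch, assuming a hypothetical `fake_grain_call` that stands in for a real grain call: a fixed pool of workers shares one iterator of IDs, so at most `max_in_flight` calls are pending and no coroutine is created per ID up front.

```python
import asyncio
import time


async def invoke_all(ids, call, max_in_flight=100):
    """Call `call(id)` for every ID with at most `max_in_flight` calls pending."""
    it = iter(ids)
    results = {}

    async def worker():
        # All workers share one iterator, so each ID is taken exactly once.
        for gid in it:
            results[gid] = await call(gid)

    await asyncio.gather(*(worker() for _ in range(max_in_flight)))
    return results


async def demo(count, limit):
    in_flight = peak = 0

    async def fake_grain_call(gid):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.001)  # stands in for the network round trip
        in_flight -= 1
        return gid * 2

    results = await invoke_all(range(count), fake_grain_call, max_in_flight=limit)
    return results, peak


start = time.perf_counter()
results, peak = asyncio.run(demo(1000, 50))
elapsed = time.perf_counter() - start
print(f"{len(results)} calls, peak concurrency {peak}, {elapsed:.2f}s")
```

Replacing the fake call with real grain invocations and varying `max_in_flight` gives the throughput and wall-clock numbers the question asks about; the sketch itself says nothing about Orleans performance.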

[OnlyIfActivated] - I still believe it's a very useful feature for different scenarios, as discussed above (i.e. 'Scenario 2').

@SebastianStehle - are you using [SkipUpdateLastAccess] to avoid keeping the actor alive for long time?

Jan 04 '19 22:01 shlomiw

@SebastianStehle Your scenario does sound different if you only need to look up a specific grain.

It's an interesting question whether such a scenario warrants a general-purpose feature (IsActivated(grain)) at the cost of polluting the virtual actor programming model, where you are not supposed to care/know whether an actor is activated at any point in time. I'm worried a feature like that may easily confuse way more people than those who would benefit from it. Even though indexing queries against activated grains would sort of get us into the same territory, it doesn't feel like as direct a violation of the virtual actor model to me.

Jan 04 '19 22:01 sergeybykov

@shlomiw Is there a way to indicate to the grains that they need to perform an update and what exactly they need to do (e.g. by writing the update command/expression to storage and setting a flag that the grains can check)? If there is, then a sequence like the following might work.

Write the update command to storage.
Set a flag (e.g. latest schema version) that all grains would check.
Newly activated grains will see the flag and perform the update as part of their activation process.
Already activated grains can be notified directly or periodically check if the flag is set, and would perform the update action when they learn about the flag.

This way you could quickly update already activated grains and lazily update inactive ones. Do you think something like this could work for you?
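The four-step sequence above (write the command, set the flag, upgrade on activation, notify or poll active grains) can be sketched as a lazy schema migration. This is a toy Python model: `SchemaRegistry` is a hypothetical stand-in for the command/flag records in storage, update commands are plain functions rather than stored expressions, and persisting the upgraded state is omitted.

```python
from typing import Callable, Dict


class SchemaRegistry:
    """Stands in for the update commands and 'latest schema version' flag in storage."""

    def __init__(self):
        self.latest = 1
        self.commands: Dict[int, Callable[[dict], None]] = {}

    def publish(self, version: int, command: Callable[[dict], None]) -> None:
        self.commands[version] = command  # step 1: write the update command
        self.latest = version             # step 2: set the flag


class PlayerGrain:
    def __init__(self, stored_state: dict, registry: SchemaRegistry):
        self.state = dict(stored_state)
        self.registry = registry
        self.catch_up()                   # step 3: new activations upgrade on load

    def catch_up(self) -> None:
        """Step 4 for already active grains: call on notification or from a timer."""
        while self.state["schema"] < self.registry.latest:
            nxt = self.state["schema"] + 1
            self.registry.commands[nxt](self.state)
            self.state["schema"] = nxt


registry = SchemaRegistry()
active = PlayerGrain({"schema": 1, "score": 3}, registry)
registry.publish(2, lambda s: s.update(score=s["score"] * 10))
active.catch_up()                                          # notified active grain
later = PlayerGrain({"schema": 1, "score": 4}, registry)   # activated after the flag
```

Inactive grains stay on the old schema in storage until they activate, which is why, as the next comment points out, this does not help when other consumers query the DB directly.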

Jan 04 '19 22:01 sergeybykov

@sergeybykov Unfortunately that's not enough, since I also have to update the non-active grains' data within a short period of time. For example - I have to update their score and it affects the leaderboards (we query the leaderboards data directly from the DB).

I really appreciate your time!! I'll eventually get this right :) For me it's a very interesting and insightful discussion..

Jan 04 '19 22:01 shlomiw

No worries. 😊

I have to update their score and it affects the leaderboards (we query the leaderboards data directly from the DB).

Can you apply the change to storage for all grains, and then notify the activated ones that they need to reload?

Jan 04 '19 22:01 sergeybykov

But then I'm afraid I might have inconsistencies with the currently active grains, which update the score at the end of a game. I need to think about it more; maybe I could make it work with some additional helper fields in storage (it's more than just a score). Thanks.

Jan 04 '19 22:01 shlomiw

@sergeybykov What about [OnlyIfActivated]? I do find it useful in some scenarios, e.g. updating a specific grain's cached state if it's alive, or monitoring a grain. I think the indexing project might be overkill for simple cases. Would you consider implementing it? Should I open a new issue for it?

Jan 06 '19 18:01 shlomiw

What about [OnlyIfActivated]? I do find it useful in some scenarios, e.g. updating a specific grain's cached state if it's alive, or monitoring a grain.

What should happen with calls to inactive grains? I think that's the sticking design point. Throwing would be ugly and expensive; returning success would be misleading. If it only applied to OneWay, then I think it wouldn't be that controversial. But would it be useful if limited to OneWay only?

Jan 07 '19 22:01 sergeybykov

I understand your point.

But would it be useful if limited to OneWay only?

Yes, it would, e.g. when updating a grain's cached state only if it's active; see "Scenario 1" in the first comment of this thread.

We could even consider it as a parameter of the OneWay attribute, e.g. [OneWay(OnlyIfActivated = true)]; this would be clear enough.

Thanks!

Jan 08 '19 08:01 shlomiw

@SebastianStehle What is your opinion here? Would that work for your case or you need to get a response back from the rule engine grain in this scenario?

Jan 09 '19 22:01 sergeybykov

I am querying monitoring information from the grain, so I would need a response. Java returns null if I remember correctly, but an exception or a special return value would be better.

Jan 10 '19 11:01 SebastianStehle

@sergeybykov We face more and more cases where it would help to avoid unnecessary activations. I hope you'd consider pushing it. Thanks!

Jan 28 '19 22:01 shlomiw

Wouldn't the simplest and least invasive change here be to expose a way to ask if the target grain is currently activated? That way we wouldn't need to introduce attributes, etc. There would be some inefficiencies due to possible races on the edges (the grain gets collected or activated immediately after the check), but that doesn't seem like a major burden to me.

Mar 05 '19 22:03 sergeybykov

@sergeybykov

Wouldn't the simplest and least invasive change here be to expose a way to ask if the target grain is currently activated?

That was my original request (see the first post 😄). I'm well aware of the race, but it's absolutely worth it, as it will be very rare.

I think it's important to be able to do so without using the ManagementGrain, to avoid a bottleneck on this singleton grain.

One thing to consider: there are many cases where I need to update a bulk of grains if they are active. So if we could perform the check you suggested for a collection of grain IDs efficiently, it would be much better! (Again, I'm less concerned about the races here.)
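One way a bulk check could avoid one directory call per grain is to group the IDs by the silo that owns their directory entry and ask each silo once. This Python sketch assumes, as a simplification, that entries are spread across silos by hashing the grain ID (plain modulo hashing here); `owner_of`, `which_are_active` and `lookup_batch` are hypothetical names, not Orleans APIs.

```python
import zlib
from collections import defaultdict


def owner_of(grain_id, silos):
    """Hypothetical directory placement: hash the ID onto one silo (modulo for brevity)."""
    return silos[zlib.crc32(grain_id.encode()) % len(silos)]


def which_are_active(grain_ids, silos, lookup_batch):
    """Group IDs by directory owner so each silo is asked only once."""
    by_owner = defaultdict(list)
    for gid in grain_ids:
        by_owner[owner_of(gid, silos)].append(gid)
    active = set()
    for silo, ids in by_owner.items():
        active |= lookup_batch(silo, ids)  # one remote call per silo, not per grain
    return active  # still a snapshot: it can be stale as soon as it returns


silos = ["silo-a", "silo-b", "silo-c"]
partitions = defaultdict(set)  # what each silo's directory partition holds
for gid in ["p3", "p7", "p42"]:
    partitions[owner_of(gid, silos)].add(gid)

calls = []


def lookup_batch(silo, ids):
    calls.append(silo)
    return set(ids) & partitions[silo]


active = which_are_active([f"p{i}" for i in range(100)], silos, lookup_batch)
```

A hundred IDs cost at most three lookups here, one per silo, though each lookup still carries the race discussed earlier.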

As usual - many thanks!

Mar 06 '19 08:03 shlomiw

We are currently using Orleans as a cache for an old app we have. We update data directly in the database and then call a method like Task Refresh() to reload the cache from the database. In some cases we need to wait for that method to return so we know that the cache is updated. So a OneWay attribute where the method would be ignored if the grain is not active wouldn't be enough for our use case. But either a way to check whether a grain is active, or an attribute that would make any grain method returning a plain 'Task' return immediately with Task.CompletedTask if the grain is not active, would work for our use case. I would prefer the attribute, because then I wouldn't have to change the call sites.

Having to use a registry grain to get this functionality is a bit of a pain.

Oct 09 '19 16:10 wanton7
