Adding query functionality

Open philbe opened this issue 8 years ago • 30 comments

I’m starting to explore how best to add query capability for Orleans. For example, this could be a navigational API over indexed sets of grains, or a declarative query language such as LINQ over all initialized grains in one or more classes. I’m interested in hearing about scenarios you’ve implemented or would like to implement that would benefit from such a capability.

Jul 14 '15 05:07 philbe

Currently, you can't define access rights for grains, but you do have to know a grain's id to find it. A query capability would make every grain globally discoverable. There should be an IQueryableGrain : IGrain interface that tags a grain interface as one that can be queried.

Jul 14 '15 08:07 shayhatsor

There has been a good deal of interest in this direction. I'm not sure what a querying system would look like, but I imagine it would be provider-based, perhaps built as a kind of storage provider.

My biggest question is what would we be querying? Should it be queries over grain state or some kind of separate "grain identity" object which defines the public (queryable) properties of the grain? Should the result of querying be the grain references of the matching grains?

Very interesting, whatever the case.

Jul 14 '15 15:07 ReubenBond

Some of the ideas around EJB Entity Bean Finder methods and EJB QL are a useful place to start thinking about this, from a declarative API perspective at least. http://www.oracle.com/technetwork/middleware/ias/how-to-ejb-ql-094899.html http://pic.dhe.ibm.com/infocenter/radhelp/v8r5/index.jsp?topic=%2Fcom.ibm.doclet.doc%2Ftopics%2Ftejb_finder.html

Jul 14 '15 16:07 jthelin

https://orleans.codeplex.com/discussions/640457 https://orleans.codeplex.com/discussions/579812 https://github.com/dotnet/orleans/issues/101 https://orleans.codeplex.com/discussions/578605 https://orleans.codeplex.com/discussions/572542

Jul 14 '15 16:07 gabikliot

I think something like Cypher (https://en.wikipedia.org/wiki/Cypher_Query_Language) would be a good model for querying, since it is very powerful and flexible. Another option is a set of extensions to Orleans that use graphengine.io (aka Trinity).

Jul 14 '15 17:07 grahamehorner

While most likely out of the scope of Orleans, something similar to, or built on, GE/Trinity would be interesting. For a lot of business scenarios, being able to define a graph over your grains is both interesting and useful.

Personally, when 'querying' data I have two things to consider. The first is the indexing of records: finding a user grain by email address or by username are common use cases. That leads on to needing both unique and non-unique indexes; for example, I don't want multiple users with the same email.

The second is relationships between grains and how we track/store them; for example users in a usergroup or a friends list.

The basic hacks I have used to achieve the above are as follows. For indexing, I have a grain per index entry, and that grain stores the ids of the target grains. The id of the lookup grain is a hash of businesstype+property+objectvalue. It is a primitive approach but pretty responsive and scalable, and it means I can also look up whether an index value is already in use.
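A minimal sketch of the lookup-grain idea above, written as plain in-memory C# rather than real Orleans grains. The class name `LookupIndex`, the `|` separator, and the use of SHA-256 are illustrative assumptions; the comment only says the id is "a hash of businesstype+property+objectvalue".

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical stand-in for "one lookup grain per index entry".
// Nothing here is an Orleans API; the dictionary plays the role of the grain directory.
public sealed class LookupIndex
{
    private readonly Dictionary<string, HashSet<string>> _entries = new();

    // Lookup id = hash of businessType + property + value.
    public static string LookupId(string businessType, string property, string value)
    {
        // The separator is an assumption; it keeps ("ab", "c") from colliding with ("a", "bc").
        byte[] bytes = Encoding.UTF8.GetBytes($"{businessType}|{property}|{value}");
        return Convert.ToHexString(SHA256.HashData(bytes));
    }

    // Returns false when a unique index already maps this value to a different grain.
    public bool TryAdd(string businessType, string property, string value,
                       string targetGrainId, bool unique)
    {
        string id = LookupId(businessType, property, value);
        if (!_entries.TryGetValue(id, out var targets))
        {
            targets = new HashSet<string>();
            _entries[id] = targets;
        }
        if (unique && targets.Count > 0 && !targets.Contains(targetGrainId))
            return false;
        targets.Add(targetGrainId);
        return true;
    }

    // Returns the ids of the target grains indexed under this value (possibly empty).
    public IReadOnlyCollection<string> Find(string businessType, string property, string value)
    {
        if (_entries.TryGetValue(LookupId(businessType, property, value), out var targets))
            return targets;
        return Array.Empty<string>();
    }
}
```

With `unique: true`, a second `TryAdd` for the same email but a different user id returns false, which corresponds to the "is this index already in use" check.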

For managing relationships, I normally have a separate grain to hold the collection. For example, I would have a user grain for the user, and a userTweets grain that stores a list of the user's tweets. I use streams to post data to the collections, which means multiple collections/grains can take a copy of a new tweet, or of an update/delete tweet command, etc.

The problem I came across with this approach was that once a collection grew to hundreds of thousands of items, it became slow to load the first time and slow to scan, and it held a lot of memory for no real benefit. So I extended the approach: the collection grain saves its items to separate bucket grains, which split the data into buckets of roughly 500 items each and together work a bit like a sorted list on a given key. In my common scenario, tweets in a timeline, I can take (x) items from a certain date without loading the whole collection into memory, because my sorted list is keyed on the "date posted" property of the tweet.

Both approaches are simplistic but scalable. I would still welcome something built into Orleans to handle this kind of thing.
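The bucketing idea above can be sketched roughly as follows. This is plain in-memory C#, not Orleans code: each inner list stands in for a bucket grain, and `BucketedTimeline`, `Post`, and the split-in-half policy are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record Post(DateTime Posted, string Id);

// Hypothetical stand-in for the collection grain; each inner list is one "bucket grain".
public sealed class BucketedTimeline
{
    private readonly int _capacity;
    private readonly List<List<Post>> _buckets = new(); // ordered by date range

    public BucketedTimeline(int capacity = 500) => _capacity = capacity;

    public int BucketCount => _buckets.Count;

    public void Add(Post post)
    {
        if (_buckets.Count == 0) _buckets.Add(new List<Post>());

        // Route to the last bucket whose first item is not later than the new post.
        int target = 0;
        for (int i = 0; i < _buckets.Count; i++)
            if (_buckets[i].Count > 0 && _buckets[i][0].Posted <= post.Posted) target = i;

        // Keep the bucket sorted by date.
        List<Post> bucket = _buckets[target];
        int pos = bucket.FindIndex(p => p.Posted > post.Posted);
        bucket.Insert(pos < 0 ? bucket.Count : pos, post);

        // Split an over-full bucket in half (the split policy is an assumption).
        if (bucket.Count > _capacity)
        {
            int half = bucket.Count / 2;
            List<Post> upper = bucket.GetRange(half, bucket.Count - half);
            bucket.RemoveRange(half, bucket.Count - half);
            _buckets.Insert(target + 1, upper);
        }
    }

    // Returns up to `count` posts dated `from` or later, reading only the buckets needed.
    public List<Post> Take(DateTime from, int count)
    {
        var result = new List<Post>();
        int start = _buckets.FindIndex(b => b.Count > 0 && b[b.Count - 1].Posted >= from);
        if (start < 0) return result;
        for (int i = start; i < _buckets.Count && result.Count < count; i++)
            result.AddRange(_buckets[i].Where(p => p.Posted >= from).Take(count - result.Count));
        return result;
    }
}
```

In a real grain-based version each bucket would be activated only when `Take` reaches it, which is where the memory saving comes from; the linear scan over buckets here could be replaced by a binary search.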

Jul 16 '15 17:07 BenjaminGibbs

@BenjaminGibbs I loved how you built distributed scalable indices with grains! Of course, it is always better when the platform takes care of everything. We would all like super-scalable databases, and we will work on that. But in the meantime, being able to build your own "partial database" relatively easily, with the same tools and abstractions that one uses to build the app itself, is very powerful I think.

Jul 16 '15 18:07 gabikliot

I agree with @BenjaminGibbs and @gabikliot that a simple mechanism is a good place to start. OTOH, it might be relatively easy to take a bigger step with a storage provider for an RDBMS (e.g., SQL Server) whose query capabilities are passed through to a LINQ interface.

Anyway, thank you everyone for the useful suggestions. I'll report back when I have something more to say. Meanwhile, more suggestions would be welcome.

Jul 17 '15 16:07 philbe

Sounds like a good first step might be to build a storage provider that maps every field of the grain state to a separate column. If we use Azure Document DB we will get secondary indices for free, in a very scalable implementation. The next step will be to allow querying those indices, first just by your custom code from within the grain or from outside. Second, we may provide a higher-level API that returns a set of GrainReferences based on the query result ("find all grain ids of type Person that live in Seattle and work at Microsoft").

Once we have that basic "enabler" functionality, we can get feedback on its usage and look into potentially more complicated scenarios.

Jul 31 '15 16:07 gabikliot

Not to resurrect a dying thread here, but as someone who is just getting started with Orleans I have a couple of questions/concerns.

Firstly, is anyone actively working on this issue?

Secondly, it would be great if there was an indexing/querying solution that doesn't rely on DocumentDB. While I absolutely love DocumentDB, it doesn't yet have an emulator or any way to use it that doesn't require you to stand one up in Azure. So making DocumentDB the go-to solution for Orleans at-large would naturally cause those who are prototyping or just getting their feet wet to incur costs that they can presently avoid.

While I'm not opposed to storage providers being part of the solution or finding a way for storage providers to share in the responsibility, it would be great if there was a built-in piece of functionality.

It seems like there would be some simple pattern to follow that would allow "index grains" to index a single property on a type of grain that you could opt in for. That, combined with the aforementioned bucketing, might be able to produce a simple version of this feature.

Dec 06 '15 15:12 antoinne85

There is currently no support for querying. @philbe has been doing some research in this direction but there is nothing to share yet.

Dec 08 '15 17:12 sergeybykov

As @sergeybykov says, I've been having some research discussions about this, but I don't expect to have code to share for quite a while.

RE: DocumentDB, I presume you mean that you'd like a storage provider for a database on your workstation that would enable you to test, e.g. in the Azure emulator. A solution should certainly include this, but probably would be a SQL database rather than a JSON store.

Thanks for the vote in favor of "index grains" with bucketing. That does seem like a good place to start.

Dec 10 '15 03:12 philbe

I'm not nearly as experienced with this code base as pretty much everyone here at the moment, so some of this may be obvious or there may be good reasons to exclude these items from consideration, but...

It would be great if the feature could exist at a layer above the storage provider, so that you can use it without regard to the provider you're using. Also, it seems to violate the SRP somewhat to have this class/layer whose only responsibility used to be storing/retrieving data to also now need knowledge of what's in the data to fulfill the role of querying.

Also--this feature is still in the research stages, as you've said, so the reality of how it's shaping up might be very different from the impression that I formed based on previous comments.

Dec 11 '15 19:12 antoinne85

Some query capability can and should be storage-provider agnostic. However, there will be cases where a storage system can execute optimized queries faster than what can be done in the Orleans runtime, e.g., in Azure SQL Database. Balancing these two considerations is part of the design challenge.

Dec 14 '15 17:12 philbe

Maybe there can be a place to plug in a decision-making engine (a state machine that in the extreme case could be just a pass-through Func?), dynamically tunable and with a fixed signature, that infers from the incoming query parameters how the query should be performed.

Dec 14 '15 18:12 veikkoeeva

I haven't had an opportunity to even begin trying out any of what I'm about to describe, but I'd be interested in hearing thoughts/suggestions before I attempt. At the moment, it's probably beyond my capabilities due to inexperience with the framework, but here goes.

Start by assuming we have a BucketingIndexGrain, which is identified by a string. The string is of the form IndexedGrainName_IndexedPropertyName. So if I have an instance of this grain with the ID "UserGrain_FirstName", then that BucketingIndexGrain is responsible for storing and accessing indexed data about the FirstName property of the state of grains of type UserGrain.

For simplicity of the explanation, assume BucketingIndexGrain only knows how to index simple types like int and string, for now.

The role of the BucketingIndexGrain is simply to receive messages about changes to the state of other grains and pass them along to the appropriate bucket. The state change messages would include the following:

The old value of the indexed property
The new value of the indexed property
The ID of the grain whose state changed

Using this information, the BucketingIndexGrain would notify the buckets to add/remove the grain ID/value from their index.
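A rough in-memory sketch of that routing, assuming string-valued properties and hash-based bucket selection. `IndexChange`, `BucketingIndex`, and the bucket count are hypothetical names and choices, and none of this uses Orleans APIs; each dictionary stands in for one bucket grain.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical message shape: old value, new value, and the id of the grain that changed.
public sealed record IndexChange(string? OldValue, string? NewValue, string GrainId);

// Stand-in for the BucketingIndexGrain of one property, e.g. "UserGrain_FirstName".
public sealed class BucketingIndex
{
    private readonly Dictionary<string, HashSet<string>>[] _buckets;

    public BucketingIndex(int bucketCount = 16)
    {
        _buckets = new Dictionary<string, HashSet<string>>[bucketCount];
        for (int i = 0; i < bucketCount; i++) _buckets[i] = new();
    }

    // string.GetHashCode is only stable within one process; a persistent
    // index would need a stable hash instead (assumption for this sketch).
    private Dictionary<string, HashSet<string>> BucketFor(string value)
        => _buckets[(value.GetHashCode() & int.MaxValue) % _buckets.Length];

    public void Apply(IndexChange change)
    {
        if (change.OldValue == change.NewValue) return;

        // Remove the grain id from the bucket that held the old value.
        if (change.OldValue is not null)
        {
            var bucket = BucketFor(change.OldValue);
            if (bucket.TryGetValue(change.OldValue, out var ids))
            {
                ids.Remove(change.GrainId);
                if (ids.Count == 0) bucket.Remove(change.OldValue);
            }
        }

        // Add the grain id to the bucket responsible for the new value.
        if (change.NewValue is not null)
        {
            var bucket = BucketFor(change.NewValue);
            if (!bucket.TryGetValue(change.NewValue, out var ids))
                bucket[change.NewValue] = ids = new HashSet<string>();
            ids.Add(change.GrainId);
        }
    }

    public IReadOnlyCollection<string> Lookup(string value)
    {
        if (BucketFor(value).TryGetValue(value, out var ids)) return ids;
        return Array.Empty<string>();
    }
}
```

A null old value models a newly indexed grain, and a null new value models removal from the index.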

An attribute is introduced called IndexedStateAttribute. The attribute is intended to be applied to any simple property on a GrainState that the developer wants to be queryable. In this example we would apply it to the FirstName property on UserGrainState.

Up to this point, things have probably been pretty straight-forward. Here's where it may become hard/impossible.

A trigger could be established that lets the framework know that it should invoke the BucketingIndexGrain on the developer's behalf.

Perhaps the easiest would be whenever the WriteStateAsync method is called, but prior to passing the state along to the StorageProvider. (This likely has a number of problems, but it illustrates the point. The goal I'm driving towards is strong querying capabilities for grains of any type using any storage mechanism.)

When this method is invoked, we inspect the properties of the GrainState looking for members with the IndexedStateAttribute. For every member we find, we notify the BucketingIndexGrain of the change in state. (This would also require us to store a backup copy of the state inside of the Grain base class so we could have access to the old property values for removal from indices--or store that information in some other form. Additionally, there may be some code-gen that could be done somewhere to circumvent reflection.)
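The inspection step might look something like the sketch below. It uses only standard .NET reflection; `IndexedStateAttribute` follows the name proposed in this comment, while `UserGrainState`, `IndexedStateDiff`, and the `Changes` helper are hypothetical. Where and how the framework would actually call this is left open.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Reflection;

// Marker attribute, named after the one proposed above (not an existing Orleans type).
[AttributeUsage(AttributeTargets.Property)]
public sealed class IndexedStateAttribute : Attribute { }

// Hypothetical grain state with one indexed and one non-indexed property.
public sealed class UserGrainState
{
    [IndexedState] public string FirstName { get; set; } = "";
    public string Bio { get; set; } = "";
}

public static class IndexedStateDiff
{
    // Compares the backup copy of the state with the state about to be written and
    // returns (property, old, new) for every [IndexedState] property that changed.
    public static List<(string Property, object? Old, object? New)> Changes<T>(T before, T after)
        where T : class
    {
        var changes = new List<(string Property, object? Old, object? New)>();
        foreach (PropertyInfo property in typeof(T).GetProperties())
        {
            if (property.GetCustomAttribute<IndexedStateAttribute>() is null) continue;
            object? oldValue = property.GetValue(before);
            object? newValue = property.GetValue(after);
            if (!object.Equals(oldValue, newValue))
                changes.Add((property.Name, oldValue, newValue));
        }
        return changes;
    }
}
```

Each returned tuple carries what the index needs: the property name selects the index grain, and the old and new values say which entries to remove and add. Generated code could replace the reflection loop, as suggested above.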

Lastly, this setup would need an IQueryable, parts of which would need to be generated. It would take simple expressions (on our simple types), invoke methods on the appropriate BucketingIndexGrains to receive sets of grain IDs, and perform set-based operations on them.

I've never personally created my own IQueryable, but it seems like (if all these pieces existed and the stars aligned) you could write code like:

GrainQueryProvider.UserGrains.Where(g => g.FirstName == "Bob" || g.FirstName == "Charles")

This would get translated into:

Invoke the BucketingIndexGrain with ID "UserGrain_FirstName" and request the IDs of grains who are indexed against the string "Bob".
Invoke the BucketingIndexGrain with ID "UserGrain_FirstName" and request the IDs of grains who are indexed against the string "Charles".
Perform a union on the resulting IDs and return the IDs or grain references to the caller.
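The three steps above can be sketched without an IQueryable provider at all. In this sketch `lookup` is a hypothetical stand-in for calling the index grain "UserGrain_FirstName"; `IndexQuery` and `WhereAnyAsync` are illustrative names, not an Orleans API, and translating a LINQ expression tree into such calls is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class IndexQuery
{
    // Evaluates "property == v1 || property == v2 || ..." against one index.
    // `lookup` returns the grain ids indexed under a single value.
    public static async Task<HashSet<string>> WhereAnyAsync(
        Func<string, Task<IReadOnlyCollection<string>>> lookup,
        params string[] values)
    {
        // One index lookup per OR-ed value, issued concurrently.
        IReadOnlyCollection<string>[] results = await Task.WhenAll(values.Select(lookup));

        // Union the id sets; a grain matching several values appears once.
        var union = new HashSet<string>();
        foreach (IReadOnlyCollection<string> ids in results) union.UnionWith(ids);
        return union;
    }
}
```

An AND between two indexed properties would use `IntersectWith` on the per-property results instead of `UnionWith`.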

At this point, the caller has the ability to fan out to the resulting grains and do whatever he wishes.

There's a lot in there, and several things that, even if they were done, would still be lacking.

For instance, this doesn't affect the storage provider, but does require the grain's state to be persisted somewhere (because of the hook into WriteStateAsync). A better "hook" would be nice.

As described above, it requires getting hooks into the framework--but there's probably also a more "contrib-y" way to go about this that I haven't really considered. (Maybe an alternative base class to Grain that takes care of the hook and some code-gen for the IQueryable.)

Again, I'm still pretty green with the framework, but this has been rattling around in my head for a couple of weeks now so I thought I'd share.

Dec 19 '15 07:12 antoinne85

@philbe Understood--I just wanted to express the notion that it would be great to be able to query our grains (in some capacity) without caring about the storage provider. It sounds like you were already thinking of that anyway.

And I agree--if there are ways to eke out more power/performance from a particular storage provider, that would be ideal as well.

For what it's worth--aside from the fact that you can't emulate it in a dev environment yet, I think DocumentDB is a great offering.

Dec 19 '15 07:12 antoinne85

Could graph engine help us? https://www.graphengine.io

Dec 14 '16 04:12 AlbertYi1980

We just announced an indexing mechanism for grains here: #3413.

Sep 18 '17 18:09 philbe

@AlbertYi1980 I asked the GE guys to open-source it some time ago, and they did earlier this year. Even if we don't integrate its graph DB, at least it could guide us toward graph-based grains and storage.

Sep 18 '17 19:09 galvesribeiro

Hi all, What's the status of this issue? Thanks

Mar 13 '19 17:03 andreujuanc

What is the rough timeline for the release of https://github.com/OrleansContrib/OrleansV2.Fork.Indexing? Also, does it work with Azure Table Storage when using the StorageManagedIndex and if so, how is it implemented under the hood?

Mar 25 '19 09:03 RehanSaeed

@RehanSaeed, I don't think there's a plan on when to release this in Orleans master. @sergeybykov, care to comment?

@TedHartMS has been working with me on a branch that incorporates transactions, which I expect will reach a stable point within a few weeks. However, it still will not have undergone sufficient testing or integration with the released version of Orleans to be used in production.

StorageManagedIndex is only for storage managers that have built-in indexing. Azure Table does not have built-in indexing. Therefore, it would have to support Active or Total indexes. To support Total indexes, each index would presumably be stored as a separate table.

Mar 25 '19 19:03 philbe

We don't have a clear timeline for this. We recently separated the core runtime changes from the indexing feature itself. The former requires work to sort out and likely refactor. Then there's necessary work on the indexing codebase itself that @philbe mentioned. We are trying to do this opportunistically. But the priority of this work is lower for us than some other investments we are focusing on in the near future.

Mar 25 '19 19:03 sergeybykov

I'm new to Orleans and am trying to understand its viability. One of its main USPs that I am interested in is its low cost while maintaining high performance, scalability and simplicity when combined with Azure Table Storage.

> Azure Table does not have built-in indexing. Therefore, it would have to support Active or Total indexes. To support Total indexes, each index would presumably be stored as a separate table.

Yes that seems like the only way you could implement that efficiently.

> But the priority of this work is lower for us than some other investments we are focusing on in the near future.

I have raised issue https://github.com/dotnet/orleans/issues/5472 "Recommended way to Handle Collections of Grains" as it seems to a newbie like me that it would be a fairly fundamental feature. Thus far, the recommendation seems to be to go direct to the storage mechanism which seems less than ideal and not easily possible if using the above mentioned Azure Table Storage indexing strategy.

Mar 26 '19 07:03 RehanSaeed

Are there any updates on query support?

Aug 02 '20 17:08 RehanSaeed

No progress.

Aug 07 '20 03:08 sergeybykov

Someone mentioned Cypher above; I had a similar thought today. From 50,000 ft, Orleans is surprisingly similar to a graph DB, so why not use the same techniques?

Aug 03 '21 06:08 Jens-G

We've moved this issue to the Backlog. This means that it is not going to be worked on for the coming release. We review items in the backlog at the end of each milestone/release and depending on the team's priority we may reconsider this issue for the following milestone.

Jul 28 '22 20:07 ghost

Do you plan to add this functionality to Orleans v4?

Aug 17 '22 16:08 P9avel
