Relationship Complexity Metrics for limiting filters
This could solve the problem of building arbitrarily complex queries.
Hi @zachdaniel, could you elaborate more about this enhancement?
Yep! Sorry for all the issues that are essentially one sentence, I did a huge brain dump at one point to try to get as much into GH issues as I could 😆
So right now, it is very easy to make arbitrarily complex queries via the APIs that Ash creates. Specifically, you could do crazy things like
/posts?filter[comments][post][comments][post][comments][post][comments][post][comments][post][title]=foo
So what we want to do is find a good pattern for expressing what filters you can and can't make. In GraphQL, this kind of thing is generally solved with "complexity metrics". A good example can be seen here: https://developer.github.com/v4/guides/resource-limitations/#calculating-nodes-in-a-call
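Roughly, that guide estimates how many nodes a query could return by multiplying the requested page sizes at each level of nesting. A toy version of that estimate, just for context (the numbers here are made up):

# Toy node estimate in the spirit of that guide: 50 issues, each with up to
# 10 comments, is roughly 50 + 50 * 10 = 550 nodes.
page_sizes = [50, 10]

{_last_product, total} =
  Enum.reduce(page_sizes, {1, 0}, fn size, {product, total} ->
    {product * size, total + product * size}
  end)

total # => 550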
I want to find an expressive way to do this though, potentially in such a way that the information can be reused by other parts of the system later. It will take some thought for sure :).
For instance, the most obvious solution:
allowed_filters [comments: :author, :title, :created_at]
hides a whole lot of meaning and doesn't really give us anything reusable. E.g. why is some attribute not filterable?
The route of complexity metrics could look something like this:
aggregates do
  # aggregate complexity could be derived from relationship complexity
end

calculations do
  calculate :full_name, {Concat, keys: [:first_name, :last_name]} do
    argument :separator, :string
    complexity 10 # Right now this doesn't matter because you can't filter on calculations, but you will be able to some day
  end
end

relationships do
  has_many :comments, MyApp.Comment,
    complexity: 100
end
And while this works, it also requires users to implement a sort of "magical" numbering system that would be custom to their use case and thus not usable by the engine to do anything meaningful.
Ultimately, both of these things are hiding some information about the relationships/calculations in question that we could potentially use to determine much better thresholds.
relationships do
  has_many :comments, MyApp.Comment,
    cardinality: :huge
  # This is weird wording, but essentially saying "hey, there could be a *ton* of comments on a post".
  # Ash could use this to disallow something like `/posts?filter[comment][title]="foo"` (perhaps with explicit allowances where necessary)

  has_many :quality_checks, MyApp.QualityCheck,
    cardinality: :small
end
calculations do
  calculate :full_name, {Concat, keys: [:first_name, :last_name]} do
    argument :separator, :string
    cost :huge # Potentially unnecessary for calculations. We can wait and see
  end
end
What is nice about something like the last example is that Ash can use that information to its advantage. For example, if a relationship has cardinality :small, it might very well be better for performance to include it in the main query as a join, instead of parallel loading it like it does now. And eventually, when we add caching layers, we could side load small relationships into the parent cache to avoid making more database calls in the future. And for calculations, we would know that there is no need to cache a small calculation.
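As a rough illustration of that idea (not actual engine code; the module, function, and field names here are hypothetical), the side load strategy could be picked straight from the declared cardinality:

defmodule MyApp.SideLoadStrategy do
  # Hypothetical sketch: fold small relationships into the main query as a
  # join; keep the current parallel-load behavior for everything else.
  def strategy(%{cardinality: :small}), do: :join
  def strategy(_relationship), do: :parallel_load
end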
The downside of the last approach (and complexity metrics) is that we would also need a utility to show you exactly the kinds of filters you could make against a resource. E.g. mix ash.report MyApp.Resource, which would show you what the possible nesting of relationship filters is.
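To make that concrete, here's a rough sketch of the kind of traversal such a report could do. relationships_for/1 stands in for whatever resource introspection Ash exposes, and the weights and depth limit are made up for illustration:

defmodule FilterPathReport do
  # Hypothetical weights for the named cardinalities used in this thread.
  @weights %{small: 10, high: 10_000, huge: 100_000}
  @max_depth 3

  # List the relationship paths (as lists of names) whose estimated combined
  # cardinality stays at or below `max`, up to a fixed nesting depth.
  # `relationships_for` is a function returning maps like
  # %{name: :comments, destination: MyApp.Comment, cardinality: :huge}.
  def allowed_paths(resource, relationships_for, max, depth \\ @max_depth, cost \\ 1)

  def allowed_paths(_resource, _relationships_for, _max, 0, _cost), do: []

  def allowed_paths(resource, relationships_for, max, depth, cost) do
    Enum.flat_map(relationships_for.(resource), fn rel ->
      new_cost = cost * Map.get(@weights, rel[:cardinality], 1)

      if new_cost > max do
        []
      else
        nested = allowed_paths(rel[:destination], relationships_for, max, depth - 1, new_cost)
        [[rel[:name]] | Enum.map(nested, &[rel[:name] | &1])]
      end
    end)
  end
end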
What are your thoughts?
/posts?filter[comments][post][comments][post][comments][post][comments][post][comments][post][title]=foo
What are the current behaviors of the ash engine and ash_json_api when they encounter the above query?
I actually like the third option, as I think it would be easier for users to select from pre-defined options (maybe something like :small, :moderate, :huge) rather than pondering over what the optimal value would be, like in the second option. I think the advantages of the third option you stated are reasonable.
The downside of the last approach (and complexity metrics) is that we would also need a utility to show you exactly the kinds of filters you could make against a resource. E.g. mix ash.report MyApp.Resource, which would show you what the possible nesting of relationship filters is.
Regarding this, how complex do you think the utility would be? If we follow the third option, I don't think the utility would introduce any additional complexity, since I suppose the logic should be similar to what the engine would do to switch between the different optimization strategies you mentioned.
/posts?filter[comments][post][comments][post][comments][post][comments][post][comments][post][title]=foo
What are the current behaviors of the ash engine and ash_json_api when they encounter the above query?
Right now it will actually just service that query. E.g. posts joined to comments joined to posts joined to comments joined to posts ... to posts where title = "foo"
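For illustration only, assuming an Ecto-based data layer like ash_postgres and that these resources are backed by Ecto schemas, that request ends up shaped roughly like one join per relationship hop:

import Ecto.Query

# One join per [comments][post] hop in the filter; the chain only stops
# where the request does.
from p in MyApp.Post,
  join: c1 in assoc(p, :comments),
  join: p1 in assoc(c1, :post),
  join: c2 in assoc(p1, :comments),
  join: p2 in assoc(c2, :post),
  where: p2.title == "foo",
  select: p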
I think the utility itself shouldn't be too complex (we'd need to be able to answer that question to write the feature). And ultimately this kind of static analysis of the behavior/rules around a resource is something we want anyway (which is why I went with mix ash.report since there would be more info to report).
How many options do you think we would need, and what names for them do you suggest?
If we go with the cardinality field, then options :low, :moderate, and :high might be suitable, though I think we probably only need :low and :high at the moment.
That is a really good question :).
My instinct tells me :trivial | :low | :medium | :high | :huge. Then we need to determine what exactly each one means and how it affects filters and side loads. An interesting idea would be to configure this behavior at the API level. It could default to smart behavior, e.g. huge relationships can't be filtered on or loaded, only 2 or 3 high relationships can be filtered/loaded, and so on and so forth.
defmodule MyApp.Api do
  use Ash.Api

  limits do
    # ... some kind of syntax for configuring complexity limits
  end
end
:trivial | :low | :medium | :high | :huge
These sound good too. I'm thinking something like the below. The cardinality numbers are somewhat arbitrary, as I have no idea at what point a relationship becomes too big :sweat_smile:
defmodule MyApp.Api do
  use Ash.Api

  complexity do
    relationship_A :trivial # For cardinality less than 40, join query for side loads, allow filter
    relationship_B :low # For cardinality more than 40 and less than 60, join query for side loads, allow filter
    relationship_C :medium # For cardinality more than 60 and less than 80, parallel side loads, allow filter
    relationship_D :high # For cardinality more than 80 and less than 100, parallel side loads, limited number of side loads, no filter
    relationship_E :huge # For cardinality more than 100, no side loads, no filter
  end
end
So far we have side loads, filters, and caching, what else do you think could take advantage of this complexity metric?
I think that complexity do block in the API would just be for configuring the various rules around what the complexities entail. A simple example would be something like
defmodule MyApp.Post do
  use Ash.Resource
  # ...

  relationships do
    has_many :comments, MyApp.Comment, cardinality: :huge
  end
end

defmodule MyApp.Comment do
  use Ash.Resource
  # ...

  relationships do
    belongs_to :post, MyApp.Post # <- don't need cardinality here because it's a to_one relationship
    has_many :subcomments, MyApp.Comment, cardinality: :high
  end
end
defmodule MyApp.Api do
  use Ash.Api

  resources do
    resource MyApp.Comment
    resource MyApp.Post
  end

  complexity do
    enforce? false # disable complexity checking of requests
    allow_load {MyApp.Post, [comments: :subcomments]} # Add an explicit exception that would not have previously been allowed
  end
end
As for the actual numerical thresholds, the fact that you feel the urge to quantify the names makes me rethink whether we should just use numerical thresholds in the first place. E.g.
relationships do
  has_many :comments, MyApp.Comment, cardinality: 100_000
end

...

relationships do
  has_many :subcomments, MyApp.SubComment, cardinality: 1000
end
Which basically tells Ash to assume it is going to get 100k comments and that each comment will have 1000 subcomments. With that in mind, 1) our rules could use numeric values in their limits, and 2) the configuration would probably make more sense.
complexity do
  max 100_000 # <- would not allow /posts?include=comments.subcomments, as 100k * 1k = 100M crosses that limit
  max 1_000_000_000 # <- would allow that request, as 100k * 1k = 100M is under that limit
end
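To spell out the arithmetic, here's a minimal sketch of the kind of check this implies (the module, function, and input shape are hypothetical):

defmodule ComplexityCheck do
  # Multiply the declared cardinalities along an include/filter path and
  # compare the estimate against the configured max.
  def allowed?(cardinalities, max) do
    Enum.reduce(cardinalities, 1, fn cardinality, acc -> cardinality * acc end) <= max
  end
end

# /posts?include=comments.subcomments with the declarations above:
ComplexityCheck.allowed?([100_000, 1_000], 100_000)         # => false (100M > 100k)
ComplexityCheck.allowed?([100_000, 1_000], 1_000_000_000)   # => true  (100M <= 1B)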
As for where these thresholds would be useful:
- side loads
- filters
- caching
- aggregates
- authorization filtering planning (potentially)
With the pre-defined options, the effects of each option would be defined by us. On the other hand, with numerical thresholds, I think the complexity do block would have to let users define their own custom rules for side loads, filters, caching, and so on. I feel like the numerical thresholds would be more complex. Which one do you think would suit Ash better overall, or do you think we should have both?
Let's start with the numeric values. We can always map the names to values, e.g. huge = 100_000.
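A minimal sketch of that mapping (only huge = 100_000 comes from this thread; the module name and the other values are placeholders):

defmodule Cardinality do
  # Hypothetical default mapping from named cardinalities to numbers.
  @defaults %{trivial: 10, low: 100, medium: 1_000, high: 10_000, huge: 100_000}

  # Accept either a named cardinality or a raw integer.
  def resolve(value) when is_integer(value), do: value
  def resolve(name) when is_atom(name), do: Map.fetch!(@defaults, name)
end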
Got it. I'll work on it
Sounds great! We can start w/ a PR just for specifying complexity, and then we can figure out a complexity analyzer from there.
Also, @kyle5794 we should do this in Ash core, since any interface would want to use this.
This has been stale for a while, and at this point I'm more inclined to believe that these rules should end up in the API extensions.