
Relationship Complexity Metrics for limiting filters

Open zachdaniel opened this issue 5 years ago • 13 comments

This could solve the problem of clients building arbitrarily complex queries.

zachdaniel avatar Jun 21 '20 01:06 zachdaniel

Hi @zachdaniel, could you elaborate more about this enhancement?

mangeption avatar Nov 08 '20 15:11 mangeption

Yep! Sorry for all the issues that are essentially one sentence; I did a huge brain dump at one point to try to get as much into GH issues as I could 😆

So right now, it is very easy to make arbitrarily complex queries via the APIs that Ash creates. Specifically, you could do crazy things like

/posts?filter[comments][post][comments][post][comments][post][comments][post][comments][post][title]=foo

So what we want to do is find a good pattern for expressing what filters you can and can't make. In GraphQL, this kind of thing is generally solved with "complexity metrics". A good example can be seen here: https://developer.github.com/v4/guides/resource-limitations/#calculating-nodes-in-a-call
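To make that concrete, the simplest possible strawman would be a plain depth check over a parsed filter. A sketch (the module and limit here are hypothetical, nothing Ash actually exposes):

# Hypothetical sketch: count how deeply a parsed filter nests and
# reject anything past a configured limit. A parsed filter is treated
# as a keyword/map tree, e.g. [comments: [post: [title: "foo"]]].
defmodule MyApp.FilterDepth do
  @max_depth 3

  def allowed?(filter), do: depth(filter) <= @max_depth

  defp depth(filter) when is_list(filter) or is_map(filter) do
    filter
    |> Enum.map(fn
      {_key, nested} when is_list(nested) or is_map(nested) -> 1 + depth(nested)
      _ -> 1
    end)
    |> Enum.max(fn -> 0 end)
  end

  defp depth(_), do: 0
end

A pure depth cap like this also treats a cheap to-one hop the same as an expensive to-many hop, though.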

I want to find an expressive way to do this though, potentially in such a way that the information can be reused by other parts of the system later. It will take some thought for sure :).

For instance, the most obvious solution:

allowed_filters [:title, :created_at, comments: :author]

hides a whole lot of meaning and doesn't really give us anything reusable. E.g. why is some attribute not filterable?

The route of complexity metrics could look something like this:

aggregates do
  # aggregate complexity could be derived from relationship complexity  
end

calculations do
  calculate :full_name, {Concat, keys: [:first_name, :last_name]} do
    argument :separator, :string
    complexity 10 # Right now this doesn't matter because you can't filter on calculations, but you will be able to some day
  end
end

relationships do
  has_many :comments, MyApp.Comment,
    complexity: 100
end

And while this works, it also requires users to implement a sort of "magical" numbering system that would be custom to their use case and thus not usable by the engine to do anything meaningful.

Ultimately, both of these things are hiding some information about the relationships/calculations in question that we could potentially use to determine much better thresholds.

relationships do
  has_many :comments, MyApp.Comment,
    cardinality: :huge 
  # This is weird wording, but essentially saying "hey, there could be a *ton* of comments on a post"
  # Ash could use this to disallow something like `/posts?filter[comment][title]="foo"` (perhaps with explicit allowances where necessary)

  has_many :quality_checks, MyApp.QualityCheck,
    cardinality: :small
end

calculations do
  calculate :full_name, {Concat, keys: [:first_name, :last_name]} do
    argument :separator, :string
    cost :huge # Potentially unnecessary for calculations. We can wait and see
  end
end

What is nice about something like the last example is that Ash can use that information to its advantage. For example, if a relationship has cardinality :small, it might very well be better for performance to include it in the main query as a join, instead of parallel loading it like Ash does now. And eventually, when we add caching layers, we could side load small relationships into the parent cache to avoid making more database calls in the future. And for calculations, we would know that there is no need to cache a small one.
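In pseudo-Elixir, the kind of dispatch I mean (none of these functions exist yet):

# Hypothetical: derive a load strategy from the declared cardinality.
defmodule MyApp.LoadStrategy do
  def for_relationship(%{cardinality: :small}), do: :join      # fold into the main query
  def for_relationship(%{cardinality: :huge}), do: :disallow   # refuse to side load/filter
  def for_relationship(_), do: :parallel_load                  # the current default
end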

The downside of the last approach (and of complexity metrics) is that we would also need a utility to show you exactly the kinds of filters you can make against a resource, e.g. mix ash.report MyApp.Resource, which would show you the possible nesting of relationship filters.

What are your thoughts?

zachdaniel avatar Nov 08 '20 17:11 zachdaniel

/posts?filter[comments][post][comments][post][comments][post][comments][post][comments][post][title]=foo

What are the current behaviors of the ash engine and ash_json_api when they encounter the above query?

I actually like the third option, as I think it would be easier for users to select from pre-defined options (maybe something like :small, :moderate, :huge) rather than pondering what the optimal value would be, like in the second option. I think the advantages of the third option you stated are reasonable.

The downside of the last approach (and of complexity metrics) is that we would also need a utility to show you exactly the kinds of filters you can make against a resource, e.g. mix ash.report MyApp.Resource, which would show you the possible nesting of relationship filters.

Regarding this, how complex do you think the utility would be? If we follow the third option, I don't think the utility would introduce any additional complexity, since I suppose the logic should be somewhat similar to what the engine would do to switch between the different optimization strategies you mentioned.

mangeption avatar Nov 09 '20 16:11 mangeption

/posts?filter[comments][post][comments][post][comments][post][comments][post][comments][post][title]=foo

What are the current behaviors of the ash engine and ash_json_api when they encounter the above query?

Right now it will actually just service that query: posts joined to comments joined to posts joined to comments joined to posts ... joined to posts where title = "foo".
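For illustration, that query string parses to a nested keyword filter roughly like the below (illustrative, not the exact internal representation), and the data layer turns each nested key into another join between the posts and comments tables:

# Roughly what /posts?filter[comments][post]...[title]=foo parses to
[
  comments: [
    post: [
      comments: [
        post: [
          # ... three more comments/post hops elided ...
          title: "foo"
        ]
      ]
    ]
  ]
]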

I think the utility itself shouldn't be too complex (we'd need to be able to answer that question to write the feature). And ultimately this kind of static analysis of the behavior/rules around a resource is something we want anyway (which is why I went with mix ash.report since there would be more info to report).
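A sketch of the shape it might take (everything here is hypothetical, including where the relationship data comes from):

# Hypothetical mix ash.report: walk the relationship graph and print
# every filterable path whose combined cardinality stays under a limit.
defmodule Mix.Tasks.Ash.Report do
  use Mix.Task

  @max 1_000_000

  def run(_args) do
    # In a real task this graph would come from resource introspection;
    # hardcoded here as {relationship, destination, assumed cardinality}.
    graph = %{
      MyApp.Post => [{:comments, MyApp.Comment, 100_000}],
      MyApp.Comment => [{:subcomments, MyApp.Comment, 1_000}]
    }

    print_paths(graph, MyApp.Post, [], 1)
  end

  defp print_paths(graph, resource, path, cost) do
    for {name, destination, cardinality} <- Map.get(graph, resource, []),
        cost * cardinality <= @max do
      full_path = path ++ [name]
      Mix.shell().info("filterable: " <> Enum.join(full_path, "."))
      print_paths(graph, destination, full_path, cost * cardinality)
    end
  end
end

The multiplication also terminates the recursion naturally: once a path's cost crosses the limit, nothing deeper gets printed.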

zachdaniel avatar Nov 09 '20 17:11 zachdaniel

How many options do you think we would need and what name for them do you suggest?

If we go with a cardinality field, then options like :low, :moderate, and :high might be suitable, though I think we probably only need :low and :high at the moment.

mangeption avatar Nov 10 '20 06:11 mangeption

That is a really good question :).

My instinct tells me :trivial | :low | :medium | :high | :huge. Then we need to determine exactly what each one means and how it affects filters and side loads. An interesting idea would be to configure this behavior at the API level. It could default to smart behavior, e.g. :huge relationships can't be filtered on or loaded, only two or three :high relationships can be filtered/loaded, and so on.

defmodule MyApp.Api do
  use Ash.Api

  limits do
    ... some kind of syntax for configuring complexity limits
  end
end

zachdaniel avatar Nov 11 '20 00:11 zachdaniel

:trivial | :low | :medium | :high | :huge

These sound good too. I'm thinking something like the below. The cardinality numbers are somewhat arbitrary, as I have no idea how big is too big for a relationship 😅

defmodule MyApp.Api do
  use Ash.Api

  complexity do
    relationship_A :trivial # For cardinality less than 40, join query for side loads, allow filter
    relationship_B :low # For cardinality more than 40 and less than 60, join query for side loads, allow filter
    relationship_C :medium # For cardinality more than 60 and less than 80, parallel side loads, allow filter
    relationship_D :high # For cardinality more than 80 and less than 100, parallel side loads, limited number of side loads, no filter
    relationship_E :huge # For cardinality more than 100, no side loads, no filter
  end
end

So far we have side loads, filters, and caching, what else do you think could take advantage of this complexity metric?

mangeption avatar Nov 11 '20 14:11 mangeption

I think the complexity do block in the API would just be for configuring the various rules around what the complexities entail. A simple example would be something like:

defmodule MyApp.Post do
  use Ash.Resource
  #...
  relationships do
    has_many :comments, MyApp.Comment, cardinality: :huge
  end
end

defmodule MyApp.Comment do
  use Ash.Resource
  #...
  relationships do
    belongs_to :post, MyApp.Post #<- don't need cardinality here because it's a to_one relationship
    has_many :subcomments, MyApp.Comment, cardinality: :high
  end
end

defmodule MyApp.Api do
  use Ash.Api

  resources do
    resource MyApp.Comment
    resource MyApp.Post
  end

  complexity do
    enforce? false # disable complexity checking of requests
    allow_load {MyApp.Post, [comments: :subcomments]} # Add an explicit exception that would not have previously been allowed
  end
end

As for the actual numerical thresholds: the fact that you feel the urge to quantify the names makes me rethink whether we should just use numerical thresholds in the first place. E.g.

relationships do
  has_many :comments, MyApp.Comment, cardinality: 100_000
end
...

relationships do
  has_many :subcomments, MyApp.SubComment, cardinality: 1000
end

This basically tells Ash to assume it is going to get 100k comments and that each comment will have 1000 subcomments. With that in mind, our rules could 1) use numeric values in their limits, and 2) be configured in a way that probably makes more sense.

complexity do
  max 100_000 # <- would not allow /posts?include=comments.subcomments, as it crosses that limit
  max 1_000_000_000 # <- would allow that request, as 100k * 1k = 100_000_000 < 1_000_000_000
end
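The check itself stays small, something like this hypothetical module (using the cardinalities above):

# Hypothetical: multiply declared cardinalities along the path and
# compare against the configured max.
defmodule MyApp.Complexity do
  @cardinalities %{comments: 100_000, subcomments: 1_000}

  def allow?(path, max) do
    Enum.reduce(path, 1, &(&2 * Map.fetch!(@cardinalities, &1))) <= max
  end
end

# MyApp.Complexity.allow?([:comments, :subcomments], 100_000)
# #=> false (100_000 * 1_000 = 100_000_000)
# MyApp.Complexity.allow?([:comments, :subcomments], 1_000_000_000)
# #=> true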

As for where these thresholds would be useful:

  • side loads
  • filters
  • caching
  • aggregates
  • authorization filter planning (potentially)

zachdaniel avatar Nov 11 '20 15:11 zachdaniel

With the pre-defined options, the effects of each option would be defined by us. On the other hand, with the numerical thresholds, the complexity do block must let users define their own custom rules for side loads, filters, caching, and so on. I feel like the numerical thresholds would be more complex. Which one do you think would suit Ash better overall, or do you think we should have both?

mangeption avatar Nov 12 '20 15:11 mangeption

Let's start with the numeric values. We can always map the names to values, e.g. :huge = 100_000.
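Something like the below (the values are placeholders, not proposed defaults):

# Hypothetical mapping from the named levels to numeric cardinalities.
defmodule MyApp.Cardinality do
  @values %{trivial: 10, low: 100, medium: 1_000, high: 10_000, huge: 100_000}

  def resolve(name) when is_atom(name), do: Map.fetch!(@values, name)
  def resolve(n) when is_integer(n), do: n
end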

zachdaniel avatar Nov 12 '20 15:11 zachdaniel

Got it. I'll work on it

mangeption avatar Nov 12 '20 15:11 mangeption

Sounds great! We can start w/ a PR just for specifying complexity, and then we can figure out a complexity analyzer from there.

zachdaniel avatar Nov 12 '20 15:11 zachdaniel

Also, @kyle5794, we should do this in Ash core, since any interface would want to use this.

zachdaniel avatar Nov 12 '20 15:11 zachdaniel

This has been stale for a while, and at this point I'm more inclined to believe that these rules should end up in the API extensions.

zachdaniel avatar Sep 09 '22 21:09 zachdaniel