manageiq-api icon indicating copy to clipboard operation
manageiq-api copied to clipboard

Unbound query through associations

Open Fryguy opened this issue 5 years ago • 6 comments

While we have the max_results_per_page, which can keep memory limits down, associations do not abide by that value, allowing for an unbounded query.

So, for example, let's say you have 1 provider with 50K vms. If you hit /api/vms or /api/providers/1/vms you will be limited to 1000 vms per page. However if you hit /api/providers?expand=resources&attributes=vms, then you will get back all 50K vms fully expanded.

I'm not sure what approach we can take here, but I'm thinking maybe we just disallow full expansion of association attributes, and instead force selecting specific columns (via #871)? Or perhaps disallow when the size of the association is greater than the max_results_per_page?

cc @NickLaMuro @abellotti @jrafanie

Fryguy avatar Oct 09 '20 21:10 Fryguy

My initial thought is to potentially augment our Api::QueryCounts library to be able to handle fetching bulk counts (or at least counts from one place), and then using that to add a .limit to any relation we query.

That said:

  • I don't know how doable that is with .includes
  • I think it would be pretty invasive as we would have to push our calls to Api::QueryCounts to the front of the request, not on the tail end as it is currently
  • not sure how we would handle showing this as part of the request payload

NickLaMuro avatar Oct 09 '20 21:10 NickLaMuro

I think my issue is that we have no way of paging those associations even if we did add a limit, and then we also have no way of telling the end user that the limit was applied and they only received a partial payload.

My gut feeling is just to straight disallow has_manys via attributes unless there are specific columns defined. For example

  • :x: /api/providers?expand=resources&attributes=vms
  • /api/providers?expand=resources&attributes=vms.name,vms.id (which would require #871)

Some associations, like tags we may want to allow because they are common and not very big, perhaps instead by having an allowlist via expand=tags. (We currently have things like expand=policies and I don't understand how that's different than attributes=policies)

I'm almost wondering if we should just remove expand=resources and force users to specify every column they care about, but maybe that's a bit draconian 😄

Fryguy avatar Oct 12 '20 15:10 Fryguy

My concern is if we disallow has_manys, we're breaking compatibility, https://www.manageiq.org/docs/reference/latest/api/overview/query.html

while I'm not opposed to doing this for big ticket items (vms, etc.), it's the smaller associations that folks have been querying and expect to get back. expand=<subcollection> allows api callers to fetch multiple subcollections in a single query to help reduce roundtrips (api predating GraphQL 😄 ) expand=<subcollection> would get properly href's items vs just an association query.

I do like the idea of requiring specific columns, like vms.name.

Maybe an option is to have a blacklist of associations that we know can tend to be large (vms, events, notificatioons, etc), and only apply the new logic to those ?

abellotti avatar Oct 12 '20 16:10 abellotti

but maybe that's a bit draconian

This made me think that possibly what we could do instead of trying to "take things away" is instead add something to make tracking this issues down easier. For us, looking through the logs is fine enough (assuming we can get access to them), but would possibly providing an automated process for analyzing those logs, and presenting them via the UI/API be a better solution?

My thinking is that the problem stems from users not knowing they are making poor choices with the API, and generally we have ways of mitigating those performance problems already and they just need to be used. By taking things away, we are just making it more complex to use the API, not easier.

Maybe an option is to have a blacklist of associations that we know can tend to be large (vms, events, notificatioons, etc), and only apply the new logic to those ?

This might work, though, I think with my thought above, possibly we make this configurable instead of something that is hardcoded in the app. We can set it with "sane defaults", but allow the users to shoot themselves in the foot if they so choose.

NickLaMuro avatar Oct 12 '20 16:10 NickLaMuro

My thinking is that the problem stems from users not knowing they are making poor choices with the API

I agree, but I can't see anything we can do that could prevent that, besides, well, actually preventing that. If we leave the capability intact, people are going to use it, which leads to killing the workers, and I think we have to prioritize a user shooting themselves in the foot (by killing the app) over a user's convenience. If we could find a way to keep the capability, but not make the workers overload themselves, I'd be all for it (for example, maybe find_in_batches might help, but even so, JSON dumping an enormous payload will likely always kill memory)

Fryguy avatar Oct 12 '20 17:10 Fryguy

Maybe an option is to have a blacklist of associations that we know can tend to be large (vms, events, notificatioons, etc), and only apply the new logic to those ?

This might work, though, I think with my thought above, possibly we make this configurable instead of something that is hardcoded in the app. We can set it with "sane defaults", but allow the users to shoot themselves in the foot if they so choose.

I'm kind of on the fence. While I like the idea, it does have some level of inconsistency (some things are queryable, others are not). If we make it configurable, on a minor note, it makes documenting these inconsistencies difficult/confusing, in that we'd have to say "by default, vms are not queryable, but the administrator can enable it)

Fryguy avatar Oct 12 '20 17:10 Fryguy