
Add a parameter to limit reference

Open fra967 opened this issue 8 years ago • 6 comments

As a schema designer, I would like the ability to restrict composition when a field references a collection known to have a large number of records.

Consider, for example, a collection of redirects (a few thousand records) referencing articles (hundreds of thousands of records).

A query with a count limit should be allowed, but a query without a count limit should be blocked; otherwise it would just send the Node process out of memory.

An additional parameter could be introduced in the reference field settings to specify whether an unlimited record set can be composed with the referenced collection.

The default behaviour should allow composition to happen. We just want the ability to restrict this in known situations.
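A hypothetical sketch of how such a setting might look in a reference field's schema (the `restrictUnlimitedCompose` name is invented here for illustration; it is not an existing API setting):

```json
{
  "fields": {
    "article": {
      "type": "Reference",
      "settings": {
        "collection": "articles",
        "restrictUnlimitedCompose": true
      }
    }
  }
}
```

With the flag absent or set to `false`, composition would behave as it does today; with it set, a query without a count limit could return the raw `_id` values instead of composed documents, or an error.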

fra967 avatar May 05 '17 12:05 fra967

Referenced collections are queried by _id, only one result should be returned. Perhaps I misunderstand, is this related to #257?

Do you mean that, given a reference field containing an array of 100 IDs, we should limit the returned data when pulling all for composition? E.g. by only retrieving x of those 100?

jimlambie avatar May 05 '17 20:05 jimlambie

It's somewhat related to #257.

Consider, for example, a redirects collection where each document references the articles collection.

A query getting 100 redirects is fine; at present it will trigger 100 individual lookups by _id. #257 asks for one query to the articles collection using $in with all those _ids, or for breaking it down into batches.

But a query getting all redirects (i.e. 5000 of them) will trigger 5000 individual lookup queries in the articles collection and send Node out of memory.

#257 would shift the problem to sending one query with a huge $in statement, including 5000 _ids; my guess is that this would also send Node out of memory, but we cannot test it until #257 is implemented.
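The batching approach described above can be sketched as follows (a minimal illustration, not DADI API code; `BATCH_SIZE` and `batchLookup` are invented names, and `collection` stands for any MongoDB-style collection handle):

```javascript
// Hypothetical cap on the number of _ids placed in a single $in clause.
const BATCH_SIZE = 500

// Split an array of ids into arrays of at most `size` elements.
function chunk (ids, size) {
  const batches = []
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size))
  }
  return batches
}

// Resolve referenced documents in batches: one query per batch of _ids,
// instead of one query per _id or a single query with a giant $in.
async function batchLookup (collection, ids) {
  const results = []
  for (const batch of chunk(ids, BATCH_SIZE)) {
    const docs = await collection.find({ _id: { $in: batch } }).toArray()
    results.push(...docs)
  }
  return results
}
```

This keeps both the number of round trips and the size of each $in clause bounded, regardless of how many documents are being composed.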

In any case, I would prefer to have one parameter in the schema so that, in cases like this, I can allow composition on queries with N redirects but block composition on queries for all redirects; that is what this ticket is about.

But yes, #257 comes first.

fra967 avatar May 05 '17 20:05 fra967

Ok I understand, thanks for the clarification!

jimlambie avatar May 05 '17 20:05 jimlambie

Hello! 👋

May I ask for some clarification on what needs to be done?

A query getting 100 redirects is fine; at present it will trigger 100 individual lookups by _id; (...) but a query getting all redirects (i.e. 5000 of them) will trigger 5000 individual lookup queries in the articles collection and send Node out of memory

In what situation would we attempt to get all redirects? Given a reference field, which will be a string containing an ID or an array containing multiple IDs, we will query the referenced collection for the given IDs and get the referenced documents back. We'll never ask for the whole collection.

Do you mean that, given a reference field containing an array of 100 IDs, we should limit the returned data when pulling all for composition? E.g. by only retrieving x of those 100?

Are we saying that this is what's required?

eduardoboucas avatar Apr 04 '18 08:04 eduardoboucas

We just had another occurrence of a similar composition situation that took some time to troubleshoot (API version 2.3.2, so the #257 fix is already present, which makes things much better but does not protect API from crashing in these scenarios).

The cause was a query with count=0 on a collection referencing another very large collection, without any field selection (that is, a badly written query). This simply sends the Node process out of memory, and it may take some time to figure out why.

I was originally thinking that we could add some flags to allow a schema designer to protect against dangerous queries, for example when referencing a known large collection without field selection (that's what this ticket is about). But we should probably expand the discussion and consider what API could do to evaluate queries and refuse to execute them, or stop their execution, when they demand too many resources. This would also be useful as an anti-DDoS feature.
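The kind of pre-execution check being proposed might look something like this (a sketch only; the `restrictUnlimitedCompose` flag and the function name are hypothetical, not existing API behaviour):

```javascript
// Decide whether a query should be refused before composition is attempted:
// it is considered dangerous when it is unbounded (no count limit), selects
// no fields, and targets a reference field the schema has flagged as risky.
function isDangerousQuery (options, fieldSettings) {
  const unlimited = !options.count || options.count === 0
  const noFieldSelection =
    !options.fields || Object.keys(options.fields).length === 0
  const restricted =
    Boolean(fieldSettings && fieldSettings.restrictUnlimitedCompose)

  return restricted && unlimited && noFieldSelection
}
```

The API layer could run a check like this before resolving references and return a 4xx error instead of attempting composition, which would turn a silent out-of-memory crash into an immediate, diagnosable response.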

The underlying database (MongoDB) does something similar: when it gets a query that is too complex or large to handle, it returns an error and keeps working.

fra967 avatar Apr 10 '18 12:04 fra967

Thanks for the update, that makes sense. I suggest we discuss this on next week's product meeting (or just a short call with me, you and @jimlambie to avoid boring other people to death). I can summarise what API 3.1.0 changed in terms of Reference fields and we can discuss where that leaves us in terms of protection against dangerous queries.

eduardoboucas avatar Apr 10 '18 12:04 eduardoboucas