
Result paging/max number of requests

Open allasm opened this issue 10 years ago • 28 comments

Add support for either paging or a "maxNumReplies" field

allasm avatar Jan 20 '15 19:01 allasm
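
For illustration, a minimal sketch of what a "maxNumReplies" field might look like if added to a v1 match request. The field name comes from this issue and is not part of the current specification, and the patient block is abbreviated.

    {
      "patient": {
        "id": "P0001",
        "contact": { "name": "...", "href": "..." },
        "features": [ { "id": "HP:0000252" } ]
      },
      "maxNumReplies": 20
    }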

+1

Relequestual avatar Mar 10 '15 15:03 Relequestual

+1

fschiettecatte avatar Mar 10 '15 16:03 fschiettecatte

With reference to https://github.com/ga4gh/mme-apis/issues/86#issuecomment-77477745, we should also add totalResults. We will need to work out what level these elements should be at.

Relequestual avatar Sep 28 '15 09:09 Relequestual
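
A sketch of how a "totalResults" element could sit at the top level of the response, alongside the results array of scored patients, assuming the v1 response envelope. This is illustrative only; where the element should live is exactly the open question above.

    {
      "totalResults": 134,
      "results": [
        { "score": { "patient": 0.92 }, "patient": { "id": "P0001" } },
        { "score": { "patient": 0.87 }, "patient": { "id": "P0002" } }
      ]
    }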

This would be a breaking change. Removing from 1.1 milestone.

DECIPHER currently limits the number of results, following how Phenome Central works. Given that, we could close this issue in favour of suggesting that systems only return the top 20 matches. Thoughts?

Relequestual avatar Aug 04 '16 10:08 Relequestual

I am fine with the top 20, seems like a defensible number to me.

fschiettecatte avatar Aug 04 '16 11:08 fschiettecatte

I still think we should allow specification, with size/offset or page/pageSize parameters, but I'm fine with removing this from v1.1.

buske avatar Aug 04 '16 19:08 buske

Hi guys, sorry, still catching up on these older issues on GitHub.

If I am reading correctly: the ticket was for setting a hard limit on the number of results returned? I assume this top 20 is defined via the home center's match and scoring criteria?

I would vote against putting in such a limitation, since it could prevent some good matches from being shared. I think what Orion is suggesting does a similar (but less restrictive) thing via paging? I think that is a good idea.

On chatting with our analysts, their view was that they would like to see all hits (irrespective of score) that share some gene (a gene in which a shared variant occurs).

Or I am reading the ticket/idea wrong?

Best, Harindra

harindra-a avatar Aug 04 '16 19:08 harindra-a

Hey @harindra-a, you're right. The ticket was created to support specifying paging or response size control in the query. @Relequestual was proposing instead to set a hard limit.

buske avatar Aug 04 '16 19:08 buske

We should discuss this in Toronto. My vote is to keep 1.0-1.5 as simple as possible and to push the start/limit to 2.0, because that is a more natural fit with what Orion is proposing for 2.0, namely a 'search' paradigm rather than a 'matching' paradigm.

fschiettecatte avatar Aug 04 '16 19:08 fschiettecatte

OK great, thanks Orion, I definitely vote paging, and we would need it either way at some point I think.

harindra-a avatar Aug 04 '16 19:08 harindra-a

A lot of feedback we have gotten from people using the MME, including internally from users in the policy group, is that too many results are bad and not useful. They want only a few results; ideally, they don't want more than a handful.

Showing ALL results can sometimes, in our case, mean thousands of patient records. If you're getting thousands of genuine matches, then you probably would have found the same result in population variation databases already. Otherwise, we need to allow the user to further specify the matching criteria (coming in 2.0), reducing the number of results.

When searching on the DECIPHER website, if a query has too many results, we simply tell the user their search is too broad and they need to be more specific. I think that encouraging this in the MME network also has value, hopefully prompting users to include more detailed genotypic and phenotypic data, making the record more useful.

Relequestual avatar Aug 10 '16 08:08 Relequestual

I completely agree. I think it's useful to separate:

  1. How many matches a site should show a user on the site (I agree, a pretty small number)
  2. How many matches a site should actually notify a user of (e.g., via email) (I think a VERY small number)
  3. How many matches the API should return (I don't see any reason to limit this, as long as the server can threshold/filter however it likes)

buske avatar Aug 15 '16 07:08 buske

I agree with Orion's points. 1 and 2 deal with user experience, and 3 is a backend/server concern.

That separation allows the host to filter/control the user experience (1 and 2) via some front-end process, doing what Ben wanted to do(?), while still not limiting the API with an arbitrary cutoff (which we also think is not a good idea).

harindra-a avatar Aug 15 '16 09:08 harindra-a

We arrived at the need to set an arbitrary cutoff because there were some cases where hundreds or even over a thousand results were returned. No system was set up to handle processing and displaying that number of results.

Say the data was paged at 50 and you had 500 results in a random order. You would have to retrieve and store all that data, process it to work out which were the best results to display, filter out the other results, then display them to the user. That situation creates a huge overhead and adds complexity for each system wanting to implement MME.

There is also the risk (from the responder's perspective) of divulging too much data in one go. We do not want to expose such a large amount of data. Making the request more specific will give the system (and the user) access to what they want without such a risk.

I can't support adding pagination of results, but I think returning the number of additional results makes sense, allowing the user/system to make a more specific request. That is something that should come with a newer "2.0" (but not semver 2.0) version of the API.

Relequestual avatar Aug 15 '16 10:08 Relequestual

I'll wade into this discussion. I agree with @Relequestual on all counts.

The current (first) version of the protocol does a best-effort match and presents those matches to the user. In this context it makes sense to cap the number of matches at both ends of the process.

The model for the next version migrates the relevance assessment from the server to the client; in that context it makes perfect sense to allow the client to request as much data as it wants/needs and to paginate through that data in manageable chunks (whatever that means). That still does not prevent the server from capping the maximum number of results it is willing to return.

fschiettecatte avatar Aug 15 '16 17:08 fschiettecatte

In the early days, before we implemented a limit, we returned a huge chunk of our database for some queries. This is not desirable, so a limit was implemented. If someone wishes to access all our data, we have separate agreements, documents, and procedures for that access. For MME, it makes sense for us (DECIPHER) to limit the number of results, report that there are potentially more, and allow the other system (and their user) to refine the query to be more specific.

Relequestual avatar Aug 16 '16 08:08 Relequestual

For what it's worth...

As you know, I have always argued for putting more 'control' in the hands of the person doing the query. And given the objective of MME, it makes no sense to execute a query that returns very large numbers of patients. So I tend to agree with Ben on this one.

But surely the API can be designed to 'allow' a server to return an unlimited number of results (in case some server wants to do that) and also include the potential for the server to limit the number returned according to its own policy.

Cheers

Tony

antbro avatar Aug 16 '16 08:08 antbro

Sorry, I didn't mean to imply that every server must support returning an arbitrary number of responses, and indeed many servers will likely want to cap the number of results that they return. Other servers will absolutely want to limit the number of responses they get, but we should recognize that the API does not currently constrain the order in which matches are returned. I don't think we should agree to a fixed global limit for the MME though.

This is the proposal I would make:

  • for the current API, we add a limit field (name based on the common way to paginate APIs using limit and offset parameters)
  • the responding server MUST return at most this number of matches, but MAY return fewer

I suggest we consider:

  • adding a count field (or some other name) to the response to indicate the total number of matches that were found
  • adopting limit/offset pagination for exome matching or if things move forward with the symmetric data sharing proposal

buske avatar Aug 17 '16 17:08 buske
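
To make the proposal above concrete, one possible shape for the exchange is sketched below. The limit and count names follow the comment above; neither is part of the current specification, and the patient and match bodies are abbreviated.

    Request:
    {
      "patient": {
        "id": "P0001",
        "contact": { "name": "...", "href": "..." },
        "features": [ { "id": "HP:0000252" } ]
      },
      "limit": 10
    }

    Response (at most 10 matches, fewer allowed):
    {
      "count": 57,
      "results": [
        { "score": { "patient": 0.95 }, "patient": { "id": "P0042" } }
      ]
    }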

+1 T

antbro avatar Aug 17 '16 17:08 antbro

all great points Orion, agreed.

And that is more or less the common practice in other production-grade APIs as well, so even better.

Best, Harindra

harindra-a avatar Aug 17 '16 18:08 harindra-a

A note on the order of results: JSON is unordered, so it is expected that the client making the request orders the results based on score.

Relequestual avatar Aug 18 '16 08:08 Relequestual

@Relequestual JSON lists are ordered. :)

buske avatar Aug 18 '16 15:08 buske

@buske ah good point. A LIST is ordered, just not key/value pairs in an object. My mistake.

Relequestual avatar Aug 19 '16 12:08 Relequestual
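
To spell out the ordering point: a JSON object's members carry no order, but the results array does, so a server may return matches already ranked; since the specification does not constrain that order, a client is still free to re-sort by the returned score. Illustrative fragment only:

    {
      "results": [
        { "score": { "patient": 0.95 }, "patient": { "id": "P0003" } },
        { "score": { "patient": 0.61 }, "patient": { "id": "P0007" } }
      ]
    }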

Revitalising this issue: at the very least we could add a "total number of records that are a potential match" field, which would indicate to the user whether there are more results beyond the 20 that are returned. This is informative in terms of knowing that their patient record probably doesn't have enough discriminatory features, that the case is common, or that the candidate gene could be wrong (maybe, I don't know in practice).

Relequestual avatar Dec 05 '16 10:12 Relequestual

I was thinking that for simple queries it could be possible to include a link to view additional results on the returning website. At least in DECIPHER's case, and probably also myGene2's, it's totally possible to search for all patients with a given gene and a set of phenotypes. If others agree they can see the use case, I'll create a new issue. @jxchong thoughts?

Relequestual avatar Dec 05 '16 10:12 Relequestual
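
A sketch of what such a link might look like if attached to a response. The field name and URL here are purely hypothetical, chosen only to illustrate pointing the user at a broader search on the returning website.

    {
      "results": [ { "score": { "patient": 0.90 }, "patient": { "id": "P0001" } } ],
      "moreResultsUrl": "https://example.org/search?gene=EFTUD2&phenotype=HP:0000252"
    }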

I can see a use for this, but I wonder if it rightly belongs in the next major release of the protocol? Another approach would be to use some sort of pagination system with a start and a limit, which would allow one to jump around in a result set (like a search engine). Your approach is better suited if you want to pull all the data in a result set (the Open Archives Initiative uses this approach).

fschiettecatte avatar Dec 05 '16 14:12 fschiettecatte

I agree with François that a pagination system is a better solution if you want to go this route. I also agree with François that this probably belongs in the next major release, where we can design it in properly.

Best, Harindra

harindra-a avatar Dec 05 '16 14:12 harindra-a

Pagination would be a really hard, possibly impossible, sell for us. We don't want to expose too much data at the same time. It may all be open access, but that doesn't mean we should make accessing all of it too easy. We have had instances where people have tried to scrape our data, which we can then identify and block.

Relequestual avatar Dec 06 '16 11:12 Relequestual