biothings.api

support filter query parameter

Open newgene opened this issue 9 months ago • 3 comments

Difference between query and filter context in an ES query:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html

A query placed in the filter context can run faster than the same query in the normal query context (e.g. one passed to the q query parameter):

  • a filter query is not included in the _score calculation
  • a filter query can be cached by ES

When implemented, we can change this API query:

?q=object.umls:C0872079 AND pmid_count:>5 AND predicate:LOCATION_OF

to

?q=object.umls:C0872079&filter=pmid_count:>5 AND predicate:LOCATION_OF

Both should return the same list of hits, but the second form is preferable when the filter part is reused repeatedly while the q query changes.
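For illustration, here is a minimal sketch of how the two parameters could be translated into an ES request body. The helper name build_es_query and the choice of query_string clauses are assumptions made for this example, not the actual biothings.api implementation; only the bool/must/filter layout comes from the ES query DSL.

```python
import json


def build_es_query(q, filter_query=None):
    """Hypothetical helper: put `q` in the query context, `filter_query` in the filter context."""
    # Clauses under "must" are scored (query context).
    bool_clause = {"must": [{"query_string": {"query": q}}]}
    if filter_query:
        # Clauses under "filter" are not scored and are eligible for caching by ES.
        bool_clause["filter"] = [{"query_string": {"query": filter_query}}]
    return {"query": {"bool": bool_clause}}


# Everything passed via q: all three conditions contribute to _score.
combined = build_es_query(
    "object.umls:C0872079 AND pmid_count:>5 AND predicate:LOCATION_OF"
)

# Proposed form: only the q part is scored, the rest only restricts the hits.
split = build_es_query(
    "object.umls:C0872079",
    filter_query="pmid_count:>5 AND predicate:LOCATION_OF",
)

print(json.dumps(split, indent=2))
```

With this layout, changing q while keeping the same filter string leaves the filter clause identical across requests, which is the case where ES caching helps.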

newgene avatar Sep 14 '23 00:09 newgene

The filter query parameter should work for both GET and POST requests to the query handler.

Note that filter is currently an alias of the fields parameter, kept for backward compatibility. That is no longer a concern, so we can remove the alias and reuse the name for this new functionality.
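As a rough sketch of what a client would send in both cases: the same parameter pairs can be URL-encoded once and used either as the GET query string or as a form-encoded POST body. The /query path is a placeholder, and sending filter in a form-encoded POST body is an assumption about how the proposed parameter would be accepted, not a description of current behavior.

```python
from urllib.parse import parse_qs, urlencode

# Proposed parameters; `filter` here is the new, not-yet-implemented meaning.
params = {
    "q": "object.umls:C0872079",
    "filter": "pmid_count:>5 AND predicate:LOCATION_OF",
}

# Characters such as ":", ">" and spaces must be percent-encoded.
encoded = urlencode(params)

# GET: parameters go in the query string (placeholder path).
get_url = "/query?" + encoded

# POST: the same pairs sent as an application/x-www-form-urlencoded body (assumed).
post_body = encoded

print(get_url)
```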

newgene avatar Sep 14 '23 00:09 newgene

We already have a similar implementation in one of our API instances:

https://github.com/NIAID-Data-Ecosystem/nde-discovery-api/blob/08dfae752613b45647f8ce83350466733ea1e6e6/nde-web/pipeline.py#L64 (called extra_filter, but it provides the same feature)

newgene avatar Sep 14 '23 00:09 newgene

Also note that this is related to the post_filter query parameter from https://github.com/biothings/biothings.api/issues/208. The key difference is that post_filter does not affect the aggregation results, while filter does, so these two related query parameters will typically be used in different scenarios.
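To make the difference concrete, here is a small sketch of where each parameter would land in the ES request body. The helper build_search_body and its argument names are hypothetical; the relevant part is that filter sits inside the bool query (so aggregations only see the filtered documents), while post_filter is a top-level key that ES applies to the hits after aggregations are computed.

```python
def build_search_body(q, filter_query=None, post_filter_query=None, facet_field=None):
    """Hypothetical helper showing the placement of filter vs. post_filter."""
    body = {"query": {"bool": {"must": [{"query_string": {"query": q}}]}}}
    if filter_query:
        # Part of the main query: narrows both the hits and the aggregations.
        body["query"]["bool"]["filter"] = [{"query_string": {"query": filter_query}}]
    if post_filter_query:
        # Top-level key: narrows the hits only, aggregations are unaffected.
        body["post_filter"] = {"query_string": {"query": post_filter_query}}
    if facet_field:
        # A terms aggregation, named after the field it counts.
        body["aggs"] = {facet_field: {"terms": {"field": facet_field}}}
    return body


# Facet counts for "predicate" cover only documents with pmid_count > 5.
with_filter = build_search_body(
    "object.umls:C0872079", filter_query="pmid_count:>5", facet_field="predicate"
)

# Facet counts for "predicate" cover all documents matching q.
with_post_filter = build_search_body(
    "object.umls:C0872079", post_filter_query="pmid_count:>5", facet_field="predicate"
)
```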

newgene avatar Sep 14 '23 00:09 newgene