Cytomine-core
Cytomine-core copied to clipboard
When filtering annotation collections on term, the other terms associated with annotations not returned
When querying an annotation collection with GET /api/annotation.json
and using term
query parameter to restrict the results to annotations associated with a given term, the returned annotations only include the searched terms in term
, even if the annotation was associated with other terms as well. Shouldn't the filtering only reduce the number of results, not the information provided for each result?
Multiple problems here :
- The term is given by the table annotation_term.
- We can't easily display all the terms of the annotation (see bellow)
- We can filter on a term and we display in the result the used filter
The current SQL query is
SELECT a.*, at.term_id as term
FROM user_annotation a LEFT OUTER JOIN annotation_term at ON a.id = at.user_annotation_id
WHERE a.project_id = X
AND a.user_id IN (Y)
AND at.term_id = Z
To get what is asked we need to have "all the annotations (and its terms) that have the term Z"
One solution to do it has two steps : 1) Select as now 2) rejoin on the annotation_term table i.e.
SELECT a.*, at.term_id as term
FROM (
SELECT x.* FROM user_annotation x
LEFT OUTER JOIN annotation_term y ON x.id = y.user_annotation_id
WHERE x.project_id = 175 AND x.user_id IN (30) AND y.term_id = 187) a
LEFT OUTER JOIN annotation_term at ON a.id = at.user_annotation_id
The aggregation is then made here It is not a well written query and I think it will have a performance impact. Do you have any better SQL query to do the same thing ?
If performance is impacted, we can stop returning the "term" key in the results of the current query because it contains wrong data and add a tag if the request really want all the terms of the selected annotation.
How about
SELECT a.*, json_agg(at.id) as term
FROM user_annotation a
LEFT OUTER JOIN annotation_term at
ON a.id = at.user_annotation_id
WHERE a.project_id = X
AND a.user_id IN (Y)
AND a.id IN (SELECT at_match.user_annotation_id FROM annotation_term at_match WHERE at_match.term_id = Z)
GROUP BY a.id
It is very similar to what you propose, but from my tests on local instance (with a limited annotation count), it does not seem to have a big performance impact. Moreover, it may remove the need to perform the term aggregation manually (provided that the field returned by json_agg() can be used easily afterwards, which I don't know and didn't check).
Hello,
The json_agg need a GROUP BY. It is tricky to make it work with all the possible requests.
So I used the previous proposition. More easy to integrate. https://github.com/cytomine/Cytomine-core/commit/140007ac0fa13372b563d39b5a52620325fa0fd6
I think than a refactoring of the Annotation listing service is needed.