Cytomine-core icon indicating copy to clipboard operation
Cytomine-core copied to clipboard

When filtering annotation collections on term, the other terms associated with annotations not returned

Open elodieburtin opened this issue 5 years ago • 3 comments

When querying an annotation collection with GET /api/annotation.json and using term query parameter to restrict the results to annotations associated with a given term, the returned annotations only include the searched terms in term, even if the annotation was associated with other terms as well. Shouldn't the filtering only reduce the number of results, not the information provided for each result?

elodieburtin avatar Nov 05 '18 17:11 elodieburtin

Multiple problems here :

  • The term is given by the table annotation_term.
  • We can't easily display all the terms of the annotation (see bellow)
  • We can filter on a term and we display in the result the used filter

The current SQL query is

SELECT a.*, at.term_id as term 
FROM user_annotation a  LEFT OUTER JOIN annotation_term at ON a.id = at.user_annotation_id
WHERE a.project_id = X
AND a.user_id IN (Y)
 AND at.term_id = Z

To get what is asked we need to have "all the annotations (and its terms) that have the term Z"

One solution to do it has two steps : 1) Select as now 2) rejoin on the annotation_term table i.e.

SELECT a.*, at.term_id as term 
FROM (
    SELECT x.* FROM user_annotation x 
                LEFT OUTER JOIN annotation_term y ON x.id = y.user_annotation_id 
                WHERE x.project_id = 175 AND x.user_id IN (30) AND y.term_id = 187) a 
LEFT OUTER JOIN annotation_term at ON a.id = at.user_annotation_id

The aggregation is then made here It is not a well written query and I think it will have a performance impact. Do you have any better SQL query to do the same thing ?

If performance is impacted, we can stop returning the "term" key in the results of the current query because it contains wrong data and add a tag if the request really want all the terms of the selected annotation.

geektortoise avatar Jan 14 '19 14:01 geektortoise

How about

SELECT a.*, json_agg(at.id) as term
FROM user_annotation a  
LEFT OUTER JOIN annotation_term at 
     ON a.id = at.user_annotation_id
WHERE a.project_id = X
     AND a.user_id IN (Y)
     AND a.id IN (SELECT at_match.user_annotation_id FROM annotation_term at_match WHERE at_match.term_id = Z)
GROUP BY a.id

It is very similar to what you propose, but from my tests on local instance (with a limited annotation count), it does not seem to have a big performance impact. Moreover, it may remove the need to perform the term aggregation manually (provided that the field returned by json_agg() can be used easily afterwards, which I don't know and didn't check).

elodieburtin avatar Jan 14 '19 15:01 elodieburtin

Hello,

The json_agg need a GROUP BY. It is tricky to make it work with all the possible requests.

So I used the previous proposition. More easy to integrate. https://github.com/cytomine/Cytomine-core/commit/140007ac0fa13372b563d39b5a52620325fa0fd6

I think than a refactoring of the Annotation listing service is needed.

geektortoise avatar Mar 19 '19 13:03 geektortoise