
Need functions for determining term-to-term relatedness

Open • selewis opened this issue 5 years ago • 10 comments

Annotations vary considerably in precision, but for conciseness and to determine coverage, we need to be able to answer basic graph traversal questions. For example, given two terms, is one of them a subclass of the other? Or, what is the closest common parent term of the two? Right now this functionality is missing, and we're either dealing with workarounds or it's holding things up completely.
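A minimal sketch of the first question, assuming the ontology is available locally as a mapping from each term to its direct is_a parents. The term names are hypothetical toy data, not real GO identifiers, and the helper names are illustrative only. The check walks upward from one term and asks whether the other is reachable, trying both orientations.

```python
from collections import deque

# Hypothetical toy ontology: term -> direct is_a parents (not real GO data).
IS_A = {
    "D": ["B", "C"],
    "B": ["A"],
    "C": ["A"],
    "A": [],
}

def ancestors(term, parents=IS_A):
    """Every term reachable from `term` by following is_a edges upward (breadth-first)."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        t = queue.popleft()
        if t not in seen:
            seen.add(t)
            queue.extend(parents.get(t, []))
    return seen

def is_subclass(a, b, parents=IS_A):
    """True if `a` is (transitively) a subclass of `b`; a term counts as a subclass of itself."""
    return a == b or b in ancestors(a, parents)

def subclass_relation(a, b, parents=IS_A):
    """Answer both orientations at once: is A under B, and is B under A?"""
    return {"a_sub_b": is_subclass(a, b, parents), "b_sub_a": is_subclass(b, a, parents)}

print(subclass_relation("D", "A"))  # {'a_sub_b': True, 'b_sub_a': False}
```

Treating a term as a subclass of itself is a design choice in this sketch; a server route could just as easily report strict subclassing only.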

selewis, Aug 03 '18 19:08

Hi Suzy,

A partial solution to your problem is https://api.geneontology.cloud/go/GO_0060070/hierarchy, which returns both the parents and children of one GO term (not two). I also plan to add a more general /relationship route to explore all other term-to-term relatedness, but I could also create a path where you could ask the same question with two GO terms instead of one.

And I am finishing the transfer of this API into BioLink too.

Laurent-Philippe


lpalbou, Aug 03 '18 20:08

Quite nice, but it doesn't quite fit the bill yet. You would still have to traverse the graph in this JSON structure to answer the simple true/false question 'is A a subclass of B?' or, conversely, 'is B a subclass of A?'. It would also be useful to have 'what is the closest parent term shared by A and B?'. The point is to bury all of the repetitive traversal logic inside the server code.
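The "closest parent term shared by A and B" can be sketched as the set of lowest common ancestors in a DAG: take the common ancestors of both terms, then drop any that sit above another common ancestor. This is a generic technique on hypothetical toy data, not the server's implementation. Because GO is a DAG rather than a tree, the result can contain more than one term.

```python
from collections import deque

# Hypothetical toy ontology: term -> direct parents (not real GO data).
PARENTS = {
    "X": ["P", "Q"],
    "Y": ["P", "Q"],
    "P": ["R"],
    "Q": ["R"],
    "R": [],
}

def ancestors_inclusive(term, parents):
    """The term itself plus everything reachable upward from it."""
    seen = {term}
    queue = deque([term])
    while queue:
        for p in parents.get(queue.popleft(), []):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen

def closest_common_parents(a, b, parents=PARENTS):
    """Common ancestors of a and b with no other common ancestor below them."""
    common = ancestors_inclusive(a, parents) & ancestors_inclusive(b, parents)
    # Any common ancestor that is a strict ancestor of another common ancestor is not "closest".
    redundant = set()
    for c in common:
        redundant |= ancestors_inclusive(c, parents) - {c}
    return common - redundant

print(sorted(closest_common_parents("X", "Y")))  # ['P', 'Q']
```

Ancestors are computed inclusively so that if A is itself a subclass of B, the closest common parent is B.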

It would be great to have this in BioLink.

selewis, Aug 03 '18 20:08

Correct, this query is general-purpose, but I should be able to create the two specific queries you mentioned by next week.

lpalbou, Aug 03 '18 20:08

Does this cover only is_a relations?

We also need to know whether a term is flagged as 'do_not_manually_annotate' or 'do_not_annotate'.

selewis, Aug 03 '18 21:08

In Biolink?


selewis, Aug 03 '18 21:08

@selewis Yes, @lpalbou and I had a quick chat.

We can add a couple of routes to biolink-api that give a more direct answer, as opposed to the JSON graph normally returned.

deepakunni3, Aug 03 '18 21:08

@selewis Sorry, I am a bit late on this, but this morning I deployed a route that answers your first question:

http://api.geneontology.cloud/association/subclass/{goid1}/{goid2} => returns true if and only if goid1 is_a or part_of goid2 (the question is directional)

I have also deployed a sharedclass route: http://api.geneontology.cloud/association/sharedclass/{goid1}/{goid2} => returns the terms (derived from is_a and part_of) that the two terms share

For the closest common parent of two terms, do you want parents from both is_a and part_of relations? Note that this query could return several parents (example).

I am waiting for a PR on ontobio (https://github.com/biolink/ontobio/pull/217), but if this looks good to you, I'll open a second PR to deploy these routes on BioLink. Following BioLink syntax, they will be mapped respectively to (@cmungall, your opinion?):

  • /association/between/{goid1}/{goid2}/subclass
  • /association/between/{goid1}/{goid2}/sharedclass

Notes:

  • the /association/between/ route description will need some modification, as it was only described for gene and disease associations
  • also, instead of adding /subclass or /sharedclass as path segments, we could pass them as query-string parameters to keep the path clean; let me know your preferences
  • behind the scenes, I am calling golr with queries such as: http://golr-aux.geneontology.io/solr/select?fq=document_category:%22ontology_class%22&q=*:*&fq=id:%22GO:0030182%22&fl=isa_partof_closure,isa_partof_closure_label&wt=json
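Both routes can be sketched on top of that golr query, assuming each term's document carries a multi-valued isa_partof_closure field (the field requested in fl) and that the response follows the standard Solr JSON layout with matches under response.docs. This illustrates the idea only; it is not the deployed ontobio code, the helper names are hypothetical, and the sample documents use fake IDs.

```python
from urllib.parse import urlencode

# Endpoint taken from the golr query quoted above.
GOLR_SELECT = "http://golr-aux.geneontology.io/solr/select"

def closure_query_url(go_id):
    """Build the same kind of golr query as above for one GO term."""
    params = [
        ("fq", 'document_category:"ontology_class"'),
        ("q", "*:*"),
        ("fq", f'id:"{go_id}"'),
        ("fl", "isa_partof_closure,isa_partof_closure_label"),
        ("wt", "json"),
    ]
    return GOLR_SELECT + "?" + urlencode(params)

def first_doc(solr_json):
    """Standard Solr JSON layout: matching documents live under response.docs."""
    docs = solr_json.get("response", {}).get("docs", [])
    return docs[0] if docs else {}

def closure(doc):
    """The is_a/part_of closure of a term document, as a set (empty if missing)."""
    return set(doc.get("isa_partof_closure", []))

def subclass(doc1, goid2):
    """Directional check: is goid2 in the is_a/part_of closure of the first term?"""
    return goid2 in closure(doc1)

def sharedclass(doc1, doc2):
    """Terms that appear in both closures."""
    return closure(doc1) & closure(doc2)

# Toy documents standing in for two parsed golr responses (fake IDs).
doc_a = {"id": "X:1", "isa_partof_closure": ["X:10", "X:100"]}
doc_b = {"id": "X:2", "isa_partof_closure": ["X:10", "X:100", "X:20"]}
print(subclass(doc_a, "X:10"), sorted(sharedclass(doc_a, doc_b)))  # True ['X:10', 'X:100']
```

Precomputed closures turn both questions into set operations, so no graph traversal happens at request time.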

lpalbou, Aug 20 '18 21:08

It would be nice if the first one provided a way to indicate which relationships to follow, like Deepak (I think) did for the slimmer code.
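Letting the caller choose which relationships to follow can be sketched by filtering a labelled edge list before the upward search. The edge triples, the `relations` parameter, and the function name are hypothetical illustrations, not the biolink-api signature.

```python
from collections import deque

# Hypothetical edge list: (child, relation, parent). Toy data, not real GO.
EDGES = [
    ("A", "part_of", "B"),
    ("B", "is_a", "C"),
    ("A", "regulates", "D"),
]

def is_related(child, ancestor, relations=("is_a", "part_of"), edges=EDGES):
    """True if `ancestor` is reachable from `child` using only edges whose relation is in `relations`."""
    up = {}
    for c, rel, p in edges:
        if rel in relations:
            up.setdefault(c, []).append(p)
    seen, queue = {child}, deque([child])
    while queue:
        for p in up.get(queue.popleft(), []):
            if p == ancestor:
                return True
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return False

print(is_related("A", "C"), is_related("A", "C", relations=("is_a",)))  # True False
```

This naive version chains any combination of the selected relations. A proper reasoner applies the ontology's property-chain rules (for example, how regulates composes with part_of), which this sketch ignores.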

For the second, yes, return all of them. If possible, it would also be useful to know the route taken to reach each parent from each of the two child terms.
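Reporting the route from each child to each shared parent can be sketched with a breadth-first search that records predecessors. Given a set of shared parents computed by whatever means, it returns one shortest upward path per child and parent; when several equally short routes exist, only one is reported. The toy data and function names are hypothetical.

```python
from collections import deque

# Hypothetical toy ontology: term -> direct parents (not real GO data).
TOY_PARENTS = {
    "X": ["M"],
    "M": ["P"],
    "Y": ["P"],
    "P": [],
}

def shortest_path_up(start, target, parents):
    """One shortest upward path from start to target, or None if target is not an ancestor."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for p in parents.get(node, []):
            if p not in prev:
                prev[p] = node
                queue.append(p)
    return None

def paths_to_shared_parents(a, b, shared, parents=TOY_PARENTS):
    """For each shared parent, the upward route from a and the upward route from b."""
    return {s: (shortest_path_up(a, s, parents), shortest_path_up(b, s, parents)) for s in shared}

print(paths_to_shared_parents("X", "Y", {"P"}))
# {'P': (['X', 'M', 'P'], ['Y', 'P'])}
```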

selewis, Aug 20 '18 21:08

@selewis I also saw your question about the 'do_not_manually_annotate' and 'do_not_annotate' tags. There is no dedicated route for that question, but you can check whether those tags are present in the subsets section of this general GO term query: https://api.geneontology.cloud/go/GO_0036288

(This will be available on BioLink once the PRs are merged.)
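Checking those tags on the client side can be sketched as a scan of the subsets section. This assumes, hypothetically, that the term record has a top-level "subsets" list of subset-name strings; the real response shape may differ. Matching on the name suffix tolerates a prefix on the subset name, and the sample record uses a fake ID.

```python
import json

# Suffixes taken from the tag names discussed in this thread.
FLAG_SUFFIXES = ("do_not_annotate", "do_not_manually_annotate")

def annotation_flags(term_record):
    """Return the 'do not annotate' style subsets found in a term record.

    ASSUMPTION: the record has a top-level "subsets" list of subset-name strings.
    Accepts either a parsed dict or a raw JSON string.
    """
    record = json.loads(term_record) if isinstance(term_record, str) else term_record
    return sorted(s for s in record.get("subsets", []) if s.endswith(FLAG_SUFFIXES))

sample = '{"id": "X:1", "subsets": ["gocheck_do_not_annotate", "goslim_generic"]}'
print(annotation_flags(sample))  # ['gocheck_do_not_annotate']
```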

lpalbou, Aug 20 '18 21:08

@selewis I have updated the API to be more consistent with BioLink syntax and to determine whether two terms are related through any of the is_a, part_of, or regulates relationships:

lpalbou, Aug 21 '18 06:08