biolink-model

Add an association type for organism-organism or species-species interactions

Open · cmungall opened this issue 5 years ago • 15 comments

Not related to Translator, but it would be good to have a standard, Biolink-compliant way of stating GloBI graphs. cc @diatomsRcool. @jhpoelen, I will explain this later.

cmungall, Jul 01 '19

@diatomsRcool you probably had no idea this was assigned to you!

nlharris, Apr 28 '21

I did not! And I do need this now, so I'll start on it.

diatomsRcool, Apr 28 '21

We definitely need a predicate to represent interactions between organisms/taxa. We may also need a way to relate taxa to each other based on taxonomy or phylogeny. I'm only saying this because I suspect it might be worth using a taxonomy or phylogeny to propagate data from a highly observed taxon to a sparsely observed taxon with an uncertainty measure based on the taxonomic or phylogenetic relationship. Just thinking out loud here.
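One way to make the propagation idea concrete is to measure how far apart two taxa sit in a shared lineage and let the confidence of a propagated interaction decay with that distance. This is purely illustrative: it assumes root-first lineages given as pipe-separated names (GloBI's {source|target}TaxonPathNames columns may look like this, but that is an assumption), and the helper names and the `decay` factor of 0.5 are hypothetical choices, not an agreed method.

```python
def parse_path(pipe_joined):
    """Split a root-first 'Animalia | Chordata | ...' lineage into a list of names."""
    return [name.strip() for name in pipe_joined.split("|") if name.strip()]


def lineage_distance(path_a, path_b):
    """Steps from each taxon up to their lowest common ancestor, summed.

    Returns None when the two root-first paths share no ancestor at all.
    """
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    if shared == 0:
        return None
    return (len(path_a) - shared) + (len(path_b) - shared)


def propagation_weight(path_a, path_b, decay=0.5):
    """Hypothetical confidence for reusing taxon A's interactions for taxon B."""
    distance = lineage_distance(path_a, path_b)
    if distance is None:
        return 0.0
    return decay ** distance


cat = parse_path("Animalia | Chordata | Mammalia | Carnivora | Felidae | Felis | Felis catus")
wildcat = parse_path("Animalia | Chordata | Mammalia | Carnivora | Felidae | Felis | Felis silvestris")
wolf = parse_path("Animalia | Chordata | Mammalia | Carnivora | Canidae | Canis | Canis lupus")

print(propagation_weight(cat, wildcat))  # 0.25 (distance 2)
print(propagation_weight(cat, wolf))     # 0.015625 (distance 6)
```

Counting ranks treats every rank step as equal; branch lengths from a phylogeny would give a finer-grained distance where one is available.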

diatomsRcool, Apr 28 '21

Note: we have an association for taxon-to-taxon relationships:


  taxon to taxon association:
    is_a: association
    defining_slots:
      - subject
      - predicate
      - object
    slot_usage:
      subject:
        range: organism taxon
      relation:
        values_from:
          - ro
      object:
        range: organism taxon
    description: >-
      An association between individuals of different taxa.
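To make the slot_usage above concrete, here is a minimal sketch that checks a single association record against it: the three defining slots must be present, subject and object should be taxon identifiers, and relation should come from RO. The set of accepted taxon prefixes (`TAXON_PREFIXES`) and the helper names are assumptions for illustration; a real check would rely on the model's schema tooling rather than hand-written prefix matching.

```python
# Hypothetical set of CURIE prefixes treated as "organism taxon" identifiers.
TAXON_PREFIXES = {"NCBITaxon", "EOL", "GBIF", "ITIS", "OTT"}


def curie_prefix(curie):
    """Return the prefix of a CURIE such as 'NCBITaxon:9606', or None if there is no colon."""
    prefix, sep, _ = curie.partition(":")
    return prefix if sep else None


def check_taxon_to_taxon(record):
    """List the ways a record departs from the taxon to taxon association slot_usage."""
    problems = []
    for slot in ("subject", "predicate", "object"):  # defining_slots
        if not record.get(slot):
            problems.append(f"missing defining slot: {slot}")
    for slot in ("subject", "object"):  # range: organism taxon
        value = record.get(slot)
        if value and curie_prefix(value) not in TAXON_PREFIXES:
            problems.append(f"{slot} {value!r} does not look like a taxon identifier")
    relation = record.get("relation")  # values_from: ro
    if relation is not None and curie_prefix(relation) != "RO":
        problems.append(f"relation {relation!r} is not an RO term")
    return problems


record = {
    "subject": "NCBITaxon:9606",  # Homo sapiens
    "predicate": "biolink:interacts_with",
    "relation": "RO:0002622",
    "object": "NCBITaxon:9685",  # Felis catus
}
print(check_taxon_to_taxon(record))  # []
```

A record whose object is an individual (e.g. "Victor") rather than a taxon fails the range check.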

sierra-moxon, Oct 25 '21

@sierra-moxon very neat! If you'd like, I can add support for your biolink format in GloBI and generate related data products automatically.

Do you have some examples I can work with?

jhpoelen, Oct 25 '21

Chatting with @sierra-moxon

The current docs on the t2t (taxon to taxon) association are a bit confusing: the description says it is between individuals, but the domain and range (D+R) are taxa.

Recall that bl distinguishes between

So this leads to an increase in the number of association types, e.g.:

  1. Chris, an instance of Homo sapiens, lives with Victor, an instance of Felis catus
  2. Chris likes cats
  3. Humans and cats may form symbiotic relationships

GloBI can capture all three, and may, for example, capture edges of type 1 as evidence for edges of type 2 or 3. This, of course, mirrors many of the discussions in other parts of Biolink/Translator.
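Treating type 1 edges as evidence for type 3 edges can be sketched as a simple roll-up: each individual-level observation is lifted to the taxa of its participants, and the number of supporting observations is counted per taxon-level triple. The class names, the extra individuals, and the `ex:lives_with` predicate are hypothetical placeholders, not Biolink classes or predicates.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Individual:
    name: str
    taxon: str  # CURIE of the taxon this individual is an instance of


@dataclass
class IndividualInteraction:
    """An edge of type 1: one individual interacting with another."""
    subject: Individual
    predicate: str
    object: Individual


def taxon_level_evidence(interactions):
    """Lift type 1 edges to candidate type 3 (taxon-to-taxon) edges and count
    how many individual-level observations support each one."""
    return Counter(
        (edge.subject.taxon, edge.predicate, edge.object.taxon) for edge in interactions
    )


chris = Individual("Chris", "NCBITaxon:9606")
victor = Individual("Victor", "NCBITaxon:9685")
observations = [
    IndividualInteraction(chris, "ex:lives_with", victor),
    IndividualInteraction(Individual("Ada", "NCBITaxon:9606"), "ex:lives_with",
                          Individual("Tom", "NCBITaxon:9685")),
]
for triple, count in taxon_level_evidence(observations).items():
    print(triple, count)
```

A type 2 edge ("Chris likes cats") mixes levels: it would keep the subject individual and lift only the object to its taxon.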

@jhpoelen, support in GloBI would be great. Biolink can be expressed in different formats; in fact, the KGX serialization probably looks a lot like the existing GloBI TSV.
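As a rough illustration of how close the two formats are, the sketch below turns GloBI-style rows into a tab-separated edges table with subject, predicate, object and relation columns. The column subset is an assumption loosely modeled on KGX TSV edge files (which carry more columns than this), using `biolink:interacts_with` for every row is a placeholder for a proper RO-to-Biolink predicate mapping, and the helper names are hypothetical.

```python
import csv
import io

OBO_PREFIX = "http://purl.obolibrary.org/obo/"
EDGE_COLUMNS = ["subject", "predicate", "object", "relation"]  # assumed minimal subset


def obo_iri_to_curie(iri):
    """Turn an OBO PURL such as .../obo/RO_0002622 into the CURIE RO:0002622."""
    if iri.startswith(OBO_PREFIX):
        return iri[len(OBO_PREFIX):].replace("_", ":", 1)
    return iri


def globi_row_to_edge(row):
    """Map one GloBI interactions.tsv row (as a dict) to a KGX-style edge."""
    return {
        "subject": row["sourceTaxonId"],
        "predicate": "biolink:interacts_with",  # placeholder for a real RO -> Biolink mapping
        "object": row["targetTaxonId"],
        "relation": obo_iri_to_curie(row["interactionTypeId"]),
    }


def edges_tsv(rows):
    """Serialize GloBI rows as a tab-separated edges table with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=EDGE_COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(globi_row_to_edge(row))
    return out.getvalue()


print(edges_tsv([{
    "sourceTaxonId": "NCBI:586963",
    "interactionTypeId": "http://purl.obolibrary.org/obo/RO_0002622",
    "targetTaxonId": "NCBI:40947",
}]), end="")
```

GloBI writes NCBI taxa with an `NCBI:` prefix while Biolink uses `NCBITaxon:`, so a real conversion would also need to normalize prefixes.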

cmungall, Oct 26 '21

@cmungall I had a suspicion you live with cats . . . ; )

Please provide specific examples of the Biolink-compatible format, along with a use case. Ideally, I'd have a way to verify that the data is valid and to enable reuse somehow.

jhpoelen, Oct 26 '21

@jhpoelen, are you asking for a use case that justifies representing interaction data in a Biolink-compliant format?

diatomsRcool, Oct 26 '21

@diatomsRcool Yes, a use case that shows how and when to use Biolink-compliant data would be nice. In my experience, data products in fancy formats often go unused because folks don't know that they exist, or don't know how or when to use them.

jhpoelen, Oct 26 '21

btw @diatomsRcool @sierra-moxon @cmungall, happy to chat more if desired. I just want to get a sense of what kind of use y'all are envisioning.

jhpoelen, Oct 26 '21

ok, got it. My dream, if you will, is to have a hypothesis generator for ecology. Take a look at slide 3 in the talk at the link below. Ecology, of course, is very complicated and species interactions are integral to understanding what is going on. Does this make sense? https://doi.org/10.5281/zenodo.5104203

diatomsRcool, Oct 27 '21

@diatomsRcool Thanks for sharing your dream, and yes, it makes sense. Very exciting!

Please let me know what kind of GloBI data product you'd need to help work towards your vision.

A detailed example or two, in the exact format, would help me start producing the desired data product sooner rather than later.

jhpoelen, Oct 27 '21

@diatomsRcool I am sure you have a lot on your plate, so I'll patiently wait for examples related to your Biolink use case. Meanwhile, I'll assume the project is not yet in a state that allows non-Biolink insiders to benefit from your inspired work.

jhpoelen, Nov 04 '21

Sorry. I think that if there is a GloBI data product that lists the EOL ID and/or the NCBITaxon ID of the two taxa interacting and the RO term for the interaction, that would be the bare minimum. This data product should be a file that can be downloaded via a URL. I don't quite remember all the different metadata you have in GloBI for each interaction. So, while the taxon IDs and RO terms are the bare minimum, all the accompanying metadata would also likely be useful. Does that help?
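For the bare-minimum product described here, picking one NCBI or EOL identifier per side from GloBI's pipe-separated taxon ID lists might look like the sketch below. It assumes the `sourceTaxonIds`, `interactionTypeId` and `targetTaxonIds` columns shown in the interactions.tsv.gz output further down the thread, and that EOL IDs, if present, use an `EOL:` prefix; the helper names are hypothetical.

```python
def ids_by_prefix(pipe_joined):
    """Split a 'GBIF:1 | NCBI:2 | ...' cell into {prefix: CURIE}, keeping the first per prefix.

    Full IRIs (e.g. Plazi treatment URLs) are skipped because they are not CURIEs.
    """
    ids = {}
    for item in (part.strip() for part in pipe_joined.split("|")):
        if "://" in item:
            continue
        prefix, sep, _ = item.partition(":")
        if sep and prefix not in ids:
            ids[prefix] = item
    return ids


def minimal_record(row, preferred=("NCBI", "EOL")):
    """Return (source taxon, RO term, target taxon), preferring the given prefixes in order."""
    def pick(cell):
        ids = ids_by_prefix(cell)
        return next((ids[p] for p in preferred if p in ids), None)

    return pick(row["sourceTaxonIds"]), row["interactionTypeId"], pick(row["targetTaxonIds"])


row = {
    "sourceTaxonIds": "GBIF:1357746 | ITIS:654388 | NCBI:586963 | OTT:821172 | "
                      "http://treatment.plazi.org/id/0392879B737BAB2943D5FB65FA89FA8E",
    "interactionTypeId": "http://purl.obolibrary.org/obo/RO_0002622",
    "targetTaxonIds": "GBIF:3034521 | ITIS:29906 | NCBI:40947 | OTT:428216",
}
print(minimal_record(row))
```

Returning `None` for a side with no preferred identifier keeps such rows visible instead of silently dropping them.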

diatomsRcool, Nov 04 '21

@diatomsRcool thanks for clarifying.

As far as I understand, you'd do the conversion on your end, and existing data products like interactions.tsv.gz would work ok.

e.g.,

To get just the list of taxon IDs with the associated RO term:

$ curl --silent --location "https://zenodo.org/record/4460654/files/interactions.tsv.gz"\
 | gunzip\
 | mlr --tsvlite cut -f targetTaxonIds,interactionTypeId,sourceTaxonIds\
 | head -n3
sourceTaxonIds	interactionTypeId	targetTaxonIds
GBIF:1357746 | INAT_TAXON:198981 | IRMNG:10602220 | ITIS:654388 | NCBI:586963 | OTT:821172 | http://treatment.plazi.org/id/0392879B737BAB2943D5FB65FA89FA8E | http://treatment.plazi.org/id/8A998764149F7E00E335FFE1BCB17FE1	http://purl.obolibrary.org/obo/RO_0002622	GBIF:3034521 | INAT_TAXON:83154 | IRMNG:10203376 | ITIS:29906 | NCBI:40947 | OTT:428216 | http://treatment.plazi.org/id/B87F34AA383C06E0CF4F70EC89CBEF72
GBIF:1358021 | IRMNG:10940518 | ITIS:654371 | NCBI:586961 | OTT:821170 | http://treatment.plazi.org/id/0392879B737BAB2943D5FC0BFE04FBF9	http://purl.obolibrary.org/obo/RO_0002622	GBIF:3190089 | INAT_TAXON:54836 | IRMNG:11084524 | ITIS:505788 | NCBI:354526 | OTT:760765
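For readers who would rather stay in Python than chain curl, gunzip and Miller, a rough equivalent of the column selection above (stream a gzipped TSV, keep only the named columns) could look like this. `select_columns` is a hypothetical helper; it expects a binary file object, such as a locally downloaded interactions.tsv.gz opened with `open(path, "rb")`, and the tiny in-memory sample exists only to make the block runnable.

```python
import csv
import gzip
import io
import itertools


def select_columns(gz_file, columns):
    """Stream rows from a gzipped TSV (header on the first line), keeping only `columns`."""
    with gzip.open(gz_file, mode="rt", encoding="utf-8", newline="") as text:
        # QUOTE_NONE: treat quote characters as ordinary text, as in a plain TSV.
        reader = csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield {name: row[name] for name in columns}


# Demo on a tiny in-memory file; for real data pass open("interactions.tsv.gz", "rb").
sample = io.BytesIO(gzip.compress(
    b"sourceTaxonIds\tinteractionTypeId\ttargetTaxonIds\tlocalityName\n"
    b"NCBI:586963\thttp://purl.obolibrary.org/obo/RO_0002622\tNCBI:40947\tsample\n"
))
wanted = ["sourceTaxonIds", "interactionTypeId", "targetTaxonIds"]
for row in itertools.islice(select_columns(sample, wanted), 2):
    print("\t".join(row[name] for name in wanted))
```

Because rows are yielded one at a time, the full archive never has to fit in memory, much like the shell pipeline.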

But . . . there is much more metadata available:

$ curl --silent --location "https://zenodo.org/record/4460654/files/interactions.tsv.gz"\
 | gunzip\
 | head -n1\
 | tr '\t' '\n'\
 | grep -n ".*"
1:sourceTaxonId
2:sourceTaxonIds
3:sourceTaxonName
4:sourceTaxonRank
5:sourceTaxonPathNames
6:sourceTaxonPathIds
7:sourceTaxonPathRankNames
8:sourceTaxonSpeciesName
9:sourceTaxonSpeciesId
10:sourceTaxonGenusName
11:sourceTaxonGenusId
12:sourceTaxonFamilyName
13:sourceTaxonFamilyId
14:sourceTaxonOrderName
15:sourceTaxonOrderId
16:sourceTaxonClassName
17:sourceTaxonClassId
18:sourceTaxonPhylumName
19:sourceTaxonPhylumId
20:sourceTaxonKingdomName
21:sourceTaxonKingdomId
22:sourceId
23:sourceOccurrenceId
24:sourceCatalogNumber
25:sourceBasisOfRecordId
26:sourceBasisOfRecordName
27:sourceLifeStageId
28:sourceLifeStageName
29:sourceBodyPartId
30:sourceBodyPartName
31:sourcePhysiologicalStateId
32:sourcePhysiologicalStateName
33:sourceSexId
34:sourceSexName
35:interactionTypeName
36:interactionTypeId
37:targetTaxonId
38:targetTaxonIds
39:targetTaxonName
40:targetTaxonRank
41:targetTaxonPathNames
42:targetTaxonPathIds
43:targetTaxonPathRankNames
44:targetTaxonSpeciesName
45:targetTaxonSpeciesId
46:targetTaxonGenusName
47:targetTaxonGenusId
48:targetTaxonFamilyName
49:targetTaxonFamilyId
50:targetTaxonOrderName
51:targetTaxonOrderId
52:targetTaxonClassName
53:targetTaxonClassId
54:targetTaxonPhylumName
55:targetTaxonPhylumId
56:targetTaxonKingdomName
57:targetTaxonKingdomId
58:targetId
59:targetOccurrenceId
60:targetCatalogNumber
61:targetBasisOfRecordId
62:targetBasisOfRecordName
63:targetLifeStageId
64:targetLifeStageName
65:targetBodyPartId
66:targetBodyPartName
67:targetPhysiologicalStateId
68:targetPhysiologicalStateName
69:targetSexId
70:targetSexName
71:decimalLatitude
72:decimalLongitude
73:localityId
74:localityName
75:eventDateUnixEpoch
76:argumentTypeId
77:referenceCitation
78:referenceDoi
79:referenceUrl
80:sourceCitation
81:sourceNamespace
82:sourceArchiveURI
83:sourceDOI
84:sourceLastSeenAtUnixEpoch
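The same header listing can be produced in Python, which also makes it easy to look up a column's position by name (for example, interactionTypeId is column 36 in the listing above). The helper names are hypothetical; `numbered_header` expects a path or binary file object for a gzipped TSV.

```python
import gzip


def numbered_columns(header_line):
    """Mirror the shell pipeline above: one 'N:name' entry per tab-separated column."""
    names = header_line.rstrip("\r\n").split("\t")
    return [f"{i}:{name}" for i, name in enumerate(names, start=1)]


def column_positions(header_line):
    """Map each column name to its 1-based position."""
    names = header_line.rstrip("\r\n").split("\t")
    return {name: i for i, name in enumerate(names, start=1)}


def numbered_header(gz_file):
    """Read only the first line of a gzipped TSV and number its columns."""
    with gzip.open(gz_file, mode="rt", encoding="utf-8") as text:
        return numbered_columns(text.readline())
```

With interactions.tsv.gz downloaded locally, `numbered_header("interactions.tsv.gz")` should reproduce the list above.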

Note that, for the sake of simplicity, not all of the linked taxonomies in {source|target}TaxonIds are included in the expanded fields (e.g., sourceTaxonId, sourceTaxonPathIds).

Hope this helps you to access the minimal information you need.

If you need more information, please do holler, as I realize that the text above might not be as informative to others as it is to me.

jhpoelen, Nov 05 '21