ontobio
Enrichment notebook not working
https://nbviewer.jupyter.org/github/biolink/ontobio/blob/master/notebooks/Phenotype_Enrichment.ipynb
and binder: https://hub.gke.mybinder.org/user/biolink-ontobio-actww75g/notebooks/notebooks/Phenotype_Enrichment.ipynb
The notebook launches, but step 7 doesn't produce an enrichment list (screenshot: https://user-images.githubusercontent.com/24249870/65293025-8cd05780-db0e-11e9-9d95-5a3345e181d7.png).
I tried locally and got the same result. I also changed the threshold, but that isn't the cause either.
Possible reason: NCBIGene ids are not found as subjects of the associations?
@cmungall @deepakunni3
[EDIT]
- the aset variable seems correctly initialized: the ontology is there, and so is association_map, which does contain NCBIGene:xxx entries
- none of the NCBIGene ids provided in the example seem to be in the association_map, so maybe it's just a bad example
- however, I also built a random list of NCBIGene ids and ran an enrichment with a non-limiting threshold, and aset.enrichment_test() still returns no results
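One way to check the observations above is to compare the input ids against the subjects in the association map and look at which namespaces the subjects use. This is only a minimal sketch: `association_map` below is a plain dict standing in for `aset.association_map`, `diagnose` and `prefix` are hypothetical helpers, and the ids and phenotype terms are illustrative rather than taken from the notebook.

```python
from collections import Counter

def prefix(curie):
    """Return the namespace prefix of a CURIE, e.g. 'HGNC' for 'HGNC:1100'."""
    return curie.split(":", 1)[0]

def diagnose(input_ids, association_map):
    """Split input ids into found/missing subjects and count subject prefixes."""
    subjects = set(association_map)
    found = [i for i in input_ids if i in subjects]
    missing = [i for i in input_ids if i not in subjects]
    subject_prefixes = Counter(prefix(s) for s in subjects)
    return found, missing, subject_prefixes

# Hypothetical stand-in for aset.association_map (subject id -> list of terms).
association_map = {
    "HGNC:1100": ["HP:0000001"],
    "NCBIGene:1234": ["HP:0000002"],
}
input_ids = ["NCBIGene:672", "NCBIGene:1234"]

found, missing, subject_prefixes = diagnose(input_ids, association_map)
print("found:", found)
print("missing:", missing)
print("subject prefixes:", dict(subject_prefixes))
```

If most input ids land in `missing` while the subject prefixes are dominated by another namespace, an identifier mismatch is a likely explanation for the empty enrichment result.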
Let's look at this when Deepak is back.
can we use notebooks as unit tests?
After digging into the ontobio code and the notebook: the notebook doesn't work because the input list uses NCBIGene ids, whereas the call to Monarch for gene-phenotype associations returns HGNC (the clique leader) as the subject instead of NCBIGene.
The resulting AssociationSet created by ontobio therefore has associations for HGNC ids.
At enrichment time, these genes are not remapped back to the original form (NCBIGene), so no enrichment is observed.
@cmungall The proposal to fix this would be to perform remapping after fetching associations, in assocmodel.py or golr_associations.py.
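A rough sketch of what such a remapping step could look like, independent of where it ends up living. Everything here is an assumption for illustration: `remap_subjects` is a hypothetical helper, not an existing ontobio function, the association map is a plain dict of subject id to terms, and `equivalents` stands in for whatever source of equivalent ids is available.

```python
def remap_subjects(association_map, equivalents, target_prefix):
    """Return a new map whose subjects use target_prefix where an equivalent exists."""
    remapped = {}
    for subject, terms in association_map.items():
        # Pick an equivalent id in the requested namespace, else keep the original.
        candidates = [e for e in equivalents.get(subject, [])
                      if e.startswith(target_prefix + ":")]
        new_subject = candidates[0] if candidates else subject
        # Merge terms in case two subjects map to the same target id.
        remapped.setdefault(new_subject, set()).update(terms)
    return {s: sorted(t) for s, t in remapped.items()}

# Hypothetical data: HGNC subjects as returned by the store, plus an
# equivalence table from HGNC to NCBIGene.
equivalents = {"HGNC:1100": ["NCBIGene:672"], "HGNC:1101": ["NCBIGene:675"]}
association_map = {
    "HGNC:1100": ["HP:0000002", "HP:0000001"],
    "HGNC:9999999": ["HP:0000003"],  # no known equivalent, left untouched
}

remapped = remap_subjects(association_map, equivalents, "NCBIGene")
print(remapped)
```

Subjects without an equivalent in the target namespace are kept as they are, so nothing is silently dropped.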
As an additional note: this notebook was originally written when NCBIGene was the clique leader for NCBITaxon:9606, which is why it worked before; the store it fetches associations from has changed since.
Of course, this gets complicated when an input list contains a mix of two separate namespaces.
Shouldn't be an issue: either you do an initial normalization step and maintain an internal map, or the remote service returns all synonymous IDs with the payload.
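The first option, normalizing up front and keeping an internal map, could be sketched roughly as follows. This is an illustration under assumptions: `normalize` and `denormalize` are hypothetical helpers, and `to_leader` stands in for whatever lookup provides the clique leader of an id.

```python
def normalize(input_ids, to_leader):
    """Map each input id to its clique leader and remember the original form."""
    leader_to_inputs = {}
    for original in input_ids:
        # Ids that are already leaders (or unknown) map to themselves.
        leader = to_leader.get(original, original)
        leader_to_inputs.setdefault(leader, []).append(original)
    return leader_to_inputs

def denormalize(leaders, leader_to_inputs):
    """Translate leader ids in a result back to the ids the caller supplied."""
    return [orig
            for leader in leaders
            for orig in leader_to_inputs.get(leader, [leader])]

# Hypothetical lookup and a mixed-namespace input list.
to_leader = {"NCBIGene:672": "HGNC:1100", "NCBIGene:675": "HGNC:1101"}
mixed_input = ["NCBIGene:672", "HGNC:1101"]

leader_to_inputs = normalize(mixed_input, to_leader)
query_ids = list(leader_to_inputs)  # ids to use against the association set
restored = denormalize(["HGNC:1100", "HGNC:1101"], leader_to_inputs)
print(query_ids)
print(restored)
```

Because the map is keyed on the leader, a mixed input list needs no special handling: each id is reported back in the form the caller originally used.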