BridgeDb
Issues filtering for datasource (in Webservice, in R) and in Cytoscape?
@tabbassidaloii and I have been looking into the issues with the xref batch query.
Issue reported by @egonw:
https://webservice.bridgedb.org/Human/xrefs/S/O14494?dataSource=L
this seems to ignore the ?dataSource=L parameter.
Issue reproduced by @tabbassidaloii and @DeniseSl22; we believe the parameter is not ignored. Rather, the SystemCodes are present in the mapping files but are not read in correctly by the BridgeDb libraries.
@tabbassidaloii also tried different mapping files in R (91, 104, 105, 107) through the BridgeDbR package (v2.8.0; rJava_1.0-6; according to GitHub, using the BridgeDb libraries BridgeDb 3.0.19 and Derby 10.15.2). All these versions give the same issue when specifying the data source to map to:
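To check whether the `dataSource` parameter is honoured, the response can be compared against a client-side filter. The following is a minimal sketch, not part of BridgeDb: the class `XrefFilter` and its methods are hypothetical, the identifiers in the sample body are made up, and it assumes the webservice returns one tab-separated `identifier<TAB>data source name` pair per line.

```java
import java.util.ArrayList;
import java.util.List;

public class XrefFilter {

    /** Builds {base}/{organism}/xrefs/{sourceCode}/{identifier}?dataSource={targetCode}. */
    public static String buildUrl(String base, String organism, String sourceCode,
                                  String identifier, String targetCode) {
        return base + "/" + organism + "/xrefs/" + sourceCode + "/" + identifier
                + "?dataSource=" + targetCode;
    }

    /**
     * Keeps the identifiers whose second column equals the given data source name.
     * Assumption: each line looks like "identifier<TAB>data source name".
     */
    public static List<String> filterByDataSource(String body, String dataSourceName) {
        List<String> ids = new ArrayList<>();
        for (String line : body.split("\\R")) {
            String[] cols = line.split("\t");
            if (cols.length >= 2 && cols[1].trim().equals(dataSourceName)) {
                ids.add(cols[0].trim());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("https://webservice.bridgedb.org", "Human", "S", "O14494", "L"));

        // Made-up response body for illustration only; not real webservice output.
        String body = "1234\tEntrez Gene\nENSG_EXAMPLE\tEnsembl\n5678\tEntrez Gene\n";
        System.out.println(filterByDataSource(body, "Entrez Gene"));
    }
}
```

If the server-side filter works, the unfiltered response passed through `filterByDataSource` should contain the same identifiers as the response requested with `?dataSource=L`.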
map(mapper, "H", "VGF", "L")
Error in .jcall("org/bridgedb/DataSource", "Lorg/bridgedb/DataSource;", :
java.lang.IllegalArgumentException: No DataSource known for the Bioregistry.io prefix L
The first known "bioregistry" addition is mentioned in the BridgeDb 3.0.14 release. @egonw reset the Webservice back to BridgeDb 3.0.13 (which seemed to solve another issue).
We don't know if this issue is related to the other issues we're seeing for the GeneProtein_107 release (HGNC symbols cannot be searched in PV; this works in 104 but not in 105). Our suggestions:
- [ ] 1. Revert the webservice data back to version Ensembl104 (for now) @egonw
- [x] 2. Revert the GeneProtein mapping file download page of BridgeDb back to 104 @tabbassidaloii
- [ ] 3. Download the BridgeDb Java code (v3.0.13), build it with Maven, update the GeneProtein mapping file code to use these libraries (Ensembl 104, 107), rerun the code, and check whether the issue persists (check Hs only for now!) @tabbassidaloii
- [ ] 4. Revert BridgeDbR code back to older version (using BridgeDb libraries older than 3.0.13) to test if issue above is gone @tabbassidaloii and/or @egonw
- [ ] 5. If previous point solves issues, revert BridgeDbR code back to BridgeDb library 3.0.13 @egonw
- [ ] 6. Start making official releases on GitHub for BridgeDbR code, so we can easily go back to older version @egonw
- [ ] 7. Check with @AlexanderPico if NCBI gene mapping to Ensembl in Cytoscape is (still) an issue, and if our suggestions above solve it @DeniseSl22
@DeniseSl22, @tabbassidaloii, for the "BridgeDb back to 104" step, please update the JSON files accordingly in the https://github.com/bridgedb/data repository
> BridgeDb back to 104

The PR has been sent.
Regarding using an older version of BridgeDb (point 3): I checked the dependencies for creating the gene/protein Derby files and noticed that I have never updated them. They use an even older version of BridgeDb (3.0.6), and that has been the same for all releases (v103 to v107), so it would not cause the problem. What do you think, @egonw @DeniseSl22? I have opened different versions of the Hs Derby files (v103, 104, and 107) in SQuirreL and their structures seem to be similar. What else can be checked?
@tabbassidaloii: could you maybe run the script for Hs 107 again, and make sure all the Java libraries are version 3.0.13 (check in pom.xml?). I could create a local version of PV 3.3 with the new BridgeDb libraries (also 3.0.13) and see if that resolves the issues. If not, we might be looking at this from the wrong perspective (and might need to check Ensembl?), or we would have to go back to an older version of BridgeDb and then go from there to see what might be causing the issues. In the meantime, I can create a new metabolite mapping file (which uses BridgeDb 3.0.13) and see if I'm getting the same issue in PV regarding the lookup of names.
@DeniseSl22, I tried to reproduce the v104 file again (as it was correct), but the new Derby file has the same issue (it cannot be searched with gene symbols in PV 3). I am checking all the steps one by one (reviewing all the minor changes) to find the issue. I will also try what you suggested. I am documenting all the checks so we can make sure we don't miss anything.
This is getting stranger and stranger... *sighs*... Could you share the new 104 version that you just created with me? Then I can double-check whether I see the same behaviour. And maybe a zipped file of the source code for the GeneProtein generation?
Indeed. I will share them on Slack.
The issue of not being able to search the database in PV using gene names was caused by a minor change we made a while ago to fix an error. We did not foresee the problem it might cause.
While generating the database for Zm (v52), we got the error below:
Attribute external_gene_name NOT FOUND
To solve this, we changed line 157 in QueryBioMart.java from geneId.setAttribute("name", "external_gene_id");
to
geneId.setAttribute("name", "ensembl_gene_id");
As a result, a search was only possible using the Ensembl gene ID.
Now I have changed it to
if (config.getSpecies().equals("zmays_eg_gene")) {
    // Zm has no external_gene_name attribute, so keep the Ensembl gene ID
    geneId.setAttribute("name", "ensembl_gene_id");
} else {
    geneId.setAttribute("name", "external_gene_name");
}
This way, the databases for species that have the gene name (external_gene_name) attribute can be searched using gene names.
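If more species turn out to lack the gene name attribute, the per-species choice could be pulled into a small helper. This is only a sketch under assumptions: the class `GeneNameAttribute`, the method `forDataset`, and the set of exceptions are hypothetical and are not taken from QueryBioMart.java; only `zmays_eg_gene` is known from this thread to lack `external_gene_name`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class GeneNameAttribute {

    // Hypothetical list: datasets for which BioMart reported
    // "Attribute external_gene_name NOT FOUND". Only Zm is known from this thread.
    private static final Set<String> WITHOUT_GENE_NAME =
            new HashSet<>(Arrays.asList("zmays_eg_gene"));

    /** Returns the BioMart attribute to use as the searchable name for a dataset. */
    public static String forDataset(String dataset) {
        return WITHOUT_GENE_NAME.contains(dataset)
                ? "ensembl_gene_id"      // fall back to the Ensembl gene ID
                : "external_gene_name";  // gene names stay searchable
    }

    public static void main(String[] args) {
        System.out.println(forDataset("zmays_eg_gene"));
        System.out.println(forDataset("hsapiens_gene_ensembl"));
    }
}
```

The call site would then read `geneId.setAttribute("name", GeneNameAttribute.forDataset(config.getSpecies()));`, assuming `config.getSpecies()` returns the dataset name as in the snippet above.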
Thank you for debugging the issue!
Thanks, @tabbassidaloii!