goci
Mapping pipeline improvements
Based on the previous round of analysis, we concluded that the Ensembl API sometimes fails silently and does not return a response. In the daily mapping run, this causes the mappings already annotated to a variant's existing associations to be erased.
I propose the following changes, which could reduce the incidence of missed mappings. They need verification on the technical side:
- [ ] Only attempt mapping for variants that have an rsID, e.g. rs12345, where 12345 can be any string of digits
- [ ] For any mapping that returns the error "returned no mapped genes", e.g. "rs2574974 , rs2574974 returned no mapped genes", automatically attempt remapping
- [x] Do not attempt mapping for variants that already have a mapping in the database
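
To make the three proposals above concrete, here is a minimal Python sketch of how they could fit together. It is not the pipeline's actual code: `map_variants`, `fetch_mapping`, `already_mapped`, the retry count and the return shape are all hypothetical names and assumptions chosen for illustration. `fetch_mapping` stands in for whatever call queries Ensembl, and is assumed to return `None` on a silent failure or a `(genes, error)` pair otherwise.

```python
import re
import time
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

# Assumption: an rsID is lowercase "rs" followed by one or more digits.
RSID_PATTERN = re.compile(r"rs\d+")
# Error text quoted in this issue; treated here as a signal to retry.
NO_GENES_MARKER = "returned no mapped genes"

# Hypothetical return shape of the lookup: (gene names, error message or None).
FetchResult = Optional[Tuple[List[str], Optional[str]]]


def is_rsid(variant_id: str) -> bool:
    """Return True if the whole identifier looks like an rsID."""
    return RSID_PATTERN.fullmatch(variant_id.strip()) is not None


def map_variants(
    variant_ids: Iterable[str],
    already_mapped: Set[str],
    fetch_mapping: Callable[[str], FetchResult],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> Dict[str, Optional[List[str]]]:
    """Map variants, skipping non-rsIDs and variants that already have a mapping.

    A value of None means every attempt failed; the caller should then
    leave any existing mapping untouched rather than erase it.
    """
    results: Dict[str, Optional[List[str]]] = {}
    for raw_id in variant_ids:
        variant_id = raw_id.strip()
        # Proposals 1 and 3: only rsIDs, and only those not yet mapped.
        if not is_rsid(variant_id) or variant_id in already_mapped:
            continue
        genes: Optional[List[str]] = None
        for attempt in range(max_attempts):
            response = fetch_mapping(variant_id)
            if response is not None:
                found, error = response
                # Proposal 2: retry when the "no mapped genes" error appears.
                if not (error and NO_GENES_MARKER in error):
                    genes = found
                    break
            if attempt < max_attempts - 1:
                time.sleep(delay_seconds)
        results[variant_id] = genes
    return results
```

Returning `None` for a variant whose attempts all failed, instead of an empty list, is one way to let the calling code distinguish "no answer from Ensembl" from "Ensembl says there are no genes", so that existing annotations are not overwritten in the first case.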
We should also follow up with Ensembl to find out whether there is a more robust method for querying in bulk, e.g. by using a local instance of the database.