
Mapping pipeline improvements

Open ljwh2 opened this issue 9 months ago • 1 comment

Based on the previous round of analysis, we concluded that the Ensembl API sometimes fails silently and does not return a response. In the daily mapping run, this causes the mapping already annotated to a variant's existing associations to be erased.

I propose the following changes, which could reduce the incidence of missed mappings. They need verification on the technical side:

  • [ ] Only attempt mapping for variants that have an rsID, e.g. rs12345, where 12345 can be any string of digits.
  • [ ] For any mapping that returns the error "returned no mapped genes" (e.g. "rs2574974 , rs2574974 returned no mapped genes"), automatically attempt remapping.
  • [x] Do not attempt mapping for variants that already have mapping in the database
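
To make the three checklist items concrete for technical review, here is a minimal Python sketch. It is an illustration only, not the pipeline's actual code: `is_rsid`, `select_variants_to_map`, `map_with_retry`, the `map_variant` callable and the `max_attempts` default are all hypothetical names and values chosen for this example, and the assumption that a mapping call yields a `(genes, error_message)` pair may not match the real implementation.

```python
import re
from typing import Callable, Iterable, List, Optional, Set, Tuple

# "rs" followed by one or more digits, e.g. rs12345.
RSID_PATTERN = re.compile(r"rs\d+")

# Error text quoted in this issue; the exact wording in the pipeline is assumed.
NO_GENES_ERROR = "returned no mapped genes"

MappingResult = Tuple[List[str], Optional[str]]


def is_rsid(variant_id: str) -> bool:
    """Return True if the identifier looks like an rsID."""
    return RSID_PATTERN.fullmatch(variant_id.strip()) is not None


def select_variants_to_map(
    variant_ids: Iterable[str], already_mapped: Set[str]
) -> List[str]:
    """Keep only rsIDs that do not already have a mapping in the database."""
    selected = []
    for raw_id in variant_ids:
        variant_id = raw_id.strip()
        if is_rsid(variant_id) and variant_id not in already_mapped:
            selected.append(variant_id)
    return selected


def map_with_retry(
    variant_id: str,
    map_variant: Callable[[str], MappingResult],  # hypothetical mapping call
    max_attempts: int = 3,  # arbitrary default, to be tuned
) -> MappingResult:
    """Call map_variant, retrying while it reports "returned no mapped genes"."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    result: MappingResult = ([], None)
    for _ in range(max_attempts):
        result = map_variant(variant_id)
        error = result[1]
        if error is None or NO_GENES_ERROR not in error:
            break  # success, or a different error that a retry would not fix
    return result
```

In this sketch, a variant that still reports "returned no mapped genes" after the last attempt is returned with its error, so the caller can decide to leave the existing mapping in place rather than overwrite it.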

We should also follow up with Ensembl to find out whether there is a more robust way to query in bulk, e.g. by using a local instance of the database.

ljwh2, May 15 '24 15:05