
Data source: HGMD

Open kevinxin90 opened this issue 6 years ago • 5 comments

http://www.hgmd.cf.ac.uk/ac/index.php

Seems to be very frequently used by a lot of labs working on variant annotation pipelines.

kevinxin90 avatar Oct 17 '18 23:10 kevinxin90

There is a licensing issue we need to be aware of: HGMD has an academic version and a commercial version.

kevinxin90 avatar Oct 17 '18 23:10 kevinxin90

We can provide a "data plugin" for standalone myvariant.info instances, for users who have permission to use HGMD. But it won't be available in the public myvariant.info API due to the license restriction.

newgene avatar Oct 22 '18 17:10 newgene

I think that's the easiest possible route. One thing to remember is that HGMD is a product of Qiagen, which also has a stake in annotation distribution as they provide AnnoVar as a service. Interestingly, it was publicly announced just a few weeks ago that Alamut is no longer able to provide HGMD as an annotation, so that's a big reason why labs all over the place are clamoring for an annotation solution with HGMD included. If Qiagen were willing to set up tokenization/authentication with BioThings, there could be an option to provide HGMD-PRO annotation via the public API, but that would take plenty of time and effort. HGMD-PRO as a plugin any institution can just enable is certainly the way to go right now. We could also look at having a second source, HGMD-Public (half the data and out of date, but at least it's publicly available, right?).

raymond301 avatar Nov 14 '18 17:11 raymond301

Another possible option is for us to provide a parser. Standalone instance users (in-house, not the public one) who do have the HGMD license can then get the dumped HGMD-PRO file and run the parser to merge the data into their standalone instance.

@raymond301 We don't have access to the commercial version of HGMD. Do you know if a dumped file is available to HGMD-PRO subscribers?

Reaching out to Qiagen is something we'd like to do soon, just to see if they are open to any solution for including HGMD in MyVariant.info.

newgene avatar Nov 16 '18 19:11 newgene

For HGMD-PRO (Qiagen's commercial product) there are a number of different licenses. The key differences are between clinical and research use, and between access through their web interface and a plain data-dump download.

I have "HGMD Download, Research Use" which consists of a number of files:

  • HGMD_Data_Download_Page.pdf
  • HGMD_download_installation_<version>.pdf
  • HGMD_FAQ_<version>.pdf
  • HGMD_Schema_<version>.pdf
  • hgmd_phenbase-<version>.dump.gz
  • hgmd_pro-<version>.dump.gz
  • hgmd_snp-<version>.dump.gz
  • hgmd_views-<version>.dump.gz
  • hgmd_pro_<version>_hg19.vcf
  • hgmd_pro_<version>_hg38.vcf

It's not overly complex to parse and load the VCFs provided, but they differ slightly from the MySQL database dump files, which include all the functional annotation & curation notes. So we would need to merge the additional details from the database files with the included VCF files. Please note that the file structure & formats have changed over the years, so all of this is subject to change with each release, and there is a new release every quarter.
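To make the two steps above concrete, here is a minimal sketch of what such a parser could look like. It relies only on the generic VCF column layout and makes no claim about the actual HGMD-PRO INFO keys or database schema. The names `parse_info`, `load_data`, `merge_details`, the `hgmd` document key, and the `accession` field are all hypothetical and chosen for illustration. It assumes the VCF ID column holds an identifier that can be used to look up extra details, and that a mapping from that identifier to extra details has already been built separately from the MySQL dump. Only simple SNVs are handled, and the naive INFO split would break on quoted values that contain semicolons.

```python
VALID_BASES = set("ACGT")


def parse_info(info):
    """Split a VCF INFO string such as 'A=1;B=x;FLAG' into a dict.

    Keys are lowercased; flag-style entries without '=' map to True.
    """
    fields = {}
    if info in ("", "."):
        return fields
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key.lower()] = value.strip('"') if sep else True
    return fields


def load_data(lines):
    """Yield one document per SNV record from an iterable of VCF lines.

    Each document carries an HGVS-style '_id' and the INFO fields nested
    under a hypothetical 'hgmd' key.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, vid, ref, alt = cols[:5]
        info = cols[7] if len(cols) > 7 else ""
        # This sketch only handles single-base substitutions.
        if ref not in VALID_BASES or alt not in VALID_BASES:
            continue
        if chrom.startswith("chr"):
            chrom = chrom[3:]
        doc = {"_id": f"chr{chrom}:g.{pos}{ref}>{alt}", "hgmd": parse_info(info)}
        if vid != ".":
            doc["hgmd"]["accession"] = vid
        yield doc


def merge_details(docs, details_by_accession):
    """Add extra fields (e.g. from the database dump) to each document.

    Values already present from the VCF are kept; only missing keys are added.
    """
    for doc in docs:
        extra = details_by_accession.get(doc["hgmd"].get("accession"), {})
        for key, value in extra.items():
            doc["hgmd"].setdefault(key, value)
        yield doc
```

A standalone instance could call `load_data` on an open file handle for one of the VCFs and pass the result through `merge_details`. Since the file layout changes between releases, the column and key handling would likely need to be checked against each new version.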

It's a very doable task, though I cannot speak for Qiagen's position on inclusion in MyVariant.info. It may also be worth looking into what can be obtained through their public version.

raymond301 avatar Nov 19 '18 14:11 raymond301