Add more options for variant search
Currently, variants can be searched for by chrom-pos-ref-alt ID or dbSNP rsID.
Add support for additional identifiers such as ClinGen CAID, HGVS, etc.
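As a rough illustration of how a search box could tell these identifier types apart before deciding which lookup to run, here is a minimal regex-based dispatcher. The patterns and the function name `classify_query` are hypothetical and are not gnomAD's actual search grammar. The HGVS check is deliberately loose and only accepts genomic expressions on RefSeq NC_/NG_/NT_/NW_ accessions.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only; not gnomAD's real search grammar.
VARIANT_ID = re.compile(
    r"^(?:chr)?([0-9]{1,2}|X|Y|M)[-:](\d+)[-:]([ACGT]+)[-:]([ACGT]+)$", re.IGNORECASE
)
RSID = re.compile(r"^rs\d+$", re.IGNORECASE)
CAID = re.compile(r"^CA\d+$", re.IGNORECASE)
# Loose check: a RefSeq genomic accession followed by ":g." and anything after it.
HGVS_GENOMIC = re.compile(r"^N[CGTW]_\d+(?:\.\d+)?:g\..+$")


def classify_query(query: str) -> Optional[str]:
    """Return which kind of variant identifier a search string looks like, if any."""
    q = query.strip()
    if VARIANT_ID.match(q):
        return "variant_id"  # chrom-pos-ref-alt
    if RSID.match(q):
        return "rsid"  # dbSNP rsID
    if CAID.match(q):
        return "caid"  # ClinGen canonical allele ID
    if HGVS_GENOMIC.match(q):
        return "hgvs"
    return None
```

Checking the more specific ID formats first keeps the loose HGVS pattern from swallowing queries that belong to another type.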
The ClinGen Allele Registry API (docs) seems to be useful only for gnomAD v2.
Requesting a variant from the Allele Registry by CA ID returns a "gnomAD" entry in the "externalRecords" field. However, based on the "Supported searches" help text, the ClinGen Allele Registry defines gnomAD IDs as hg19/GRCh37 coordinates.
curl 'https://reg.clinicalgenome.org/allele/CA034587' | jq
...
"gnomAD": [
{
"@id": "http://gnomad.broadinstitute.org/variant/1-55505610-G-A",
"id": "1-55505610-G-A",
"variant": "1:55505610 G / A"
}
]
...
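For reference, pulling the gnomAD IDs out of that response could look roughly like the sketch below. It assumes only the response shape shown above (a "gnomAD" list under "externalRecords", each entry carrying an "id"); the helper names are hypothetical. Per the help text mentioned above, the IDs returned this way should be treated as GRCh37 (gnomAD v2) IDs.

```python
import json
from urllib.request import urlopen

ALLELE_REGISTRY = "https://reg.clinicalgenome.org/allele/"


def fetch_allele(caid: str) -> dict:
    """Fetch an allele record by CA ID (same request as the curl command above)."""
    with urlopen(ALLELE_REGISTRY + caid) as resp:
        return json.load(resp)


def gnomad_ids(record: dict) -> list[str]:
    """Return the variant IDs listed under externalRecords -> gnomAD (GRCh37 per the help text)."""
    entries = record.get("externalRecords", {}).get("gnomAD", [])
    return [entry["id"] for entry in entries]
```

Usage would be `gnomad_ids(fetch_allele("CA034587"))`, which makes a network request.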
It also returns a "genomicAlleles" field with chromosome and coordinates in different reference genomes. This looked promising at first glance, but their GRCh37 and GRCh38 coordinates do not always match up with gnomAD's.
For example,
curl 'https://reg.clinicalgenome.org/allele/CA871469' | jq
...
"gnomAD": [
{
"@id": "http://gnomad.broadinstitute.org/variant/1-55505552-A-ACTG",
"id": "1-55505552-A-ACTG",
"variant": "1:55505552 A / ACTG"
}
]
...
"genomicAlleles": [
{
"chromosome": "1",
"coordinates": [
{
"allele": "CTG",
"end": 55039902,
"referenceAllele": "",
"start": 55039902
}
],
"hgvs": [
"NC_000001.11:g.55039900_55039902dup",
"CM000663.2:g.55039900_55039902dup"
],
"referenceGenome": "GRCh38",
"referenceSequence": "http://reg.genome.network/refseq/RS000049"
},
{
"chromosome": "1",
"coordinates": [
{
"allele": "CTG",
"end": 55505575,
"referenceAllele": "",
"start": 55505575
}
],
"hgvs": [
"NC_000001.10:g.55505573_55505575dup",
"CM000663.1:g.55505573_55505575dup"
],
"referenceGenome": "GRCh37",
"referenceSequence": "http://reg.genome.network/refseq/RS000025"
},
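To compare these coordinates with a gnomAD ID programmatically, a small sketch like the following could flatten "genomicAlleles" per reference genome. It relies only on the fields visible in the excerpt above ("chromosome", "coordinates", "referenceGenome", and the start/end/allele keys); the function names are hypothetical.

```python
def parse_gnomad_id(variant_id: str) -> tuple[str, int, str, str]:
    """Split a gnomAD variant ID such as "1-55505552-A-ACTG" into its parts."""
    chrom, pos, ref, alt = variant_id.split("-")
    return chrom, int(pos), ref, alt


def genomic_coordinates(record: dict) -> dict[str, list[tuple[str, int, int, str, str]]]:
    """Map each referenceGenome to (chromosome, start, end, referenceAllele, allele) tuples."""
    by_genome: dict[str, list[tuple[str, int, int, str, str]]] = {}
    for allele in record.get("genomicAlleles", []):
        genome = allele.get("referenceGenome")
        if genome is None:
            continue  # skip entries without an assembly label
        for coord in allele.get("coordinates", []):
            by_genome.setdefault(genome, []).append(
                (
                    allele["chromosome"],
                    coord["start"],
                    coord["end"],
                    coord["referenceAllele"],
                    coord["allele"],
                )
            )
    return by_genome
```

For CA871469 this yields a GRCh37 start of 55505575, while the gnomAD v2 ID puts the variant at 55505552, which is the mismatch described below.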
Based on this variant's ClinVar variation ID (235043), it corresponds to 1-55505552-A-ACTG in gnomAD v2 and 1-55039879-A-ACTG in gnomAD v3.
The returned gnomAD ID matches, but the coordinates in "genomicAlleles" do not.
Searching by the gnomAD v3 coordinates returns the same variant though.
curl 'https://reg.clinicalgenome.org/allele?hgvs=NC_000001.11:g.55039879_55039880insCTG' | jq
...
"@id": "http://reg.genome.network/allele/CA871469",
...
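The same HGVS lookup from Python might look like this sketch. The query URL mirrors the curl command above; extracting the CA ID from the last path segment of "@id" is an assumption based on the single response shown. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ALLELE_SEARCH = "https://reg.clinicalgenome.org/allele"


def hgvs_search_url(hgvs: str) -> str:
    """Build the Allele Registry query URL; ":" is left unescaped to match the curl example."""
    return ALLELE_SEARCH + "?" + urlencode({"hgvs": hgvs}, safe=":")


def caid_from_record(record: dict) -> str:
    """Take the CA ID from "@id", e.g. "http://reg.genome.network/allele/CA871469"."""
    return record["@id"].rstrip("/").rsplit("/", 1)[-1]


def caid_for_hgvs(hgvs: str) -> str:
    """Look up the CA ID for a genomic HGVS expression (makes a network request)."""
    with urlopen(hgvs_search_url(hgvs)) as resp:
        return caid_from_record(json.load(resp))
```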
So this seems like it might be an issue with different methods of aligning a variant.
To search by CA ID, it looks like we'll have to fetch CA IDs for all gnomAD variants and store them in our database.
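One possible shape for that stored mapping is sketched below. The table name, columns, and dataset labels ("gnomad_v2", "gnomad_v3") are hypothetical, and sqlite3 is used only to keep the example self-contained; it says nothing about which database the browser would actually use. The example rows come from the CA871469 variant discussed below.

```python
import sqlite3
from typing import Optional


def create_caid_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one gnomAD variant ID per (CA ID, dataset) pair.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS caid_lookup (
               caid TEXT NOT NULL,
               dataset TEXT NOT NULL,
               variant_id TEXT NOT NULL,
               PRIMARY KEY (caid, dataset)
           )"""
    )


def store_caid(conn: sqlite3.Connection, caid: str, dataset: str, variant_id: str) -> None:
    """Insert or overwrite the variant ID for a CA ID in one dataset."""
    conn.execute(
        "INSERT OR REPLACE INTO caid_lookup (caid, dataset, variant_id) VALUES (?, ?, ?)",
        (caid, dataset, variant_id),
    )


def lookup_caid(conn: sqlite3.Connection, caid: str, dataset: str) -> Optional[str]:
    """Return the variant ID for a CA ID in a dataset, or None if it is not stored."""
    row = conn.execute(
        "SELECT variant_id FROM caid_lookup WHERE caid = ? AND dataset = ?",
        (caid, dataset),
    ).fetchone()
    return row[0] if row else None
```

Keying on (CA ID, dataset) means the same allele can resolve to different IDs in v2 and v3.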
The above issues with returning only gnomAD v2 IDs and the different coordinates would also hinder using the ClinGen Allele Registry API for searching by HGVS notation.
The mismatch between the coordinates in gnomAD IDs and those in genomicAlleles in ClinGen Allele Registry responses is due to different alignments (left vs. right). So, given the reference genome, it is possible to convert the genomicAlleles values to gnomAD IDs.
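A minimal sketch of that conversion for insertions follows, under several assumptions. First, gnomAD IDs use a left-aligned, VCF-style form (anchor base followed by the inserted bases, as in 1-55505552-A-ACTG). Second, a ClinGen insertion with start == end is a 0-based interbase coordinate, so it sits after 1-based position `start`. Third, the inserted bases are given in the rotation that matches that position. The function names are hypothetical. A real version would read the reference from a FASTA file; the test uses a made-up toy sequence, and insertions that would shift past the first base are not handled.

```python
def left_normalize_insertion(ref: str, pos: int, inserted: str) -> tuple[int, str, str]:
    """Left-align an insertion of `inserted` placed after 1-based position `pos`.

    `ref` holds the reference sequence with ref[0] at 1-based position 1.
    Returns a VCF-style (pos, ref_allele, alt_allele) anchored on the base at `pos`.
    """
    p, s = pos, inserted
    # Inserting s after p equals inserting s rotated right by one after p - 1
    # whenever the reference base at p equals the last base of s.
    while p > 1 and ref[p - 1] == s[-1]:
        s = s[-1] + s[:-1]
        p -= 1
    anchor = ref[p - 1]
    return p, anchor, anchor + s


def clingen_insertion_to_gnomad_id(chrom: str, ref: str, start: int, allele: str) -> str:
    """Hypothetical helper: build a gnomAD-style ID from a ClinGen interbase insertion."""
    pos, ref_allele, alt_allele = left_normalize_insertion(ref, start, allele)
    return f"{chrom}-{pos}-{ref_allele}-{alt_allele}"
```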
It seems that, in some cases where a variant can be represented in multiple ways, the Allele Registry API returns coordinates and alleles that do not match.
For example, CA871469 includes in genomicAlleles:
{
"chromosome": "1",
"coordinates": [
{
"allele": "CTG",
"end": 55505575,
"referenceAllele": "",
"start": 55505575
}
],
"hgvs": [
"NC_000001.10:g.55505573_55505575dup",
"CM000663.1:g.55505573_55505575dup"
],
"referenceGenome": "GRCh37",
"referenceSequence": "http://reg.genome.network/refseq/RS000025"
}
However, the allele in coordinates (CTG) does not match the allele indicated by the HGVS expression. A duplication of positions 55505573-55505575 would be GCT.
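A consistency check for this case could compare the reported allele with the bases the HGVS duplication actually covers. The sketch below handles only range duplications of the form `g.START_ENDdup` (not single-base dups or dups with an explicit sequence); the helper names are hypothetical, and the test uses a toy reference rather than GRCh37.

```python
import re

# Matches a range duplication such as "NC_000001.10:g.55505573_55505575dup".
DUP_RANGE = re.compile(r":g\.(\d+)_(\d+)dup$")


def duplicated_sequence(hgvs: str, ref: str) -> str:
    """Return the reference bases duplicated by a genomic HGVS range dup.

    `ref` holds the reference sequence with ref[0] at 1-based position 1.
    """
    match = DUP_RANGE.search(hgvs)
    if match is None:
        raise ValueError(f"not a range duplication: {hgvs}")
    start, end = int(match.group(1)), int(match.group(2))
    return ref[start - 1:end]


def allele_matches_hgvs(hgvs: str, allele: str, ref: str) -> bool:
    """Check whether a coordinates "allele" value agrees with the HGVS duplication."""
    return duplicated_sequence(hgvs, ref) == allele
```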
Based on the MyVariantInfo_hg19 field in externalRecords:
"MyVariantInfo_hg19": [
{
"@id": "http://myvariant.info/v1/variant/chr1:g.55505573_55505575dupGCT?assembly=hg19",
"id": "chr1:g.55505573_55505575dupGCT"
},
{
"@id": "http://myvariant.info/v1/variant/chr1:g.55505558_55505559insCTG?assembly=hg19",
"id": "chr1:g.55505558_55505559insCTG"
},
{
"@id": "http://myvariant.info/v1/variant/chr1:g.55505555_55505556insCTG?assembly=hg19",
"id": "chr1:g.55505555_55505556insCTG"
},
{
"@id": "http://myvariant.info/v1/variant/chr1:g.55505552_55505553insCTG?assembly=hg19",
"id": "chr1:g.55505552_55505553insCTG"
}
]
It appears that an alternate allele of CTG matches some other possible coordinates, but not 55505575.
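To see which positions an inserted CTG is actually valid at, one could enumerate every equivalent placement of the insertion within the repeat. This sketch represents each placement as (pos, inserted), meaning the bases are inserted after 1-based `pos`, which corresponds to the `g.POS_POS+1insSEQ` forms in the MyVariantInfo list. The function name and the toy reference are hypothetical.

```python
def equivalent_insertions(ref: str, pos: int, inserted: str) -> list[tuple[int, str]]:
    """List every (pos, inserted) pair equivalent to inserting `inserted` after 1-based `pos`."""
    p, s = pos, inserted
    # Shift fully left first.
    while p > 1 and ref[p - 1] == s[-1]:
        s = s[-1] + s[:-1]
        p -= 1
    placements = [(p, s)]
    # Then walk right while the next reference base equals the first inserted base.
    while p < len(ref) and ref[p] == s[0]:
        s = s[1:] + s[0]
        p += 1
        placements.append((p, s))
    return placements
```

On the toy reference, CTG is a valid inserted sequence only at every third position, and the right-most placement carries GCT rather than CTG, mirroring the pattern described above.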
I recently spoke with a genetic variant analyst who said that searching variants by HGVS would be a very nice quality-of-life feature for gnomAD.
As such, I'm bumping this issue to triage so that it gets discussed at this upcoming Thursday's browser meeting.