gosling.js
Take a closer look at Clinvar VCF file
It looks like ClinVar VCF files do not work with the current VCF loader in Gosling. This might be related to the size of the data.
~~At a minimum, we need to enable specifying chromosome name conventions. The data we tested uses "chr*" while ClinVar uses "*" without "chr" (e.g., `Y`). But this does not seem to be the main issue, since Gosling still does not load any data if I change the convention manually.~~
The same data does not seem to work in JBrowse2 either.
- VCF: https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz
- TBI: https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz.tbi
- Link to JBrowse2: https://jbrowse.org/code/jb2/v1.7.8/?session=share-n2BT-sRNz9&password=XSCvf
[Screenshot: Screen Shot 2022-05-27 at 17 27 04]
Another dataset that does not work on Gosling and JBrowse2:
- https://s3.amazonaws.com/gosling-lang.org/data/SV/chr1-bc0dee07-de20-44d6-be65-05af7e63ac96.consensus.20160830.somatic.snv_mnv.vcf.gz
- https://s3.amazonaws.com/gosling-lang.org/data/SV/chr1-bc0dee07-de20-44d6-be65-05af7e63ac96.consensus.20160830.somatic.snv_mnv.vcf.gz.tbi
Update: Found two issues: (1) the chromosomes are not sorted, and (2) the chromosome names do not use a "chr" prefix.
It turns out that the ClinVar VCF file lacks the `chr` prefix, and Gosling was not handling this case well. If I set a custom assembly that excludes the prefix, Gosling loads the data correctly:
```json
{
  "layout": "linear",
  "arrangement": "vertical",
  "centerRadius": 0.8,
  "assembly": [
    ["1", 248956422],
    ["2", 242193529],
    ["3", 198295559],
    ["4", 190214555],
    ["5", 181538259],
    ["6", 170805979],
    ["7", 159345973],
    ["8", 145138636],
    ["9", 138394717],
    ["10", 133797422],
    ["11", 135086622],
    ["12", 133275309],
    ["13", 114364328],
    ["14", 107043718],
    ["15", 101991189],
    ["16", 90338345],
    ["17", 83257441],
    ["18", 80373285],
    ["19", 58617616],
    ["20", 64444167],
    ["21", 46709983],
    ["22", 50818468],
    ["X", 156040895],
    ["Y", 57227415]
  ],
  "xDomain": {"interval": [0, 10000]},
  "views": [
    {
      "tracks": [
        {
          "data": {
            "url": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz",
            "type": "vcf",
            "indexUrl": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz.tbi"
          },
          "mark": "point",
          "x": {"field": "POS", "type": "genomic"},
          "opacity": {"value": 0.9},
          "width": 600,
          "height": 130
        }
      ]
    }
  ]
}
```
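Writing out the unprefixed assembly by hand is error-prone, so here is a minimal sketch of deriving it from a prefixed list of chromosome sizes. `stripChrPrefix` and `ChromSizes` are hypothetical names for illustration only and are not part of Gosling's API.

```typescript
// Hypothetical helper (not part of Gosling): drop a leading "chr" from each
// chromosome name so the assembly matches files that use "1", "2", ..., "X", "Y".
type ChromSizes = [string, number][];

function stripChrPrefix(assembly: ChromSizes): ChromSizes {
    // Names that have no "chr" prefix are returned unchanged.
    return assembly.map(([name, size]): [string, number] => [name.replace(/^chr/, ''), size]);
}

// Example input: the first two chromosome sizes from the spec above, with prefixes.
const withPrefix: ChromSizes = [
    ['chr1', 248956422],
    ['chr2', 242193529]
];
const withoutPrefix = stripChrPrefix(withPrefix);
console.log(JSON.stringify(withoutPrefix));
```

The result could then be passed as the `"assembly"` value in a spec like the one above.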
I wonder if we can infer the chromosome naming convention (`chr1` vs. `1`) correctly. Perhaps we could look into the header of the VCF file.
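One possible way to do the header-based inference is to look at the `##contig=<ID=...>` lines, sketched below. This assumes the header text is already available as a string. `inferChrConvention` and `ChrConvention` are hypothetical names, not existing Gosling functions. Contig lines are optional in VCF, so the sketch returns `'unknown'` when none are present, and I have not checked whether the ClinVar header includes them.

```typescript
type ChrConvention = 'chr-prefixed' | 'unprefixed' | 'unknown';

// Hypothetical helper: inspect the "##contig=<ID=...>" lines of a VCF header
// and report whether the contig IDs use a "chr" prefix.
function inferChrConvention(header: string): ChrConvention {
    const ids: string[] = [];
    for (const line of header.split('\n')) {
        const match = /^##contig=<.*?\bID=([^,>]+)/.exec(line);
        if (match) ids.push(match[1]);
    }
    if (ids.length === 0) return 'unknown'; // no contig lines to inspect
    const prefixed = ids.filter(id => /^chr/.test(id)).length;
    if (prefixed === ids.length) return 'chr-prefixed';
    if (prefixed === 0) return 'unprefixed';
    return 'unknown'; // mixed naming, so do not guess
}

// Made-up header for illustration only.
const sampleHeader = [
    '##fileformat=VCFv4.1',
    '##contig=<ID=1,length=248956422>',
    '##contig=<ID=2,length=242193529>',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO'
].join('\n');
console.log(inferChrConvention(sampleHeader));
```

When the result is `'unknown'`, a fallback such as checking the sequence names stored in the tabix index or the CHROM value of the first records might be needed.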
Also, to be able to visualize lollipop plots using this VCF file directly, we will need to enable parsing the `INFO` column.
INFO value example:

```json
{"ALLELEID":[1493605],"CLNDISDB":["Human_Phenotype_Ontology:HP:0000090","Human_Phenotype_Ontology:HP:0004748","MONDO:MONDO:0019005","MedGen:C0687120","OMIM:PS256100","Orphanet:ORPHA655","SNOMED_CT:204958008"],"CLNDN":["Nephronophthisis"],"CLNHGVS":["NC_000001.11:g.5904754del"],"CLNREVSTAT":["criteria_provided","_single_submitter"],"CLNSIG":["Pathogenic"],"CLNVC":["Deletion"],"CLNVCSO":["SO:0000159"],"GENEINFO":["NPHP4:261734"],"MC":["SO:0001589|frameshift_variant","SO:0001619|non-coding_transcript_variant"],"ORIGIN":["1"]}
```
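To make the parsing step concrete, here is a minimal sketch that turns a raw `INFO` string into a key-to-values map shaped like the example above. `parseInfo` and `InfoValue` are hypothetical names, not existing Gosling functions, and the input string is reconstructed from a few of the example's fields. The sketch keeps every value as a string. Converting values such as `ALLELEID` to numbers would presumably rely on the `Type` declared in the header's `##INFO` lines.

```typescript
type InfoValue = string[] | true;

// Hypothetical helper: split "KEY=v1,v2;FLAG" into a key -> values map.
function parseInfo(info: string): Record<string, InfoValue> {
    const parsed: Record<string, InfoValue> = {};
    if (info === '.' || info === '') return parsed; // "." marks a missing INFO value
    for (const entry of info.split(';')) {
        if (entry === '') continue;
        const eq = entry.indexOf('=');
        if (eq === -1) {
            parsed[entry] = true; // flag key without a value
        } else {
            // Comma-separated values become an array of strings.
            parsed[entry.slice(0, eq)] = entry.slice(eq + 1).split(',');
        }
    }
    return parsed;
}

// Raw string reconstructed from a subset of the example fields above.
const info = parseInfo(
    'ALLELEID=1493605;CLNDN=Nephronophthisis;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Pathogenic;CLNVC=Deletion'
);
console.log(JSON.stringify(info.CLNSIG));
```

With fields like `CLNSIG` and `CLNVC` exposed this way, a track could encode them as color or height for a lollipop plot.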
cc @manzt