gosling.js Take a closer look at Clinvar VCF file

Looks like ClinVar VCF files do not work with the current VCF loader in Gosling. This might be related to the size of the data.

~~At a minimum, we need to enable specifying chromosome name conventions. The data we tested uses "chr*" while the Clinvar uses "*" w/o "chr" (e.g., Y). But, this does not seem to be the main issue since if I change the convention manually, Gosling still does not load any data.~~

The same data does not seem to work in JBrowse2 either.

Link to Editor
VCF: https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz
TBI: https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz.tbi
Link to JBrowse2: https://jbrowse.org/code/jb2/v1.7.8/?session=share-n2BT-sRNz9&password=XSCvf

May 27 '22 21:05 sehilyi

Another dataset that does not work on Gosling and JBrowse2:

https://s3.amazonaws.com/gosling-lang.org/data/SV/chr1-bc0dee07-de20-44d6-be65-05af7e63ac96.consensus.20160830.somatic.snv_mnv.vcf.gz
https://s3.amazonaws.com/gosling-lang.org/data/SV/chr1-bc0dee07-de20-44d6-be65-05af7e63ac96.consensus.20160830.somatic.snv_mnv.vcf.gz.tbi


Update: Found two issues: (1) the chromosomes are not sorted, and (2) the chromosome names do not use a "chr" prefix.
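Both problems can be checked from the CHROM column alone. Here is a minimal Python sketch, assuming "sorted" means that chromosomes first appear in the usual 1..22, X, Y order (the issue does not spell out the exact ordering that is expected). The names `NATURAL_ORDER`, `strip_chr`, and `diagnose_chroms` are hypothetical helpers, not part of Gosling or JBrowse2.

```python
from typing import Dict, Iterable, List

# Assumed reference order; "sorted" here means first appearances follow it.
NATURAL_ORDER: List[str] = [str(i) for i in range(1, 23)] + ["X", "Y"]


def strip_chr(name: str) -> str:
    """Drop a leading "chr" so that "chr1" and "1" compare equal."""
    return name[3:] if name.startswith("chr") else name


def diagnose_chroms(chroms: Iterable[str]) -> Dict[str, bool]:
    """Report the two issues for a stream of CHROM values.

    uses_chr_prefix:   every chromosome name starts with "chr"
    in_assembly_order: chromosomes first appear in NATURAL_ORDER order
    """
    seen: List[str] = []
    for chrom in chroms:
        if chrom not in seen:
            seen.append(chrom)
    # Names outside NATURAL_ORDER (e.g. unplaced contigs) are ignored here.
    ranks = [
        NATURAL_ORDER.index(strip_chr(c))
        for c in seen
        if strip_chr(c) in NATURAL_ORDER
    ]
    return {
        "uses_chr_prefix": bool(seen) and all(c.startswith("chr") for c in seen),
        "in_assembly_order": ranks == sorted(ranks),
    }
```

For example, `diagnose_chroms(["1", "1", "10", "2"])` flags both issues: no prefix, and chromosome 10 appearing before chromosome 2.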

Jun 24 '22 12:06 sehilyi

Turns out that the ClinVar VCF file lacks the chr prefix, and Gosling was not handling this case well. If I set a custom assembly that excludes the prefix, Gosling loads the data correctly:

{
  "layout": "linear",
  "arrangement": "vertical",
  "centerRadius": 0.8,
  "assembly": [
    ["1", 248956422],
    ["2", 242193529],
    ["3", 198295559],
    ["4", 190214555],
    ["5", 181538259],
    ["6", 170805979],
    ["7", 159345973],
    ["8", 145138636],
    ["9", 138394717],
    ["10", 133797422],
    ["11", 135086622],
    ["12", 133275309],
    ["13", 114364328],
    ["14", 107043718],
    ["15", 101991189],
    ["16", 90338345],
    ["17", 83257441],
    ["18", 80373285],
    ["19", 58617616],
    ["20", 64444167],
    ["21", 46709983],
    ["22", 50818468],
    ["X", 156040895],
    ["Y", 57227415]],
  "xDomain": { "interval": [0, 10000]},
  "views": [
    {
      "tracks": [
        {
          "data": {
            "url": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz",
            "type": "vcf",
            "indexUrl": "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz.tbi"
          },
          "mark": "point",
          "x": {"field": "POS", "type": "genomic"},
          "opacity": {"value": 0.9},
          "width": 600,
          "height": 130
        }
      ]
    }
  ]
}


I wonder if we can infer the chromosome naming convention (chr1 vs. 1) automatically, perhaps by looking at the header of the VCF file.
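One way this inference could work is sketched below in Python. It is only an illustration, not how Gosling does or will do it. It assumes the header may contain `##contig=<ID=...>` lines and falls back to the CHROM column of the first data record when it does not. `infer_chr_prefix` and `match_assembly` are hypothetical names.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Captures the ID of a "##contig=<ID=...,length=...>" header line.
CONTIG_ID = re.compile(r"^##contig=<.*?\bID=([^,>]+)")


def infer_chr_prefix(lines: Iterable[str]) -> Optional[bool]:
    """Return True/False for "chr1"/"1" style names, or None if undecidable."""
    for line in lines:
        match = CONTIG_ID.match(line)
        if match:
            return match.group(1).startswith("chr")
        if line.strip() and not line.startswith("#"):
            # First data record: CHROM is the first tab-separated column.
            return line.split("\t", 1)[0].startswith("chr")
    return None


def match_assembly(
    assembly: List[Tuple[str, int]], uses_prefix: bool
) -> List[Tuple[str, int]]:
    """Rename assembly entries so they follow the file's naming convention."""
    renamed = []
    for name, size in assembly:
        bare = name[3:] if name.startswith("chr") else name
        renamed.append(("chr" + bare if uses_prefix else bare, size))
    return renamed
```

The lines could come from `gzip.open(path, "rt")` on a local copy of the file. Since the first contig line or data record decides the result, only the beginning of the file has to be read.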

Also, to be able to visualize lollipop plots using this VCF file directly, we will need to enable parsing the INFO column.

// INFO value example
{"ALLELEID":[1493605],"CLNDISDB":["Human_Phenotype_Ontology:HP:0000090","Human_Phenotype_Ontology:HP:0004748","MONDO:MONDO:0019005","MedGen:C0687120","OMIM:PS256100","Orphanet:ORPHA655","SNOMED_CT:204958008"],"CLNDN":["Nephronophthisis"],"CLNHGVS":["NC_000001.11:g.5904754del"],"CLNREVSTAT":["criteria_provided","_single_submitter"],"CLNSIG":["Pathogenic"],"CLNVC":["Deletion"],"CLNVCSO":["SO:0000159"],"GENEINFO":["NPHP4:261734"],"MC":["SO:0001589|frameshift_variant","SO:0001619|non-coding_transcript_variant"],"ORIGIN":["1"]}
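For reference, the raw INFO column is a semicolon-separated list of `key=value` entries, where values may be comma-separated lists and flags carry no value. A minimal Python sketch of turning that string into a structure like the one above follows. `parse_info` is a hypothetical helper, and unlike the example it keeps every value as a string because it does not consult the `##INFO` type declarations in the header.

```python
from typing import Dict, List


def parse_info(info: str) -> Dict[str, List[str]]:
    """Split a VCF INFO string into {key: [values]}; flags map to []."""
    parsed: Dict[str, List[str]] = {}
    if info in ("", "."):  # "." marks an empty INFO column
        return parsed
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        # Entries without "=" are flags and get an empty value list.
        parsed[key] = value.split(",") if sep else []
    return parsed
```

Calling `parse_info("CLNSIG=Pathogenic;CLNVC=Deletion")` gives `{"CLNSIG": ["Pathogenic"], "CLNVC": ["Deletion"]}`. A field such as `CLNSIG` could then be mapped to a color or row channel for a lollipop plot.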

cc @manzt

Aug 04 '22 20:08 sehilyi
