open-cravat icon indicating copy to clipboard operation
open-cravat copied to clipboard

open-cravat octopus vs strelka

Open cariaso opened this issue 4 years ago • 15 comments

Similar to #63 but not hg19 specific. I have a BAM on which I've used both

https://github.com/luntergroup/octopus and https://github.com/Illumina/strelka

They agree on what they found (deletion of TG & deletion of TGTG) but disagree on how to represent that in the output. In particular

strelka says

chr7    74053320        .       CTGTG   CTG,C   278     PASS    CIGAR=1M2D2M,1M4D;RU=TG,TG;REFREP=19,19;IDREP=18,17;MQ=60       GT:GQ:GQX:DPI:AD:ADF:ADR:FT:PL  1/2:65:19:36:1,12,9:0,2,1:1,10,8:PASS:282,102,66,142,0,123

while octopus instead writes that as 2 rows

chr7  74053320        .       CTG     C,*     266.45  PASS    AC=1,1;AN=2;DP=37;MQ=59;NS=1    GT:GQ:DP:MQ:PS:PQ:FT    2|1:93:37:59:74053320:100:PASS
chr7  74053320        .       CTGTG   C,*     48.37   PASS    AC=1,1;AN=2;DP=37;MQ=59;NS=1    GT:GQ:DP:MQ:PS:PQ:FT    1|2:48:37:59:74053320:100:PASS

the octopus team seems to view their representation as a feature, not a bug https://github.com/luntergroup/octopus#output-format

however numerous parts of opencravat behave quite differently when processing these. Here are some of the differences I've noted.

{
  "base__all_mappings": [
    "{\"ELN\": [[\"P15502\", \"\", \"INT\", \"ENST00000252034.12\", \"c.1096+47_1096+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000320399.10\", \"c.1096+47_1096+50del\"], [\"\", \"\", \"INT\", \"ENST00000320492.11\", \"c.988+47_988+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000357036.9\", \"c.1111+47_1111+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000358929.8\", \"c.1096+47_1096+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000380553.8\", \"c.745+47_745+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000380562.8\", \"c.1096+47_1096+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000380575.8\", \"c.1066+47_1066+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000380576.9\", \"c.1096+47_1096+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000380584.8\", \"c.1054+47_1054+50del\"], [\"\", \"\", \"INT\", \"ENST00000414324.5\", \"c.1081+47_1081+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000429192.5\", \"c.1111+47_1111+50del\"], [\"\", \"\", \"INT\", \"ENST00000445912.5\", \"c.1096+47_1096+50del\"], [\"\", \"\", \"INT\", \"ENST00000458204.5\", \"c.1066+47_1066+50del\"], [\"\", \"\", \"INT,PTR\", \"ENST00000466878.5\", \"\"], [\"\", \"\", \"INT\", \"ENST00000621115.4\", \"c.964+47_964+50del\"]]}",
    "{\"ELN\": [[\"P15502\", \"\", \"INT\", \"ENST00000252034.12\", \"c.1096+49_1096+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000320399.10\", \"c.1096+49_1096+50del\"], [\"\", \"\", \"INT\", \"ENST00000320492.11\", \"c.988+49_988+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000357036.9\", \"c.1111+49_1111+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000358929.8\", \"c.1096+49_1096+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000380553.8\", \"c.745+49_745+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000380562.8\", \"c.1096+49_1096+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000380575.8\", \"c.1066+49_1066+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000380576.9\", \"c.1096+49_1096+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000380584.8\", \"c.1054+49_1054+50del\"], [\"\", \"\", \"INT\", \"ENST00000414324.5\", \"c.1081+49_1081+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000429192.5\", \"c.1111+49_1111+50del\"], [\"\", \"\", \"INT\", \"ENST00000445912.5\", \"c.1096+49_1096+50del\"], [\"\", \"\", \"INT\", \"ENST00000458204.5\", \"c.1066+49_1066+50del\"], [\"\", \"\", \"INT,PTR\", \"ENST00000466878.5\", \"\"], [\"\", \"\", \"INT\", \"ENST00000621115.4\", \"c.964+49_964+50del\"]]}",
    "{\"ELN\": [[\"P15502\", \"\", \"INT\", \"ENST00000252034.12\", \"c.1096+47_1096+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000320399.10\", \"c.1096+47_1096+50del\"], [\"\", \"\", \"INT\", \"ENST00000320492.11\", \"c.988+47_988+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000357036.9\", \"c.1111+47_1111+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000358929.8\", \"c.1096+47_1096+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000380553.8\", \"c.745+47_745+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000380562.8\", \"c.1096+47_1096+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000380575.8\", \"c.1066+47_1066+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000380576.9\", \"c.1096+47_1096+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000380584.8\", \"c.1054+47_1054+50del\"], [\"\", \"\", \"INT\", \"ENST00000414324.5\", \"c.1081+47_1081+50del\"], [\"P15502\", \"\", \"INT\", \"ENST00000429192.5\", \"c.1111+47_1111+50del\"], [\"\", \"\", \"INT\", \"ENST00000445912.5\", \"c.1096+47_1096+50del\"], [\"\", \"\", \"INT\", \"ENST00000458204.5\", \"c.1066+47_1066+50del\"], [\"\", \"\", \"INT,PTR\", \"ENST00000466878.5\", \"\"], [\"\", \"\", \"INT\", \"ENST00000621115.4\", \"c.964+47_964+50del\"]]}"
  ],
  "base__cchange": [
    "c.1096+47_1096+50del",
    "c.1096+49_1096+50del",
    "c.1096+47_1096+50del"
  ],
  "base__ref_base": [
    "TGTG",
    "TG",
    "TGTG"
  ],
  "base__uid": [
    2292288,
    2229240,
    2229241
  ],
  "clinvar__disease_names": [
    "not specified",
    "Supravalvar aortic stenosis|Cutis laxa, autosomal dominant",
    "not specified"
  ],
  "clinvar__disease_refs": [
    "MedGen:CN169374",
    "Human Phenotype Ontology:HP:0004381,MONDO:MONDO:0008504,MedGen:C0003499,OMIM:185500,Orphanet:ORPHA3193,SNOMED CT:268185002|MONDO:MONDO:0019571,MedGen:C0268350,Orphanet:ORPHA90348,SNOMED CT:111388003",
    "MedGen:CN169374"
  ],
  "clinvar__id": [
    "402825",
    "360646",
    "402825"
  ],
  "clinvar__sig": [
    "Benign",
    "Uncertain significance",
    "Benign"
  ],
  "extra_vcf_info__AC": [
    null,
    1,
    1
  ],
  "extra_vcf_info__AN": [
    null,
    2,
    2
  ],
  "extra_vcf_info__CIGAR": [
    "1M4D",
    null,
    null
  ],
  "extra_vcf_info__DP": [
    null,
    37,
    37
  ],
  "extra_vcf_info__IDREP": [
    17,
    null,
    null
  ],
  "extra_vcf_info__MQ": [
    60,
    59,
    59
  ],
  "extra_vcf_info__NS": [
    null,
    1,
    1
  ],
  "extra_vcf_info__REFREP": [
    19,
    null,
    null
  ],
  "extra_vcf_info__RU": [
    "TG",
    null,
    null
  ],
  "extra_vcf_info__ref": [
    "CTGTG",
    "CTG",
    "CTGTG"
  ],
  "gnomad3__af": [
    0.543651,
    0.0472857,
    0.543651
  ],
  "gnomad3__af_afr": [
    0.48686,
    0.0873048,
    0.48686
  ],
  "gnomad3__af_asj": [
    0.604372,
    0.0314233,
    0.604372
  ],
  "gnomad3__af_eas": [
    0.569697,
    0.054698,
    0.569697
  ],
  "gnomad3__af_fin": [
    0.573476,
    0.0422155,
    0.573476
  ],
  "gnomad3__af_lat": [
    0.602031,
    0.0405933,
    0.602031
  ],
  "gnomad3__af_nfe": [
    0.558495,
    0.025235,
    0.558495
  ],
  "gnomad3__af_oth": [
    0.55107,
    0.0501949,
    0.55107
  ],
  "gnomad3__af_sas": [
    0.547005,
    0.044838,
    0.547005
  ],
  "vcfinfo__af": [
    "0.4090909090909091",
    null,
    null
  ],
  "vcfinfo__alt_reads": [
    "9",
    null,
    null
  ],
  "vcfinfo__phred": [
    "278",
    "266.45",
    "48.37"
  ],
  "vcfinfo__tot_reads": [
    "22",
    "37",
    "37"
  ]
}```


Does the opencravat team feel that octopus is "just plain wrong"? "not supported"? "cutting edge"? something else?
Do these different opencravat outputs reflect a limitation of opencravat?
If so, does it seem fixable? or is it an unavoidable limitation of opencravat architecture?

cariaso avatar Apr 15 '21 07:04 cariaso

am I going crazy? there was useful comment here from @rkimoakbioinformatics about the 3' rule https://varnomen.hgvs.org/recommendations/checklist/#:~:text=The%203'%20rule%20%2D%20do%20you,amino%20acid

cariaso avatar Apr 15 '21 23:04 cariaso

@cariaso Sorry, I deleted my comments because I was afraid of possibly giving confusing information and wanted to write a clearer one again. I'll write below.

OpenCRAVAT normalizes input variants with the 3' rule and then feed the normalized variants to annotators. Thus,

  • The first input: strelka input chr7:74053320:CTGTG>CTG becomes chr7:74053323:TGdel after normalization with the 3' rule. octopus input chr7:74053320:CTG>C becomes chr7:74053321:TGdel after normalization with the 3' rule. Although these two variants look different after normalization with the 3' rule, when the hg38 mapper processes them, the mapper looks at the bases around the variants to apply the 3' rule further. This step by the mapper ensures that correct HGVS-format consequences are found. As a result, OpenCRAVAT produces the same variant consequences for the both variants (e.g. ENST00000252034.12 intron_variant c.1096+49_1096+50del).
  • The second input: strelka input chr7:74053320:CTGTG>C becomes chr7:74053321:TGTGdel after normalization with the 3' rule. octopus input chr7:74053320:CTGTG>C becomes chr7:74053321:TGTGdel after normalization with the 3' rule. Both are the same and OpenCRAVAT produces the same variant consequences (e.g. ENST00000252034.12 intron_variant c.1096+47_1096+50del).

As I said, I am not sure if this is an answer to your question.

rkimoakbioinformatics avatar Apr 16 '21 03:04 rkimoakbioinformatics

I am not sure if this is an answer to your question.

It's not a final answer, but it definitely helps. I'm finding it a bit difficult to keep these straight, so I'm assigning names S1, S2, O1, O2 and attempting to restate what you've written. I'm not agreeing or disagreeing.

strelka VCF:

chr7    74053320        .       CTGTG   CTG,C   278     PASS    CIGAR=1M2D2M,1M4D;RU=TG,TG;REFREP=19,19;IDREP=18,17;MQ=60       GT:GQ:GQX:DPI:AD:ADF:ADR:FT:PL  1/2:65:19:36:1,12,9:0,2,1:1,10,8:PASS:282,102,66,142,0,123

opencravat 3' normalized strelka: S1 chr7:74053320:CTGTG>CTG becomes chr7:74053323:TGdel S2 chr7:74053320:CTGTG>C becomes chr7:74053321:TGTGdel

octopus VCF:

chr7  74053320        .       CTG     C,*     266.45  PASS    AC=1,1;AN=2;DP=37;MQ=59;NS=1    GT:GQ:DP:MQ:PS:PQ:FT    2|1:93:37:59:74053320:100:PASS
chr7  74053320        .       CTGTG   C,*     48.37   PASS    AC=1,1;AN=2;DP=37;MQ=59;NS=1    GT:GQ:DP:MQ:PS:PQ:FT    1|2:48:37:59:74053320:100:PASS

opencravat 3' normalized octopus: O1 chr7:74053320:CTG>C becomes chr7:74053321:TGdel O2 chr7:74053320:CTGTG>C becomes chr7:74053321:TGTGdel

Although S1 & O1 look different after normalization with the 3' rule, when the hg38 mapper processes them, the mapper looks at the bases around the variants to apply the 3' rule further. This step by the mapper ensures that correct HGVS-format consequences are found. As a result, OpenCRAVAT produces the same variant consequences for the both variants (e.g. ENST00000252034.12 intron_variant c.1096+49_1096+50del).

S2 & O2 are the same and OpenCRAVAT produces the same variant consequences (e.g. ENST00000252034.12 intron_variant c.1096+47_1096+50del).

cariaso avatar Apr 16 '21 05:04 cariaso

I see clear evidence of octopus working as you've described, with O1 & O2 as expected as 2 rows in the variant table. However for Strelka I'm only seeing evidence of S2.

The only entry in the variant table near this location looks like this:

  base__uid = 2292288
  base__chrom = chr7
  base__pos = 74053321
  base__ref_base = TGTG
  base__alt_base = -
  base__hugo = ELN
  base__transcript = ENST00000252034.12
  base__so = INT
  base__cchange = c.1096+47_1096+50del
  base__all_mappings = {"ELN": [["P15502", "", "INT", "ENST00000252034.12", "c.1096+47_1096+50del"], ["P15502", "", "INT", "ENST00000320399.10", "c.1096+47_1096+50del"], ["", "", "INT", "ENST00000320492.11", "c.988+47_988+50del"], ["P15502", "", "INT", "ENST00000357036.9", "c.1111+47_1111+50del"], ["P15502", "", "INT", "ENST00000358929.8", "c.1096+47_1096+50del"], ["P15502", "", "INT", "ENST00000380553.8", "c.745+47_745+50del"], ["P15502", "", "INT", "ENST00000380562.8", "c.1096+47_1096+50del"], ["P15502", "", "INT", "ENST00000380575.8", "c.1066+47_1066+50del"], ["P15502", "", "INT", "ENST00000380576.9", "c.1096+47_1096+50del"], ["P15502", "", "INT", "ENST00000380584.8", "c.1054+47_1054+50del"], ["", "", "INT", "ENST00000414324.5", "c.1081+47_1081+50del"], ["P15502", "", "INT", "ENST00000429192.5", "c.1111+47_1111+50del"], ["", "", "INT", "ENST00000445912.5", "c.1096+47_1096+50del"], ["", "", "INT", "ENST00000458204.5", "c.1066+47_1066+50del"], ["", "", "INT,PTR", "ENST00000466878.5", ""], ["", "", "INT", "ENST00000621115.4", "c.964+47_964+50del"]]}
  clinvar__sig = Benign
  clinvar__disease_refs = MedGen:CN169374
  clinvar__disease_names = not specified
  clinvar__rev_stat = criteria provided, single submitter
  clinvar__id = 402825 
  extra_vcf_info__pos = 74053320
  extra_vcf_info__ref = CTGTG
  extra_vcf_info__alt = C
  extra_vcf_info__CIGAR = 1M4D
  extra_vcf_info__RU = TG
  extra_vcf_info__REFREP = 19
  extra_vcf_info__IDREP = 17
  extra_vcf_info__MQ = 60
  gnomad3__af = 0.543651
  gnomad3__af_afr = 0.48686
  gnomad3__af_asj = 0.604372
  gnomad3__af_eas = 0.569697
  gnomad3__af_fin = 0.573476
  gnomad3__af_lat = 0.602031
  gnomad3__af_nfe = 0.558495
  gnomad3__af_oth = 0.55107
  gnomad3__af_sas = 0.547005
  vcfinfo__phred = 278
  vcfinfo__filter = PASS
  vcfinfo__zygosity = het
  vcfinfo__alt_reads = 9
  vcfinfo__tot_reads = 22
  vcfinfo__af = 0.4090909090909091
  tagsampler__numsample = 1
  tagsampler__samples = HG02219
  tagsampler__tags = 

Should there be 1 or 2 entries in the variant table? If only 1, how should it represent S1 & S2?

cariaso avatar Apr 16 '21 10:04 cariaso

There should be 2 entries in the variant table. The following is the test with which I saw two entries. The input and the tsv report file are attached.

oc run strelka.vcf --skip annotator -t tsv

In the tsv file and also with oc gui strelka.vcf.sqlite, two entries are in the result. By the way, vcf-converter version was 2.0.2.

strelka.zip

rkimoakbioinformatics avatar Apr 16 '21 13:04 rkimoakbioinformatics

There should be 2 entries in the variant table.

And indeed there are. I'd failed to notice one of them, because there remains a crucial difference between the results of strelka and octopus.

For strelka the results are just as you say, with 1 variant at base__pos = 74053323 and the other at base__pos = 74053321

however when the same process is repeated with a VCF files based on the 2 octopus VCF lines I provided above, there are again 2 variants, however they both have the same base_pos = 74053321

Can you please repeat your oc run command from the prior comment, with this octopus VCF:

chr7  74053320        .       CTG     C,*     266.45  PASS    AC=1,1;AN=2;DP=37;MQ=59;NS=1    GT:GQ:DP:MQ:PS:PQ:FT    2|1:93:37:59:74053320:100:PASS
chr7  74053320        .       CTGTG   C,*     48.37   PASS    AC=1,1;AN=2;DP=37;MQ=59;NS=1    GT:GQ:DP:MQ:PS:PQ:FT    1|2:48:37:59:74053320:100:PASS

Do you see what I see, 2 variants with different base__pos? If so, equivalent inputs produce variants reported at different locations, with secondary effects on clinvar__disease_names and numerous other fields.

cariaso avatar Apr 18 '21 17:04 cariaso

I see the following from an oc run with your two octopus VCF format variants:

1   chr7    74053321    TG  -
2   chr7    74053321    TGTG    -

There are a few things here:

  • strelka and octopus actually give different decision on the first variant. strelka says "chr7 74053320 CTGTG CTG" meaning that positions 74053323 to 74053324 were deleted, while octopus says "chr7 74053320 CTG C" meaning that positions 74053321 to 74053322 were deleted.

  • Before mapping to transcripts, OpenCRAVAT does its best to standardize input position, reference allele, and alternate allele within the limit of what is given as input. Thus, at this stage, OpenCRAVAT does not know that whether the 3' rule can be further applied to chr7 74053321 TG - or not. To be fair, as far as I know, no other tool inquires that, either.

  • Further complication is that, if we follow HGVS's 3' rule, strelka's chr7 74053320 CTGTG CTG and octopus' chr7 74053320 CTG C are both wrong. There are 19 repeats of TG which span from the position 74053321 to the position 74053358. Thus, according to HGVS's 3' rule, the correct representation would be chr7 74053357 TG -, not 74053321 nor 74053323 (OpenCRAVAT's mapper follows these repeats and outputs correct cDNA and protein changes).

  • The issue here is that there is not yet any standard which all variant callers and annotation databases agree to use. If the HGVS 3' rule was considered as the standard, strelka and octopus would be both wrong. If ClinVar would adopt the 3' rule, both strelka and octopus variants wouldn't be found in ClinVar.

  • Maybe internally applying normalization on input variants and annotation databases is a solution and this can be the value that such a tool as OpenCRAVAT can contribute, although there can be the issue of the confusion coming from the difference between an annotation source's original data and its OpenCRAVAT-processed version. Also, it would be quite a large work.

What do you think?

rkimoakbioinformatics avatar Apr 20 '21 04:04 rkimoakbioinformatics

What do you think?

https://www.youtube.com/watch?v=dsx2vdn7gpY

cariaso avatar Apr 20 '21 05:04 cariaso

more seriously, or perhaps just more productively, I really don't know. I've been vaguely aware of these sorts of issues for a quite a while, but until you've got a concrete case it's hard to make progress. I think this is useful in that way. No tool, nor any database alone can solve this. And any industry wide fix is likely to be a very slow slog. It's quite disheartening, but I expect that I can now use this thread to raise related bug reports in the https://github.com/Illumina/strelka https://github.com/luntergroup/octopus repos, and perhaps raise it to clinvar.

I suppose it also speaks to the need for an independent tool to do the 3' shifting. Otherwise there will be a lot of independent, and slightly different solutions to the problem.

Even with that, there is a timing issue very akin to https://en.wikipedia.org/wiki/Dagen_H which would be impossible in practice

https://github.com/Illumina/strelka/issues/193 https://github.com/luntergroup/octopus/issues/172

cariaso avatar Apr 20 '21 05:04 cariaso

Ok. Meanwhile, OpenCRAVAT team will discuss the possibility of applying a unified process on all resources for consistency. If this strelka, octopus, and ClinVar case is solved within OpenCRAVAT, I'll post an update.

rkimoakbioinformatics avatar Apr 20 '21 05:04 rkimoakbioinformatics

I'll also mention that while HGVS seems to dictate 3', the VCF spec seems to necessitate a 5' 'upstream' base before indels (it seems left aligned might be a more correct term). For this reason, when I've adhoced partial solutions to this, In the past I've done 5' shifting. Perhaps what you're imagining is a post-VCF step and it doesn't matter. But if there was a generic tool for shifting VCFs, it would likely take a VCF as input and spit one out. If so, having both 5' and 3' requirements might end up quite unnecessarily large, and perhaps raise pathologic issues with the difference between the individual vs the chosen reference.

cariaso avatar Apr 20 '21 06:04 cariaso

https://genome.sph.umich.edu/wiki/Variant_Normalization argues for 5' normalization, and indicates that this tool can do the job https://github.com/atks/vt https://genome.sph.umich.edu/wiki/Vt#Normalization

https://annovar.openbioinformatics.org/en/latest/articles/VCF/

There is no community consensus yet on how to describe an indel in an unique way. Many users prefer to do a left-normalization. Left-normalization means that the start position of a variant should be shifted to the left utill it is no longer possible to do so, so that the smaller the number, the better. However, HGVS clearly specifies that left-normalization would be performed on cDNA (mRNA) coordinate, which means that right-normalization is required for half the genes in human genome. We will just have to accept the fact that people do not agree with each other at this point.

Currently, the following databases in ANNOVAR are left-normalized so that users can directly use them to compare to your left-normalized variant file: ... clinvar_20150330 ....

cariaso avatar Apr 20 '21 08:04 cariaso

more notes that seem relevant

https://samtools.github.io/bcftools/bcftools.html#norm

https://github.com/Janchorizo/ban-vcf

https://github.com/freebayes/freebayes suggests others that might be relevant

decompose the output with tools in vcflib, vt, vcf-tools, bcftools, GATK, or Picard.

cariaso avatar Apr 20 '21 08:04 cariaso

I'm seeing a broad consensus that left-aligning is the most common for vcfs (confirmed for octopus but see comments on why this isn't ideal at https://github.com/luntergroup/octopus/issues/172#issuecomment-823138437 ) and I suspect that is what both strelka and clinvar do.

cariaso avatar Apr 20 '21 09:04 cariaso

https://pypi.org/project/bioutils/ bioutils.normalize – allele normalization (left shuffle, right shuffle, expanded, vcf)

expanded is new to me, and used by https://github.com/ga4gh/vrs-python & https://github.com/ga4gh/vrs the idea being that any ambiguous positioning will be expanded to cover the entire range of ambiguous positions so that comparisons are easier.

ga4gh:VA.rUTGEnc_US6EQSAh64vrXMyDVxq_TWL2
NC_000007.14:g.74053331_74053332del

becomes

ga4gh:VA.cTEuSY_2j4R7UFLTj_Rh10JjVDayk_vr
NC_000007.14:g.74053321_74053359delinsTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT

and

ga4gh:VA.GaUmNc2yxPnonnHIqiWOG1iiPJJKSzB0
NC_000007.14:g.74053321_74053322del

becomes

ga4gh:VA.cTEuSY_2j4R7UFLTj_Rh10JjVDayk_vr
NC_000007.14:g.74053321_74053359delinsTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT

note the difference between NC_000007.14:g.74053331_74053332del NC_000007.14:g.74053321_74053322del ending up with the same expanded form

cariaso avatar Apr 23 '21 16:04 cariaso