Daniel Cameron
> I'd prefer three new numbers (all of which are distinct from the ploidy): > > LOCAL-A. One element for each allele present in LA minus one (to exclude the...
> Adopting Hail's "LA" name seems like the ideal approach.

Yes, given that it's now 0-based and we need to explicitly include REF, `LA` is a more appropriate name than...
> Checkpointing or an END-aware index is key for analysis but not for interchange.

The reason for defining INFO END as I did was to avoid breaking VCF indexing. VCF...
> Maybe I'm misreading but I think any of the three options for encoding local allele indexed fields facilitates lossless merging.

A VCF with custom R/A/G FORMAT fields cannot be...
> Would we change the GVCF section as well? So we'd have INFO LEN, FORMAT LEN, and (no longer preferred? deprecated?) INFO END?

FORMAT LEN and retain INFO END as...
REF: TAAAAAAAT
Variant 1: A>T @ position 3
Variant 2: A>AA @ position 5

How should Variant 2 be normalised? The bounds would be pos1 but Variant 1 is in...
If the variants are in cis, the normalisation algorithm loses information, because the haplotype can then be reconstructed as either TATAAAAAAT (correct) or TAATAAAAAT (wrong).
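To make the cis case concrete, here is a minimal Python sketch that rebuilds the haplotype from the example variants. The helper `apply_variants` and the left-shifted form `T>TA @ position 1` are my own illustration (the latter is what I'd expect from left-aligning Variant 2 against the reference alone); neither is taken from the spec text or from any library.

```python
def apply_variants(ref, variants):
    """Apply non-overlapping variants to ref and return the haplotype.

    Each variant is (pos, ref_allele, alt_allele) with a 1-based pos
    given in reference coordinates. Hypothetical helper for illustration.
    """
    out = []
    cursor = 0  # 0-based index of the next unconsumed reference base
    for pos, ref_allele, alt_allele in sorted(variants):
        start = pos - 1
        # Sanity check: the stated REF allele must match the reference.
        assert ref[start:start + len(ref_allele)] == ref_allele
        out.append(ref[cursor:start])
        out.append(alt_allele)
        cursor = start + len(ref_allele)
    out.append(ref[cursor:])
    return "".join(out)


REF = "TAAAAAAAT"
snv = (3, "A", "T")            # Variant 1
ins_original = (5, "A", "AA")  # Variant 2 as originally written
ins_shifted = (1, "T", "TA")   # assumed left-aligned form of Variant 2

# On its own, the insertion gives the same haplotype in either form.
alone_original = apply_variants(REF, [ins_original])
alone_shifted = apply_variants(REF, [ins_shifted])

# In cis with the SNV, the two forms give different haplotypes.
cis_original = apply_variants(REF, [snv, ins_original])
cis_shifted = apply_variants(REF, [snv, ins_shifted])

print(alone_original, alone_shifted)
print(cis_original, cis_shifted)
```

The two representations of the insertion are interchangeable in isolation, but once the SNV sits between them on the same haplotype the shifted form places the inserted base on the wrong side of the T.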
If the variants are unphased, the two haplotypes are: `TAAAAAAAT`/`TATAAAAAAT` (ref/both) or `TATAAAAAT`/`TAAAAAAAAT` (first/second)

> create the full ref and full alt

In this context, what do you mean by...
My point is that the normalisation procedure doesn't state whether the variant should be extended with the ref or the alt. If it's the ref then normalisation loses information about...
@Daniel-Liu-c0deb0t are there any specific blockers or code review issues with this PR, or is it purely a matter of time availability at this point? Thanks.