Zamin Iqbal
Bah. Those files had some records that count as non-overlapping only if you think that, e.g., `chr1 10 A G` and `chr1 10 AG A` do not overlap. I think those confused it. when...
OK, well, this is a bug, I think. Current status, from my point of view: if there are no overlapping variants and only SNPs, then vcfcombine works very well,...
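To make the "no overlapping variants and only SNPs" condition concrete, here is a minimal Python sketch of how one might check a set of records before handing them to vcfcombine. It is not part of vcfcombine or gramtools: the `Record` tuple layout and the function names `is_snp` and `find_overlaps` are hypothetical, and it assumes a record's footprint on the reference is `POS` to `POS + len(REF) - 1`.

```python
from typing import List, Tuple

# Hypothetical minimal record: (CHROM, POS, REF, ALT), POS is 1-based as in VCF.
Record = Tuple[str, int, str, str]


def is_snp(rec: Record) -> bool:
    """True if REF and ALT are both a single base."""
    _, _, ref, alt = rec
    return len(ref) == 1 and len(alt) == 1


def find_overlaps(records: List[Record]) -> List[Tuple[Record, Record]]:
    """Return pairs of records whose reference spans overlap.

    Each record is compared against the earlier record on the same
    chromosome that reaches furthest along the reference.
    """
    overlaps = []
    cur_chrom = None
    far_rec = None   # earlier record with the furthest end on cur_chrom
    far_end = 0      # 1-based inclusive end of far_rec's REF allele
    for rec in sorted(records):
        chrom, pos, ref, _ = rec
        end = pos + len(ref) - 1
        if chrom != cur_chrom:
            # First record on a new chromosome: nothing to overlap with yet.
            cur_chrom, far_rec, far_end = chrom, rec, end
            continue
        if pos <= far_end:
            overlaps.append((far_rec, rec))
        if end > far_end:
            far_rec, far_end = rec, end
    return overlaps


# The pair from the comment above: a SNP and a deletion at the same position.
example = [("chr1", 10, "A", "G"), ("chr1", 10, "AG", "A")]
print(find_overlaps(example))
print([is_snp(r) for r in example])
```

Under these assumptions the two `chr1 10` records are reported as overlapping, and only the first is a SNP, so this input falls outside the case where vcfcombine is said to work well.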
This is still an open bug afaik
When you say 'simply' - is it a long compute job?
VCF attached [perl_generated_vcf.txt](https://github.com/iqbal-lab-org/gramtools/files/3245304/perl_generated_vcf.txt) some of these records are horrible. eg see 523027, which starts like this NC_000962.3 523027 . GCAACACC ACAAAAA,ACAAAAC,ACAAAACA,ACAAAACC,ACAAAACG,ACAAAACT,ACAAAAG,ACAAAAT,ACAAACCA,ACAAACCC,ACAAACCG,ACAAACCT,ACAACAA,ACAACAC,ACAACACA,ACAACACC,ACAACACG,ACAACACT,ACAACAG,ACAACAT,ACAACCCA,ACAACCCC,ACAACCCG,ACAACCCT,ACAAGAA,ACAAGAC,ACAAGACA,ACAAGACC,ACAAGACG,ACAAGACT,ACAAGAG,ACAAGAT,ACAAGCCA,ACAAGCCC,ACAAGCCG,ACAAGCCT,ACAATAA,ACAATAC,ACAATACA,ACAATACC,ACAATACG,ACAATACT,ACAATAG,ACAATAT,ACAATCCA,ACAATCCC,ACAATCCG,ACAATCCT,ACAGAAA,ACAGAAAACA,ACAGAAAACC,ACAGAAAACG,ACAGAAAACT,ACAGAAACCA,ACAGAAACCC,ACAGAAACCG,ACAGAAACCT,ACAGAAC,ACAGAACA,ACAGAACACA,ACAGAACACC,ACAGAACACG,ACAGAACACT,ACAGAACC,ACAGAACCCA,ACAGAACCCC,ACAGAACCCG,ACAGAACCCT,ACAGAACG,ACAGAACT,ACAGAAG,ACAGAAGACA,ACAGAAGACC,ACAGAAGACG,ACAGAAGACT,ACAGAAGCCA,ACAGAAGCCC,ACAGAAGCCG,ACAGAAGCCT,ACAGAAT,ACAGAATACA,ACAGAATACC,ACAGAATACG,ACAGAATACT,ACAGAATCCA,ACAGAATCCC,ACAGAATCCG,ACAGAATCCT,ACAGACCA,ACAGACCC,ACAGACCG,ACAGACCT,ACAGAGAACA,ACAGAGAACC,ACAGAGAACG,ACAGAGAACT,ACAGAGACCA,ACAGAGACCC,ACAGAGACCG,ACAGAGACCT,ACAGAGCACA,ACAGAGCACC,ACAGAGCACG,ACAGAGCACT,ACAGAGCCCA,ACAGAGCCCC,ACAGAGCCCG,ACAGAGCCCT,ACAGAGGACA,ACAGAGGACC,ACAGAGGACG,ACAGAGGACT,ACAGAGGCCA,ACAGAGGCCC,ACAGAGGCCG,ACAGAGGCCT,ACAGAGTACA,ACAGAGTACC,ACAGAGTACG,ACAGAGTACT,ACAGAGTCCA,ACAGAGTCCC,ACAGAGTCCG,ACAGAGTCCT,ACAGCAA,ACAGCAC,ACAGCACA,ACAGCACC,ACAGCACG,ACAGCACT,ACAGCAG,ACAGCAT,ACAGCCCA,ACAGCCCC,ACAGCCCG,ACAGCCCT,ACAGGAA,ACAGGAC,ACAGGACA,ACAGGACC,ACAGGACG,ACAGGACT,ACAGGAG,ACAGGAT,ACAGGCCA,ACAGGCCC,ACAGGCCG,ACAGGCCT,ACAGTAA,ACAGTAC,ACAGTACA,ACAGTACC,ACAGTACG,ACAGTACT,ACAGTAG,ACAGTAT,ACAGTCCA,ACAGTCCC,ACAGTCCG,ACAGTCCT,AGAAAAA,AGAAAAC,AGAAAACA,AGAAAACC,AGAAAACG,AGAAAACT,AGAAAAG,AGAAAAT,AGAAACCA,AGAAACCC,AGAAACCG,AGAAACCT,AGAACAA,AGAACAC,AGAACACA,AGAACACC,AGAACACG,AGAACACT,AGAACAG,AGAACAT,AGAACCCA,AGAACCCC,AGAACCCG,AGAACCCT,AGAAGAA,AGAAGAC,AGAAGACA,AGAAGACC
,AGAAGACG,AGAAGACT,AGAAGAG,AGAAGAT,AGAAGCCA,AGAAGCCC,AGAAGCCG,AGAAGCCT,AGAATAA,AGAATAC,AGAATACA,AGAATACC,AGAATACG,AGAATACT,AGAATAG,AGAATAT,AGAATCCA,AGAATCCC,AGAATCCG,AGAATCCT,AGAGAAA,AGAGAAAACA,AGAGAAAACC,AGAGAAAACG,AGAGAAAACT,AGAGAAACCA,AGAGAAACCC,AGAGAAACCG,AGAGAAACCT,AGAGAAC,AGAGAACA,AGAGAACACA,AGAGAACACC,AGAGAACACG,AGAGAACACT,AGAGAACC,AGAGAACCCA,AGAGAACCCC,AGAGAACCCG,AGAGAACCCT,AGAGAACG,AGAGAACT,AGAGAAG,AGAGAAGACA,AGAGAAGACC,AGAGAAGACG,AGAGAAGACT,AGAGAAGCCA,AGAGAAGCCC,AGAGAAGCCG,AGAGAAGCCT,AGAGAAT,AGAGAATACA,AGAGAATACC,AGAGAATACG,AGAGAATACT,AGAGAATCCA,AGAGAATCCC,AGAGAATCCG,AGAGAATCCT,AGAGACCA,AGAGACCC,AGAGACCG,AGAGACCT,AGAGAGAACA,AGAGAGAACC,AGAGAGAACG,AGAGAGAACT,AGAGAGACCA,AGAGAGACCC,AGAGAGACCG,AGAGAGACCT,AGAGAGCACA,AGAGAGCACC,AGAGAGCACG,AGAGAGCACT,AGAGAGCCCA,AGAGAGCCCC,AGAGAGCCCG,AGAGAGCCCT,AGAGAGGACA,AGAGAGGACC,AGAGAGGACG,AGAGAGGACT,AGAGAGGCCA,AGAGAGGCCC,A
**Proposal 1**

1. Divide the PRG into chunks of length 10000bp (or whatever). Say this is N chunks.
2. For each chunk calculate the list of kmers contained within (not...
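Steps 1 and 2 of Proposal 1 can be sketched in Python as follows. This is only an illustration under simplifying assumptions: it treats the PRG as a plain linear string rather than a graph, the kmer size `k=31` is a placeholder not given in the proposal, and the function names are hypothetical. Kmers that span a chunk boundary are not counted, and the later steps of the proposal are not shown.

```python
from typing import List, Set


def chunk_sequence(seq: str, chunk_len: int = 10000) -> List[str]:
    """Step 1: split the sequence into consecutive chunks of chunk_len bases.

    The final chunk may be shorter than chunk_len.
    """
    return [seq[i:i + chunk_len] for i in range(0, len(seq), chunk_len)]


def kmers(chunk: str, k: int = 31) -> Set[str]:
    """Step 2: the set of distinct kmers contained within one chunk.

    A chunk shorter than k yields an empty set.
    """
    return {chunk[i:i + k] for i in range(len(chunk) - k + 1)}


def chunk_kmer_sets(seq: str, chunk_len: int = 10000, k: int = 31) -> List[Set[str]]:
    """Kmer set for each of the N chunks, in chunk order."""
    return [kmers(c, k) for c in chunk_sequence(seq, chunk_len)]


# Toy usage with tiny parameters so the result is easy to inspect.
toy = "ACGTACGTAC"
print(chunk_sequence(toy, chunk_len=4))
print(chunk_kmer_sets(toy, chunk_len=4, k=3))
```

Storing each chunk's kmers as a set makes any later chunk-versus-chunk comparison a set operation, though how chunks are actually compared is left to the proposal text.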
**Proposal 2**

As proposal 1, except when comparing chunk i and chunk j (assume diploid, but easy to modify what I say for other ploidies), sample 100 (or some...
Impact on *Plasmodium falciparum* (key use case for us): will immediately remove the crazy repeat regions where we should not waste time trying to quasimap or variant call. Less wasted...
BTW, above I said something like `Impact on Plasmodium falciparum (key use case for us) will immediately remove the crazy repeat regions`. I have since got a workaround for the...