gramtools
Build PRG from MSA and more sparsely from VCF
Targets here:
- We want to be able to build a PRG from an MSA right now.
- Long alleles might need to be collapsed down. For example, a record of `TCAGA` (ref) and `TTACA` (alt) will 'overlap' any other variation within the same region. This either creates a combinatorial explosion (`vcf_clusterer` module) or gets the overlapping records ignored outright (perl script). One solution is building a graph from the VCF and parsing that into a PRG (this will collapse the long record into its SNPs); another (non-exclusive) is to allow nesting in gramtools, such that overlapping records are no longer flattened into one.
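To make the overlap problem concrete, here is a minimal Python sketch (not gramtools code; the function names `decompose_equal_length` and `overlaps` are hypothetical and the positions are made up). It shows that the `TCAGA`/`TTACA` record reduces to two SNPs, and how one might test whether two records share reference positions. It only handles equal-length alleles; indels would need a real alignment.

```python
def decompose_equal_length(pos, ref, alt):
    """Split an equal-length REF/ALT pair into per-base SNPs.

    `pos` is the 1-based position of the first REF base, as in VCF.
    Hypothetical helper for illustration only.
    """
    if len(ref) != len(alt):
        raise ValueError("only equal-length alleles are handled in this sketch")
    return [(pos + i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


def overlaps(pos_a, ref_a, pos_b, ref_b):
    """True if the REF spans of two records share at least one position."""
    return pos_a < pos_b + len(ref_b) and pos_b < pos_a + len(ref_a)


# The long record from the example, placed at an arbitrary position 100.
snps = decompose_equal_length(100, "TCAGA", "TTACA")
print(snps)

# A SNP at position 102 falls inside the long record's REF span.
print(overlaps(100, "TCAGA", 102, "A"))
```

Once the long record is expressed as the two SNPs, a third SNP at position 102 no longer overlaps either of them, which is the collapsing effect described above.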
Now that gramtools supports prgs made with make_prg, we need a streamlined way to build a whole-genome graph from:
- a ref genome
- a set of MSAs (as a fofn for eg)
- a BED file (or equivalent) describing the MSA coordinates
From these inputs, gramtools (or make_prg?) runs make_prg on each MSA and combines the resulting PRGs with the rest of the genome.
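The combining step could be planned roughly as follows. This is only a sketch under assumptions: it covers a single chromosome, treats the BED coordinates as 0-based half-open, and assumes the MSA regions do not overlap. The function name `plan_segments` and the tuple layout are hypothetical, and no make_prg invocation is shown.

```python
def plan_segments(chrom_length, regions):
    """Interleave plain reference stretches with MSA-backed regions.

    `regions` is a list of (start, end, msa_path) tuples in BED-style
    0-based half-open coordinates on one chromosome (an assumption of
    this sketch). Returns an ordered list of ("ref", start, end) and
    ("msa", start, end, msa_path) tuples covering the whole chromosome.
    """
    plan = []
    cursor = 0
    for start, end, msa_path in sorted(regions):
        if start >= end or end > chrom_length:
            raise ValueError(f"invalid region {start}-{end}")
        if start < cursor:
            raise ValueError(f"region {start}-{end} overlaps the previous one")
        if start > cursor:
            # Invariant sequence between MSAs is taken from the reference.
            plan.append(("ref", cursor, start))
        # This stretch would be replaced by the PRG built from the MSA.
        plan.append(("msa", start, end, msa_path))
        cursor = end
    if cursor < chrom_length:
        plan.append(("ref", cursor, chrom_length))
    return plan


for segment in plan_segments(100, [(60, 80, "b.msa"), (10, 20, "a.msa")]):
    print(segment)
```

Each `"msa"` entry marks where a per-MSA PRG would be spliced in, and each `"ref"` entry marks reference sequence to copy through unchanged.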
Hello folks,
I'm in the situation of needing exactly what @bricoletc describes above. Is there an approach that exists today which implements this functionality? Is this a serious plan with someone working on this functionality? If not, can I be of any assistance in making this a reality?
Best, Kevin
Hi @kdm9, this is a timely question! The feature is not currently implemented in a simple way at all; I've done it via a snakemake workflow. I will aim to implement this in gramtools. It should not be too complicated and is essential for tool usability. I estimate 2 weeks.
However, could you give me a sense of what you're trying to do? This would help make sure we're on the same page and give me a sense of your timeline. Feel free to drop me an email at [email protected] (also let me know if #163 works for you).