gubbins
gubbins copied to clipboard
Output not just polymorphic sites:
Hiya,
I'm wondering if it's possible to output the whole filtered alignment (including invariant sites). I want to run BEAST after Gubbins, which has issues with only using polymorphic sites.
Cheers, Raphael
@EisenRa giving BEAST all the invariant sites will slow it down unnecessarily. i think the right approach is to include the constant site numbers in the BEAST XML file. This tool might be helpul: https://github.com/andersgs/beast2_constsites
@tseemann thanks for sending that through, I'll check it out!
@tseemann I've given beast2_consites a spin, and it works great!
However, in order to use it with Gubbins, I may need another approach. I noticed that the .gff file outputted by Gubbins contains start/end positions of the putatively recombinant sites. Does it seem reasonable to create a .bed file with these coordinates to feed into snippy-core?
if you wanted to mask the recombinant regions that gubbins (or clonalframeml) computer then yes passing them to snippy-core --mask FILE.BED
is the best solution. you should never mask recombination then use that as a reference, always do it at the end.
It is possible that paftools gff2bed
might work. Maybe i can support GFF too in Snippy core.
You may also be interested in the new snp-sites -C
option.
Thanks for the advice Torsten. For other people who may be interested, I ended up going with this approach:
- Run gubbins on genome of interest
- Use awk to pull out coordinate columns from the gubbins .gff file, use this as a .bed file
- Use snippy-core to create alignment
- Use b2consites with .xml created from SNIPPY alignment, reference genome, and .bed file to mask putatively recombinant sites
- Run BEAST2!
You can't run gubbins on a single genome - it needs a full genome alignment of all your isolates. It has to be done after the snippy-core step. You can use a tree generared from the core.aln as the starting tree for gubbins.
- run snippy on each isolate (or use snippy-multi)
- run snippy-clean_full_aln on core.full.aln
- give that to gubbins
- extract your regions to BED 4b. consider masking phage and plasmids too
- give them to snippy-core
- build a tree
I will ask @danielleingle to confirm this
Do you need BEAST2? Consider http://www.atgc-montpellier.fr/LSD/ ? See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6006949
Yes, whoops, you're correct!
4b. consider masking phage and plasmids too
I'm using PHASTER for this, which seems to be working well.
Yep, it's good if you can get it to work when you have > 10 contigs.
about masking phage using predictions from PHASTER, should I mask the intact phage only or mask all phage regions detected by PHASTER even they are not intact (e.g. regions only contain 20% of an intact phage, highlighted in red in PHASTER results), if I'm interested in the phylogeny.
You can't run gubbins on a single genome - it needs a full genome alignment of all your isolates. It has to be done after the snippy-core step. You can use a tree generared from the core.aln as the starting tree for gubbins.
1. run snippy on each isolate (or use snippy-multi) 2. run snippy-clean_full_aln on core.full.aln 3. give that to gubbins 4. extract your regions to BED 4b. consider masking phage and plasmids too 5. give them to snippy-core 6. build a tree
I will ask @danielleingle to confirm this
Do you need BEAST2? Consider http://www.atgc-montpellier.fr/LSD/ ? See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6006949
@tseemann did you ever hear back from @danielleingle? We (https://gitlab.com/bcorey and the MRSN) are polishing up our pipeline for variant analysis and would love some input on the best order of operations to produce an alignment for beast2_constsites -> BEAST2