gubbins icon indicating copy to clipboard operation
gubbins copied to clipboard

Output not just polymorphic sites:

Open EisenRa opened this issue 5 years ago • 10 comments

Hiya,

I'm wondering if it's possible to output the whole filtered alignment (including invariant sites). I want to run BEAST after Gubbins, which has issues with only using polymorphic sites.

Cheers, Raphael

EisenRa avatar Jun 28 '19 02:06 EisenRa

@EisenRa giving BEAST all the invariant sites will slow it down unnecessarily. i think the right approach is to include the constant site numbers in the BEAST XML file. This tool might be helpul: https://github.com/andersgs/beast2_constsites

tseemann avatar Jun 28 '19 03:06 tseemann

@tseemann thanks for sending that through, I'll check it out!

EisenRa avatar Jul 01 '19 13:07 EisenRa

@tseemann I've given beast2_consites a spin, and it works great!

However, in order to use it with Gubbins, I may need another approach. I noticed that the .gff file outputted by Gubbins contains start/end positions of the putatively recombinant sites. Does it seem reasonable to create a .bed file with these coordinates to feed into snippy-core?

EisenRa avatar Oct 31 '19 03:10 EisenRa

if you wanted to mask the recombinant regions that gubbins (or clonalframeml) computer then yes passing them to snippy-core --mask FILE.BED is the best solution. you should never mask recombination then use that as a reference, always do it at the end.

It is possible that paftools gff2bed might work. Maybe i can support GFF too in Snippy core.

You may also be interested in the new snp-sites -C option.

tseemann avatar Oct 31 '19 20:10 tseemann

Thanks for the advice Torsten. For other people who may be interested, I ended up going with this approach:

  • Run gubbins on genome of interest
  • Use awk to pull out coordinate columns from the gubbins .gff file, use this as a .bed file
  • Use snippy-core to create alignment
  • Use b2consites with .xml created from SNIPPY alignment, reference genome, and .bed file to mask putatively recombinant sites
  • Run BEAST2!

EisenRa avatar Nov 07 '19 00:11 EisenRa

You can't run gubbins on a single genome - it needs a full genome alignment of all your isolates. It has to be done after the snippy-core step. You can use a tree generared from the core.aln as the starting tree for gubbins.

  1. run snippy on each isolate (or use snippy-multi)
  2. run snippy-clean_full_aln on core.full.aln
  3. give that to gubbins
  4. extract your regions to BED 4b. consider masking phage and plasmids too
  5. give them to snippy-core
  6. build a tree

I will ask @danielleingle to confirm this

Do you need BEAST2? Consider http://www.atgc-montpellier.fr/LSD/ ? See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6006949

tseemann avatar Nov 07 '19 01:11 tseemann

Yes, whoops, you're correct!

4b. consider masking phage and plasmids too

I'm using PHASTER for this, which seems to be working well.

EisenRa avatar Nov 07 '19 02:11 EisenRa

Yep, it's good if you can get it to work when you have > 10 contigs.

tseemann avatar Nov 07 '19 03:11 tseemann

about masking phage using predictions from PHASTER, should I mask the intact phage only or mask all phage regions detected by PHASTER even they are not intact (e.g. regions only contain 20% of an intact phage, highlighted in red in PHASTER results), if I'm interested in the phylogeny.

fengyuchengdu avatar May 28 '20 01:05 fengyuchengdu

You can't run gubbins on a single genome - it needs a full genome alignment of all your isolates. It has to be done after the snippy-core step. You can use a tree generared from the core.aln as the starting tree for gubbins.

1. run snippy on each isolate (or use snippy-multi)

2. run snippy-clean_full_aln on core.full.aln

3. give that to gubbins

4. extract your regions to BED
   4b. consider masking phage and plasmids too

5. give them to snippy-core

6. build a tree

I will ask @danielleingle to confirm this

Do you need BEAST2? Consider http://www.atgc-montpellier.fr/LSD/ ? See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6006949

@tseemann did you ever hear back from @danielleingle? We (https://gitlab.com/bcorey and the MRSN) are polishing up our pipeline for variant analysis and would love some input on the best order of operations to produce an alignment for beast2_constsites -> BEAST2

harlesscw avatar May 12 '21 16:05 harlesscw