harvest
Using parsnp with a draft reference
Hi,
I'd like to use parsnp with the genbank option on, but my reference is a draft genome. Is this a problem for parsnp or can it handle the situation? I've had a look in the manual, tutorial and this forum, and couldn't find the answer. Sorry if this is here somewhere...
I have about 43 scaffolds in my reference by the way.
Many thanks for your help!
Mathieu
As long as the identifiers in the Genbank VERSION lines appear in the fasta tags of their corresponding contigs, it should be handled fine I believe.
Should the fasta headers that contain the GenBank VERSION identifiers be formatted in a specific way? Is there a specific position where the identifiers should be located in the header?
It searches, so the identifier can be anywhere in the tag. Leave out the word "VERSION" itself and use just what follows it (normally an accession).
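A quick way to check this before running parsnp is to confirm that every VERSION identifier in the GenBank file occurs somewhere in a FASTA header. The following is only a minimal sketch of that check, not parsnp's own code: the function names and the sample accessions are made up for illustration, and the plain substring match is an assumption based on the description above.

```python
def genbank_versions(gbk_text):
    """Collect the identifier that follows each VERSION keyword."""
    versions = []
    for line in gbk_text.splitlines():
        if line.startswith("VERSION"):
            fields = line.split()
            if len(fields) > 1:
                versions.append(fields[1])
    return versions


def fasta_headers(fasta_text):
    """Return every FASTA tag line without its leading '>'."""
    return [line[1:].strip() for line in fasta_text.splitlines()
            if line.startswith(">")]


def missing_versions(gbk_text, fasta_text):
    """List VERSION identifiers that occur in no FASTA header.

    Assumption: a plain substring search anywhere in the tag.
    """
    headers = fasta_headers(fasta_text)
    return [v for v in genbank_versions(gbk_text)
            if not any(v in h for h in headers)]


if __name__ == "__main__":
    # Hypothetical two-scaffold reference; the accessions are invented.
    gbk = (
        "LOCUS       scaffold_1\n"
        "VERSION     ABCD01000001.1\n"
        "//\n"
        "LOCUS       scaffold_2\n"
        "VERSION     ABCD01000002.1\n"
        "//\n"
    )
    fasta = (
        ">scaffold_1 ABCD01000001.1 draft assembly\n"
        "ACGT\n"
        ">scaffold_2\n"
        "ACGT\n"
    )
    for version in missing_versions(gbk, fasta):
        print("No FASTA header contains", version)
```

To run it on real files, read the `.gbk` and the matching FASTA into strings and pass them to `missing_versions`. An empty list means every scaffold's identifier was found in some header.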
Hi,
I've tried to run parsnp on my data. I want to align my 225 genomes to my reference assembly, but it seems that parsnp uses only one genome from my Genomes directory. I've pasted the command line below, as well as the output of parsnp. Could you help me understand what the issue is here?
Many thanks! Best, Mathieu
~/src/Harvest-Linux64-v1.1.2/parsnp -p 8 -c -g ref_draft.gbk -d ScaffoldsDir/ -o ResultsParsnp/
|--Parsnp v1.2--| For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
SETTINGS:
|-refgenome: ref_draft.gbk.fna
|-aligner: libMUSCLE
|-seqdir: ScaffoldsDir/
|-outdir: ResultsParsnp/
|-OS: Linux
|-threads: 8
<<Parsnp started>>
-->Reading Genome (asm, fasta) files from ScaffoldsDir/..
  |->[OK]
-->Reading Genbank file(s) for reference (.gbk) ref_draft.gbk..
  |->[OK]
-->Running Parsnp multi-MUM search and libMUSCLE aligner..
  |->[WARNING]: aligned regions cover less than 10% of reference genome! please verify recruited genomes are all strain of interest
  |->[OK]
-->Running PhiPack on LCBs to detect recombination..
  |->[SKIP]
-->Reconstructing core genome phylogeny..
  |->[OK]
-->Creating Gingr input file..
  |->[OK]
-->Calculating wall clock time..
  |->Aligned 2 genomes in 49.20 seconds
<<Parsnp finished! All output available in ResultsParsnp/>>
Validating output directory contents...
1)parsnp.tree: newick format tree [OK]
2)parsnp.ggr: harvest input file for gingr (GUI) [OK]
3)parsnp.xmfa: XMFA formatted multi-alignment [OK]