Using parsnp with a draft reference

Open mgroussi opened this issue 7 years ago • 4 comments

Hi,

I'd like to use parsnp with the genbank option on, but my reference is a draft genome. Is this a problem for parsnp or can it handle the situation? I've had a look in the manual, tutorial and this forum, and couldn't find the answer. Sorry if this is here somewhere...

I have about 43 scaffolds in my reference by the way.

Many thanks for your help!

Mathieu

mgroussi avatar Mar 08 '17 23:03 mgroussi

As long as the identifiers in the GenBank VERSION lines appear in the fasta tags of their corresponding contigs, it should be handled fine, I believe.

ondovb avatar Mar 09 '17 19:03 ondovb

Should the fasta headers that contain the GenBank VERSION identifiers be formatted in a specific way? Is there a specific position where they should be located in the header?

mgroussi avatar Mar 09 '17 21:03 mgroussi

It searches, so the identifier can be anywhere in the tag. Leave out the word "VERSION" itself and use just what follows it (normally an accession).

ondovb avatar Mar 10 '17 02:03 ondovb
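The matching rule described above can be checked before running parsnp. The following is a minimal sketch, not parsnp's own code: it assumes the reference GenBank file and its FASTA file are read as plain text, and it uses a simple substring test, which is only an assumption about how the search behaves. The function names are hypothetical.

```python
def genbank_versions(gbk_text):
    """Return the identifier that follows each VERSION keyword."""
    versions = []
    for line in gbk_text.splitlines():
        if line.startswith("VERSION"):
            fields = line.split()
            if len(fields) > 1:
                # fields[0] is "VERSION"; fields[1] is normally an accession
                versions.append(fields[1])
    return versions


def fasta_headers(fasta_text):
    """Return every FASTA tag without the leading '>'."""
    return [
        line[1:].strip()
        for line in fasta_text.splitlines()
        if line.startswith(">")
    ]


def missing_versions(gbk_text, fasta_text):
    """List VERSION identifiers that appear in no FASTA tag.

    Assumption: a plain substring test stands in for the search
    that parsnp performs; the real matching may differ.
    """
    headers = fasta_headers(fasta_text)
    return [
        version
        for version in genbank_versions(gbk_text)
        if not any(version in header for header in headers)
    ]
```

Calling `missing_versions` on the contents of the two files returns an empty list when every scaffold's identifier is present somewhere in a tag. Any identifier it does return points at a scaffold whose FASTA tag would need editing.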

Hi,

I've tried to run parsnp on my data. I want to align my 225 genomes to my reference assembly, but it seems that parsnp uses only one genome from my Genomes directory. I've pasted the command line and the parsnp output below. Could you help me understand what the issue is here?

Many thanks! Best, Mathieu

~/src/Harvest-Linux64-v1.1.2/parsnp -p 8 -c -g ref_draft.gbk -d ScaffoldsDir/ -o ResultsParsnp/

|--Parsnp v1.2--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest


SETTINGS:
|-refgenome: ref_draft.gbk.fna
|-aligner: libMUSCLE
|-seqdir: ScaffoldsDir/
|-outdir: ResultsParsnp/
|-OS: Linux
|-threads: 8


<<Parsnp started>>

-->Reading Genome (asm, fasta) files from ScaffoldsDir/..
  |->[OK]
-->Reading Genbank file(s) for reference (.gbk) ref_draft.gbk..
  |->[OK]
-->Running Parsnp multi-MUM search and libMUSCLE aligner..
  |->[WARNING]: aligned regions cover less than 10% of reference genome! please verify recruited genomes are all strain of interest
  |->[OK]
-->Running PhiPack on LCBs to detect recombination..
  |->[SKIP]
-->Reconstructing core genome phylogeny..
  |->[OK]
-->Creating Gingr input file..
  |->[OK]
-->Calculating wall clock time..
  |->Aligned 2 genomes in 49.20 seconds

<<Parsnp finished! All output available in ResultsParsnp/>>

Validating output directory contents...
  1)parsnp.tree: newick format tree [OK]
  2)parsnp.xmfa: XMFA formatted multi-alignment [OK]
  3)parsnp.ggr: harvest input file for gingr (GUI) [OK]

mgroussi avatar Mar 10 '17 05:03 mgroussi
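The mismatch reported above (225 input genomes, "Aligned 2 genomes" in the log) can be spotted automatically. This is a rough sketch under stated assumptions: the log text is whatever parsnp printed to the terminal, saved by the user, and the "Aligned N genomes" wording is taken from the output pasted in this thread. The function names are hypothetical, and the thread does not say whether the reported count includes the reference.

```python
import re
from pathlib import Path


def aligned_genome_count(log_text):
    """Extract N from a line like 'Aligned N genomes in ... seconds'.

    Returns None when no such line is present.
    """
    match = re.search(r"Aligned (\d+) genomes", log_text)
    return int(match.group(1)) if match else None


def input_file_count(seq_dir):
    """Count the regular files directly inside the genome directory."""
    return sum(1 for path in Path(seq_dir).iterdir() if path.is_file())


def report(log_text, seq_dir):
    """Return a one-line summary comparing input files to aligned genomes."""
    aligned = aligned_genome_count(log_text)
    supplied = input_file_count(seq_dir)
    return f"{supplied} files in {seq_dir}, {aligned} genomes reported as aligned"
```

For the run in this thread, `report(saved_log, "ScaffoldsDir/")` would make the gap between the number of files supplied and the number aligned visible at a glance. It does not explain why genomes were dropped.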