biostar94573 and multiple sequence alignments
Maybe the MAFFT output isn't in the proper format for your tool, but I am not getting correct-looking results. Can you look at what MAFFT outputs here: http://mafft.cbrc.jp/alignment/server/spool/_out151218093135893D4OpNAX8jGoYH7Tx2bF0C.html
It looks similar to your clustal sample output, but without the conservation notation at the end of each segment. I even tried their FASTA format, with hyphens for gaps, but it gave the same-looking output.
I don't think there is a problem: some of your sequences have a very large deletion.
>4:98103819
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
-------------------------------------agctttgaagagagcagtggttc
tcccaggacgcagctggagatctgagaacggg----cagactgcctcctcaagtgggtcc
ctgactcctgacccccgagcagcctaactgggaggca-cccccagcaggggcaca-----
--ctgacacctcacacggcagggtattccaacagacctgcagctgagggtcctgtctgtt
The program tries to merge the indels found at the same position: the more large indels there are, the larger the deletions you'll get in the VCF.
Hi Pierre,
Thanks for the quick reply. Maybe I don't understand how it decides what should be an entry in the VCF. Since all I provided was a multiple sequence alignment, it probably does not know what REF is? I was hoping it would call a variant at every position that was not 100% conserved. My output for that clustal file only has variants from position 2139 to 3836. Just looking quickly, shouldn't there be tons of deletions called from the beginning?
My command was: java -jar biostar94573.jar mafft.aln
Thanks!
The program scans from 5' to 3' and, for deletions, searches for '-'. Once a deletion has started, the program keeps extending the current variation for as long as there is a '-' at the next position. Your file has a deletion at almost every position: that is why you get only a few variants...
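To make the merging behaviour described above concrete, here is a toy sketch in Python. It is not the biostar94573 source code: the function name `gap_blocks` and the rule "a column counts as gapped if any row has a '-' there" are assumptions made only to illustrate why an alignment with gaps at nearly every column collapses into a handful of long variants.

```python
def gap_blocks(rows):
    """Return half-open (start, end) column ranges of consecutive gapped columns.

    Toy model (assumption): a column is 'gapped' if at least one aligned
    row has '-' there, and adjacent gapped columns are merged into one block.
    """
    blocks = []
    start = None
    width = len(rows[0])
    for col in range(width):
        has_gap = any(row[col] == "-" for row in rows)
        if has_gap and start is None:
            start = col                    # a new block opens here
        elif not has_gap and start is not None:
            blocks.append((start, col))    # the block closes before this column
            start = None
    if start is not None:
        blocks.append((start, width))      # block runs to the end of the alignment
    return blocks


# Two isolated gap regions stay separate...
print(gap_blocks(["--acgt", "a-cg-t"]))
# ...but gaps at every column merge into a single block.
print(gap_blocks(["-a-c", "a-c-"]))
```

In this toy model the second alignment yields one block covering all four columns, which mirrors the "only a few variants" symptom reported in the thread.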
An idea: every '-' is interpreted as a deletion. Try replacing the leading and trailing '-' of each sequence with spaces.
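A minimal sketch of that replacement in Python, assuming the alignment is in FASTA format with '-' for gaps and sequences wrapped over several lines. The helper names `mask_terminal_gaps` and `mask_fasta` are hypothetical, and whether the tool accepts spaces in its input is taken from the suggestion above rather than verified here.

```python
def mask_terminal_gaps(seq, fill=" "):
    """Replace the leading and trailing '-' of one aligned sequence with `fill`."""
    core = seq.strip("-")
    if not core:                           # sequence consists of gaps only
        return fill * len(seq)
    lead = len(seq) - len(seq.lstrip("-"))
    trail = len(seq) - len(seq.rstrip("-"))
    return fill * lead + core + fill * trail


def mask_fasta(text, width=60):
    """Mask terminal gaps in every record of an aligned FASTA string."""
    records = []                           # list of [header, list of sequence chunks]
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line, []])
        elif line.strip() and records:
            records[-1][1].append(line.strip())

    out = []
    for header, chunks in records:
        # Join the wrapped lines first so that only true terminal gaps are masked.
        seq = mask_terminal_gaps("".join(chunks))
        out.append(header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = ">s1\n--ac\ngt--\n>s2\nacgtacgt\n"
    print(mask_fasta(example, width=4))
```

Internal gaps are left untouched, so real deletions inside a sequence would still be seen as '-'; only the unaligned ends are blanked out.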