A fast multiple sequence alignment program.

Kalign

Kalign is a fast multiple sequence alignment program for biological sequences.

Installation

Release Tarball

Download a tarball from the releases page, then run:

tar -zxvf kalign-<version>.tar.gz
cd kalign-<version>
./autogen.sh
./configure
make
make check
make install

Homebrew

brew install brewsci/bio/kalign

Developer version

git clone https://github.com/TimoLassmann/kalign.git
cd kalign
./autogen.sh
./configure
make
make check
make install

On macOS, install Homebrew first, then:

brew install libtool
brew install automake
git clone https://github.com/TimoLassmann/kalign.git
cd kalign
./autogen.sh
./configure
make
make check
make install

Usage

Usage: kalign  -i <seq file> -o <out aln>

Options:

   --format           : Output format. [Fasta]
   --reformat         : Reformat existing alignment. [NA]
   --version          : Print version and exit

Kalign expects the input to be a set of unaligned sequences in fasta format or aligned sequences in aligned fasta, MSF or clustal format. Kalign automatically detects whether the input sequences are protein, RNA or DNA.
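
As a minimal sketch of what such an input might look like, the snippet below writes three short protein sequences to a FASTA file, counts the records, and runs the alignment only if kalign is found on the PATH. The file name example.fa and the sequences themselves are made up for illustration; they are not test data shipped with Kalign.

```shell
# Write a tiny unaligned FASTA file; names and sequences are illustrative only.
workdir=$(mktemp -d)
cat > "$workdir/example.fa" <<'EOF'
>seq1
MKTAYIAKQRQISFVKSHFSRQ
>seq2
MKTAYIAKQRISFVKSHFSR
>seq3
MKAYIAKQRQISFVKHFSRQ
EOF

# Each FASTA record starts with a '>' header line, so counting headers counts sequences.
n_seqs=$(grep -c '^>' "$workdir/example.fa")
echo "sequences: $n_seqs"

# Run the alignment only when kalign is installed.
if command -v kalign >/dev/null 2>&1; then
    kalign -i "$workdir/example.fa" -f fasta -o "$workdir/example.afa"
fi
```

Counting the header lines before aligning is a cheap sanity check that the file holds the number of sequences you expect.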

Since version 3.2.0, Kalign supports reading sequences from stdin and aligning sequences from multiple input files.

Examples

Passing sequences via stdin:

cat input.fa | kalign -f fasta > out.afa

Combining multiple input files:

kalign seqsA.fa seqsB.fa seqsC.fa -f fasta > combined.afa

Align sequences and output the alignment in MSF format:

kalign -i BB11001.tfa -f msf -o out.msf

Align sequences and output the alignment in clustal format:

kalign -i BB11001.tfa -f clu -o out.clu

Re-align sequences in an existing alignment:

kalign -i BB11001.msf -o out.afa

Reformat existing alignment:

kalign -i BB11001.msf -r afa -o out.afa
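
The single-file examples above can be extended to a batch of files. The following is a sketch under stated assumptions: the function name align_all, the KALIGN variable, and the directory layout are hypothetical and not part of Kalign, and only the -i, -f clu and -o options shown above are used.

```shell
# Hypothetical helper: align every *.fa file in a directory, one Clustal file per input.
# KALIGN can be overridden (for example KALIGN=echo) to preview the commands without running them.
KALIGN=${KALIGN:-kalign}

align_all() {
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir"
    for fa in "$in_dir"/*.fa; do
        # Skip the unexpanded glob pattern when the directory has no .fa files.
        [ -e "$fa" ] || continue
        base=$(basename "$fa" .fa)
        $KALIGN -i "$fa" -f clu -o "$out_dir/$base.clu"
    done
}
```

For example, align_all ./seqs ./alignments would write ./alignments/NAME.clu for each ./seqs/NAME.fa. Setting KALIGN=echo beforehand prints the commands instead of running them.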

Benchmark results

Here are some benchmark results. The code to reproduce these figures can be found here.

Balibase

[Figure: Balibase scores]

Bralibase

[Figure: Bralibase scores]

Homfam

[Figure: Homfam scores]

Quantest2

[Figure: Quantest2 scores]

Please cite:

  1. Lassmann, Timo. Kalign 3: multiple sequence alignment of large data sets. Bioinformatics (2019).

Other papers:

  1. Lassmann, Timo, Oliver Frings, and Erik LL Sonnhammer. Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. Nucleic Acids Research 37.3 (2008): 858-865.
  2. Lassmann, Timo, and Erik LL Sonnhammer. Kalign: an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics 6.1 (2005): 298.