samstat
SAMStat displays various properties of next-generation sequencing reads stored in SAM/BAM format.
[[https://github.com/TimoLassmann/samstat/actions/workflows/cmake.yml][https://github.com/TimoLassmann/samstat/actions/workflows/cmake.yml/badge.svg]]
- SAMStat
SAMStat is an efficient C program that quickly displays statistics for large sequence files from next-generation sequencing projects. When applied to SAM/BAM files, all statistics are reported separately for unmapped, poorly mapped, and accurately mapped reads. This helps identify problems that cause poor mapping, such as residual linker and adaptor sequences. SAMStat can also be used to verify individual processing steps in large analysis pipelines.
SAMStat reports the length distribution, base quality distribution, mapping statistics, and mismatch, insertion, and deletion error profiles. The output is a single HTML page:
[[Image of example output][https://user-images.githubusercontent.com/8110320/175869206-6edcb06d-1afc-42f6-bbb8-16a2a18146f0.png]]
- How to install
SAMStat depends on the HDF5 library. To install it on Linux (Ubuntu/Debian):
#+begin_src bash :eval never
sudo apt-get install -y libhdf5-dev
#+end_src
On macOS via [[https://brew.sh][brew]]:
#+begin_src bash :eval never
brew install hdf5
#+end_src
To build SAMStat:
#+begin_src bash :eval never
git clone https://github.com/TimoLassmann/samstat.git
cd samstat
mkdir build
cd build
cmake ..
make
make install
#+end_src
- Usage
#+begin_src bash :eval never
samstat <file.sam> <file.bam> <file.fa> <file.fq> ...
#+end_src
For each input file, SAMStat creates a single HTML page named after the input file with a =.samstat.html= suffix appended.
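The naming rule can be sketched in a few lines of shell. This is a minimal illustration, assuming the suffix is appended to the full input file name (extension included); =expected_report= is a hypothetical helper, not part of SAMStat, and the file names are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SAMStat): predict the report name
# by appending ".samstat.html" to the input file name.
expected_report() {
    printf '%s.samstat.html\n' "$1"
}

# Placeholder input names, used only for illustration.
for f in sample.bam reads.fq; do
    echo "$f -> $(expected_report "$f")"
done
```

A helper like this may be handy in a pipeline that needs to check whether a report already exists before re-running SAMStat.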
Available options:
#+begin_src bash :eval never
-d/-dir     : Output directory. []
              NOTE: by default SAMStat will place reports in the same directory as the input files.
-p/-peek    : Report stats only on the first
-h/-help    : Prints help message. []
-v/-version : Prints version information. []
#+end_src
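As an illustration of the =-d= option, the sketch below collects the reports for several inputs in one directory. The directory name =reports= and the input file names are placeholders, and placing the option before the input files is an assumption; check =samstat -h= for the exact syntax of your build.

```shell
#!/usr/bin/env bash
# Sketch: write all reports into one directory using the documented -d option.
# "reports" and the input file names are placeholders.
outdir=reports
mkdir -p "$outdir"

if command -v samstat >/dev/null 2>&1; then
    # Assumption: options may precede the input files.
    samstat -d "$outdir" sample1.bam sample2.bam
else
    echo "samstat not found in PATH; build and install it first" >&2
fi
```

Creating the directory beforehand avoids relying on SAMStat to create it, which the option description does not state either way.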
- Please cite
Timo Lassmann (2023) "SAMStat 2: quality control for next generation sequencing data." Bioinformatics (2023): btad019. [[https://doi.org/10.1093/bioinformatics/btad019][doi:10.1093/bioinformatics/btad019]]
Lassmann et al. (2010) "SAMStat: monitoring biases in next generation sequencing data." Bioinformatics 27.1 (2011): 130-131. [[https://doi.org/10.1093%2Fbioinformatics%2Fbtq614][doi:10.1093/bioinformatics/btq614]]