kmertools
kmer based feature extraction tool for bioinformatics, metagenomics, AI/ML and more
kmertools: DNA Vectorisation Tool
Overview
kmertools is a k-mer based feature extraction tool for metagenomics and other bioinformatics analyses. It uses k-mer analysis to vectorise DNA sequences so that the resulting vectors can be used in AI/ML applications.
NEW: kmertools is now available on bioconda at https://anaconda.org/bioconda/kmertools.
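To make the idea of vectorising DNA concrete, here is a minimal Python sketch of a normalised k-mer frequency vector. It is an illustration only and not the kmertools implementation: the function name `kmer_frequency_vector`, the lexicographic k-mer ordering, and the choice to skip k-mers containing ambiguous bases are all assumptions made for this example.

```python
from itertools import product


def kmer_frequency_vector(sequence: str, k: int = 3) -> list[float]:
    """Return normalised k-mer frequencies in lexicographic k-mer order.

    Hypothetical helper for illustration; not part of kmertools.
    """
    # Fixed ordering of all 4**k possible k-mers over the DNA alphabet.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)

    sequence = sequence.upper()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i : i + k]
        # Assumption: windows with ambiguous bases (e.g. N) are skipped.
        if kmer in counts:
            counts[kmer] += 1

    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


vector = kmer_frequency_vector("ACGTACGT", k=2)
print(len(vector))  # one entry per possible 2-mer
```

Every sequence maps to a vector of the same length (4^k), whatever the length of the sequence, which is what makes such vectors convenient inputs for machine learning models.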
Features
- Oligonucleotide Frequency Vectors: Generate frequency vectors for oligonucleotides.
- Minimiser Binning: Efficiently bin sequences using minimisers to reduce data complexity.
- Chaos Game Representation (CGR): Compute CGR vectors for DNA sequences based on k-mers or whole sequence transformation.
- Coverage Histograms: Create coverage histograms to analyze the depth of sequencing reads.
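As a rough illustration of the minimiser concept behind the binning feature, the Python sketch below picks the smallest k-mer from each window of `w` consecutive k-mers. It shows minimiser selection only, not how kmertools bins sequences. The function name `minimisers` and the use of plain lexicographic ordering are assumptions for this example; practical tools often order k-mers by a hash instead.

```python
def minimisers(sequence: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) for the smallest k-mer in each window of w k-mers.

    Hypothetical helper for illustration; not part of kmertools.
    """
    kmers = [sequence[i : i + k] for i in range(len(sequence) - k + 1)]
    selected: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start : start + w]
        # Assumption: lexicographic order; ties resolve to the leftmost k-mer.
        offset = min(range(w), key=lambda j: window[j])
        hit = (start + offset, window[offset])
        # Adjacent windows often share a minimiser; record it once.
        if not selected or selected[-1] != hit:
            selected.append(hit)
    return selected


print(minimisers("ACGTTGCA", k=3, w=3))
# [(0, 'ACG'), (1, 'CGT'), (2, 'GTT'), (5, 'GCA')]
```

Because neighbouring windows tend to select the same k-mer, a sequence is summarised by far fewer minimisers than it has k-mers, which is the reduction in data complexity the feature list refers to.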
Installation
Option 1: from bioconda (recommended)
You can install kmertools from Bioconda at https://anaconda.org/bioconda/kmertools. Make sure you have conda installed.
# create conda environment and install kmertools
conda create -n kmertools -c bioconda kmertools
# activate environment
conda activate kmertools
Option 2: from sources
You can install kmertools directly from the source by cloning the repository and using Rust's package manager cargo.
git clone https://github.com/anuradhawick/kmertools.git
cd kmertools
cargo build --release
Now add the binary to your PATH (you may edit ~/.bashrc or ~/.zshrc).
# to add to current terminal
export PATH="$PATH:$(pwd)/target/release/"
# to save to ~/.bashrc
echo "export PATH=\$PATH:$(pwd)/target/release/" >> ~/.bashrc
source ~/.bashrc
# to save to ~/.zshrc for Mac
echo "export PATH=\$PATH:$(pwd)/target/release/" >> ~/.zshrc
source ~/.zshrc
Test the installation
After setting up, run the following command to print out the kmertools help message.
kmertools --help
Help
Please read our comprehensive Wiki.
Authors
- Anuradha Wickramarachchi https://anuradhawick.com
- Vijini Mallawaarachchi https://vijinimallawaarachchi.com
Citation
If you use kmertools, please cite it as follows.
@software{Wickramarachchi_kmertools_DNA_Vectorisation,
author = {Wickramarachchi, Anuradha and Mallawaarachchi, Vijini},
title = {{kmertools: DNA Vectorisation Tool}},
url = {https://github.com/anuradhawick/kmertools},
version = {0.1.0}
}
Please refer to the Wiki for citations of relevant algorithms.
Support and contributions
Please get in touch via author websites or GitHub issues. Thanks!