run-length BWT tools for genomic sequences
rowbowt: run-length BWT tools for working with genomic sequences
Author: Taher Mun
Code heavily derives from Nicola Prezza's r-index repository.
Requires a C++17-compliant compiler and a Unix-based operating system.
To download and build:
git clone;
mkdir rowbowt/build;
cd rowbowt/build;
cmake ..;
make
This produces the executables rb_align and rb_build.
Building a rowbowt index
Generate the files needed for building the index (exact format specifications will be added below):
- (required): contains the BWT of the original text. This lets you find the number of occurrences of your pattern in the original text.
- (optional): contain the run-start and run-end SA samples, respectively. Use these if you want to use the index to find pattern locations in the original text.
- files: contain the markers (reference position + allele) associated with each SA element. Use these if you want to find which alleles, with respect to the reference sequence, your pattern overlaps.
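To illustrate what the BWT and SA files make possible, here is a toy sketch of counting and locating a pattern with backward search. It is not rowbowt's implementation: the function names are made up for this example, the suffix array is built naively, the full SA is kept instead of run-boundary samples, and rank queries are linear scans. It only shows why the BWT alone is enough for counting while SA values are needed for locations.

```python
def suffix_array(text):
    # Naive construction by sorting suffixes; fine for a toy example only.
    return sorted(range(len(text)), key=lambda i: text[i:])

def bwt_from_sa(text, sa):
    # BWT[i] is the character preceding the i-th smallest suffix.
    return "".join(text[i - 1] for i in sa)

def run_length_encode(bwt):
    # Collapse the BWT into (character, run length) pairs.
    runs = []
    for c in bwt:
        if runs and runs[-1][0] == c:
            runs[-1] = (c, runs[-1][1] + 1)
        else:
            runs.append((c, 1))
    return runs

def backward_search(bwt, pattern):
    # Return the half-open SA interval [lo, hi) of suffixes that start
    # with `pattern`, or (0, 0) if the pattern does not occur.
    counts = {}
    for c in bwt:
        counts[c] = counts.get(c, 0) + 1
    # C[c] = number of characters in the text strictly smaller than c.
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    lo, hi = 0, len(bwt)
    for c in reversed(pattern):
        if c not in C:
            return 0, 0
        # rank(c, i) = occurrences of c in bwt[:i]; linear scan for clarity.
        lo = C[c] + bwt[:lo].count(c)
        hi = C[c] + bwt[:hi].count(c)
        if lo >= hi:
            return 0, 0
    return lo, hi

text = "GATTACAGATTACA$"  # "$" is a sentinel smaller than A, C, G, T
sa = suffix_array(text)
bwt = bwt_from_sa(text, sa)

lo, hi = backward_search(bwt, "TTA")
count = hi - lo                # needs only the BWT
positions = sorted(sa[lo:hi])  # needs SA values as well
print(count, positions)        # 2 [2, 9]
print(len(run_length_encode(bwt)), "runs for", len(bwt), "characters")
```

On repetitive inputs such as a haplotype panel, the number of runs is typically much smaller than the text length, which is the motivation for storing the BWT run-length encoded.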
We recommend using pfbwt-f (included in this repo) to generate these required files from a reference sequence and a VCF containing a haplotype reference panel. See the README for pfbwt-f for details on how to run it.
cd pfbwt-f
make vcf_to_bwt
python3 [-s] [-m] -o <output prefix> <reference fasta> <VCF>
Run the build command
./rb_build [-s] [-m] -o <output prefix> <rowbowt index prefix>
Querying sequences against a rowbowt index
./rb_align [-s] [-m] <rowbowt index prefix> <fastq>
Use -s and -m to query locations and markers, respectively.
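If you drive the aligner from a pipeline script, the command line can be assembled programmatically. The sketch below is only an illustration: `rb_align_cmd` is a hypothetical helper, `panel` and `reads.fastq` are placeholder names, and it assumes that `-s` requests locations and `-m` requests markers, following the usage line above.

```python
import shlex

def rb_align_cmd(index_prefix, fastq, locations=False, markers=False,
                 exe="./rb_align"):
    # Hypothetical helper mirroring the usage line:
    #   ./rb_align [-s] [-m] <rowbowt index prefix> <fastq>
    cmd = [exe]
    if locations:
        cmd.append("-s")  # assumed: report pattern locations
    if markers:
        cmd.append("-m")  # assumed: report overlapping markers
    cmd += [index_prefix, fastq]
    return cmd

# Placeholder index prefix and FASTQ file name.
cmd = rb_align_cmd("panel", "reads.fastq", locations=True, markers=True)
print(shlex.join(cmd))  # ./rb_align -s -m panel reads.fastq
# To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list and passing it to `subprocess.run` avoids shell quoting problems with unusual file names.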
Proper formatting (i.e., SAM format for locations, VCF format for markers) will be added in the future.