BioSeq.jl
Julia package for working with DNA, RNA and protein sequences in bioinformatics
Package for working with nucleotides and amino acids in the Julia language
Installation
Pkg.init()          # Only needed the first time you install a Julia package
Pkg.add("BioSeq")   # Install BioSeq.jl
using BioSeq        # Load BioSeq
Features
- 2-bit DNA sequence DNA2Seq for saving memory
  - Faster vectorized tests for calculating the GC percentage and for testing A, C, T and G on DNA2Seq
- 8-bit bitstypes Nucleotide and AminoAcid
  - Vectors of these types can be used as DNA, RNA or protein sequences
    - Some string functions work on sequences:
      - Case conversions
      - Matching functions (search, replace and others)
      - IUPAC regexes are available for matching functions
      - PROSITE patterns are available for matching functions
    - Alignments can be represented as matrices of these types
    - DArrays of these types can be used for parallel computation
    - Memory-mapped arrays of these types can be used for huge data sets
- 8-bit bit-level coding scheme for nucleotides
- Translation methods and genetic codes
- Tools for using IntSet/Set/Dict as alphabets
  - Common alphabets as IntSet, including extended IUPAC
  - Dicts for generating the complement of nucleotide sequences, or for converting between 3-letter and 1-letter amino acid codes
  - Test whether a character belongs to an alphabet
  - Check that all characters of a sequence belong to an alphabet
  - Swap for alphabet conversions
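A 2-bit sequence saves memory by packing four bases per byte instead of one. The following is a minimal standalone sketch in modern Julia (1.x) and is not BioSeq.jl's actual DNA2Seq layout. It assumes a hypothetical code A=00, C=01, G=10, T=11 with 32 bases per UInt64 word, and the names pack2bit, base_at, gc_fraction and complement2bit are illustrative only. With this code a base is G or C exactly when its two bits differ, so GC can be counted a whole word at a time with XOR and count_ones, and the complement is a bitwise NOT.

```julia
# Minimal standalone sketch (Julia 1.x); NOT BioSeq.jl's DNA2Seq layout.
# Hypothetical 2-bit code: A=00, C=01, G=10, T=11, 32 bases per UInt64.
const CODE = Dict('A' => 0x00, 'C' => 0x01, 'G' => 0x02, 'T' => 0x03)

struct Packed2Bit
    words::Vector{UInt64}  # base i sits in word cld(i, 32), at bit 2 * ((i - 1) % 32)
    len::Int               # number of bases; padding bits in the last word stay 00
end

function pack2bit(seq::AbstractString)
    n = length(seq)
    words = zeros(UInt64, cld(n, 32))
    for (i, c) in enumerate(seq)
        code = get(CODE, uppercase(c), nothing)
        code === nothing && throw(ArgumentError("not an unambiguous base: $c"))
        words[(i - 1) ÷ 32 + 1] |= UInt64(code) << (2 * ((i - 1) % 32))
    end
    return Packed2Bit(words, n)
end

function base_at(p::Packed2Bit, i::Integer)
    1 <= i <= p.len || throw(BoundsError(p, i))
    bits = (p.words[(i - 1) ÷ 32 + 1] >> (2 * ((i - 1) % 32))) & 0x3
    return "ACGT"[Int(bits) + 1]
end

# G (10) and C (01) are the only codes whose two bits differ, so XOR-ing each
# bit pair and keeping the low bit marks GC bases, 32 at a time.
# Padding (00, read as A) is never counted.
function gc_fraction(p::Packed2Bit)
    p.len == 0 && return 0.0
    gc = sum(count_ones((w ⊻ (w >> 1)) & 0x5555555555555555) for w in p.words)
    return gc / p.len
end

# Complement is a bitwise NOT (A<->T, C<->G); padding bits are cleared again.
function complement2bit(p::Packed2Bit)
    words = map(~, p.words)
    r = p.len % 32
    if r != 0
        words[end] &= (UInt64(1) << (2 * r)) - 1
    end
    return Packed2Bit(words, p.len)
end

p = pack2bit(repeat("GATTACA", 2))
println(gc_fraction(p))   # about 0.2857 (4 of 14 bases are G or C)
```

Here the 14 bases fit in one 8-byte word, compared with 14 bytes for a vector of 8-bit Nucleotides.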
Documentation
Demo
julia> using BioSeq
julia> const dna4alphabet = alphabet(nt"ACTG", false)
Case Insensitive Alphabet{Nucleotide} of 4 elements:
indice : 256-element Uint8 Array
alphabet : 4-element Nucleotide Array
alphabet indice[alphabet]
Nucleotide (Int64) Uint8 (Int64)
A (65) 0x01 (1)
C (67) 0x02 (2)
T (84) 0x03 (3)
G (71) 0x04 (4)
julia> dnaseq = repeat( nt"GATTACA" , 2 )
14-element Nucleotide Array:
G
A
T
T
A
C
A
G
A
T
T
A
C
A
julia> check(dnaseq, dna4alphabet)
true
julia> protseq = translate(dnaseq,1)
4-element AminoAcid Array:
D
Y
R
L
julia> if ismatch(prosite"<D-x-[RM]", protseq)
           threeletters = swap(protseq, AMINO_1LETTER_TO_3)
       end
4-element ASCIIString Array:
"ASP"
"TYR"
"ARG"
"LEU"
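The demo's steps can be mimicked without BioSeq.jl in a few lines of modern Julia (1.x), which may help show what each call does. This is a sketch under stated assumptions: check_alphabet, translate_frame and AMINO_3 are hypothetical names that only echo BioSeq's check, translate and AMINO_1LETTER_TO_3; plain Strings stand in for the Nucleotide and AminoAcid vectors; the standard genetic code (NCBI table 1) is hard-coded; and the PROSITE pattern <D-x-[RM] is hand-translated into the regular expression ^D.[RM] instead of being parsed.

```julia
# Hypothetical standalone sketch (Julia 1.x); not BioSeq.jl's API.

# Standard genetic code (NCBI table 1); codons ordered by bases T, C, A, G.
const AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
const NT_INDEX = Dict('T' => 0, 'C' => 1, 'A' => 2, 'G' => 3)

# One-letter to uppercase three-letter amino acid codes (stop '*' is not mapped).
const AMINO_3 = Dict(
    'A' => "ALA", 'R' => "ARG", 'N' => "ASN", 'D' => "ASP", 'C' => "CYS",
    'Q' => "GLN", 'E' => "GLU", 'G' => "GLY", 'H' => "HIS", 'I' => "ILE",
    'L' => "LEU", 'K' => "LYS", 'M' => "MET", 'F' => "PHE", 'P' => "PRO",
    'S' => "SER", 'T' => "THR", 'W' => "TRP", 'Y' => "TYR", 'V' => "VAL")

# Case-insensitive test that every character of seq belongs to the alphabet.
check_alphabet(seq::AbstractString, alphabet::AbstractString) =
    all(c -> uppercase(c) in alphabet, seq)

# Translate the complete codons starting at `frame` (1, 2 or 3). A trailing
# partial codon is ignored; characters other than A, C, G, T raise a KeyError.
function translate_frame(dna::AbstractString, frame::Integer = 1)
    1 <= frame <= 3 || throw(ArgumentError("frame must be 1, 2 or 3"))
    s = collect(uppercase(dna))
    io = IOBuffer()
    for i in frame:3:(length(s) - 2)
        b1, b2, b3 = NT_INDEX[s[i]], NT_INDEX[s[i + 1]], NT_INDEX[s[i + 2]]
        print(io, AA_TABLE[16 * b1 + 4 * b2 + b3 + 1])
    end
    return String(take!(io))
end

dnaseq = repeat("GATTACA", 2)
println(check_alphabet(dnaseq, "ACTG"))   # true
protseq = translate_frame(dnaseq, 1)      # "DYRL"

# PROSITE <D-x-[RM]: D at the N-terminus, any residue, then R or M.
threeletters = occursin(r"^D.[RM]", protseq) ?
    [AMINO_3[aa] for aa in protseq] : String[]
println(threeletters)                     # ["ASP", "TYR", "ARG", "LEU"]
```

Indexing one 64-character string by 16·b1 + 4·b2 + b3 keeps the codon lookup to a single array access; the other NCBI genetic codes use the same codon ordering, so they would only need a different table string.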
Contributing
Fork the repository and send a pull request, or open a GitHub issue for bug reports and feature requests.