ViennaRNA.jl
ViennaRNA.jl copied to clipboard
Julia interface to ViennaRNA for RNA structure prediction and analysis
Julia interface to the ViennaRNA library
Unofficial Julia interface to the ViennaRNA library for RNA secondary structure prediction and analysis. Please cite the original ViennaRNA publications if you use this library.
Installation
Install ViennaRNA from the Julia package REPL, which can be accessed
by pressing ]
from the Julia REPL:
add ViennaRNA
Usage
using ViennaRNA, Unitful
The Unitful
library is needed to be able to specify units with
@u_str
, e.g. 4u"kcal/mol"
or 37u"°C"
. You can get the degree
symbol °
by typing \degree
and pressing the TAB key in the REPL or
in an editor with Julia syntax support.
The original C API functions can be found in the submodule
ViennaRNA.LibRNA
. Most functions can be called with a String
containing the RNA sequence instead of a FoldCompound
,
e.g. mfe("GGGAAACCC")
.
FoldCompound
A FoldCompound
encapsulates nucleic acid strands and model details,
such as energy parameters, temperature, etc.
fc = FoldCompound("GGGGGAAAAACCCCCC";
options=[:mfe, :pf],
temperature=37u"°C",
uniq_ML=true,
circular=false)
Important keyword arguments
-
options
is a subset of[:default, :eval_only, :hybrid, :mfe, :pf, :window]
. -
temperature
is used to rescale the free energies with the formulaΔG = ΔH - TΔS
(the energy parameter sets contain enthalpy and entropy contributions). The default is37u"°C"
Model details (additional keyword arguments):
-
circular
: determines if the RNA strand is circular, i.e. the 5'-end and 3'-end are covalently bonded. Default isfalse
. -
dangles
: how to treat dangling base pairs in multiloops and the exterior loop. Can be 0, 1, 2, or 3. See ViennaRNA docs for details. Default is2
. -
gquadruplex
: allow G-quadruplexes in predictions. Default isfalse
. -
log_ML
: use logarithmic energy model for multiloops. Default isfalse
. -
max_bp_span
: maximum number of bases over which a basepair can span. Default value is-1
(which means unlimited). -
min_loop_length
: the minimum size of a loop (without the closing base pair). Default is3
. -
no_GU_basepairs
: disallow G-U basepairs. Default isfalse
. -
no_GU_closure
: disallow G-U basepairs as closing pairs for loops. Default isfalse
. -
no_lonely_pairs
: disallow isolated base pairs. Default isfalse
. -
special_hairpins
: use special hairpin energies for certain tri-, tetra- and hexloops. Default istrue
. -
uniq_ML
: use unique decomposition for multiloops, needed forsample_structures
andsubopt
. Default isfalse
. -
window_size
: window size to be used for local calculations performed in a window moving over the sequence. This value is ignored unless the:window
option is set in theFoldCompound
options
. The default value forwindow_size
is-1
.
Changing the energy parameter set
ViennaRNA stores energy parameters in global variables after loading
them from a file. Each time a new FoldCompound
is created, the
parameters are copied from the global variables and saved inside the
FoldCompound
.
The global variables storing energy parameters can be changed by
calling a function specific to each parameter set, or via a Symbol
with ViennaRNA.params_load(:RNA_Turner1999)
. Subsequent calls to
FoldCompound
will use the new parameters and store a copy of the
parameters in the newly created FoldCompound
.
The default energy set loaded on startup is :RNA_Turner2004
.
ViennaRNA.params_load_defaults() # default is :RNA_Turner2004
ViennaRNA.params_load_DNA_Mathews1999()
ViennaRNA.params_load_DNA_Mathews2004()
ViennaRNA.params_load_RNA_Andronescu2007()
ViennaRNA.params_load_RNA_Langdon2018()
ViennaRNA.params_load_RNA_Turner1999()
ViennaRNA.params_load_RNA_Turner2004()
ViennaRNA.params_load_RNA_misc_special_hairpins()
ViennaRNA.params_load(:DNA_Mathews2004)
# Options are
# :DNA_Mathews1999, :DNA_Mathews2004,
# :RNA_Andronescu2007, :RNA_Langdon2018,
# :RNA_Turner1999, :RNA_Turner2004
Multiple strands
Mutiple strands can be given by separating them with an &
, e.g.
FoldCompound("GGGG&CCCC")
.
Comparative folding with an MSA (alifold)
Pass multiple sequences to the FoldCompound constructor for comparative mode (alifold):
FoldCompound(["GG-GAAAACCCC", "GCCGAAA-CGGC"])
.
It is currently not possible to have multiple strands in alifold mode.
Minimum free energy structure (MFE)
# please excuse the excess precision printed when displaying -9.4 kcal/mol
mfe(fc) # => ("(((((.....))))).", -9.399999618530273 kcal mol^-1)
Partition function
partfn(fc) # => ("(((((.....})))),", -9.81672180213034 kcal mol^-1)
Free energy change of folding into a structure
energy(fc, "((((.......)))).") # => -6.199999809265137 kcal mol^-1
Basepair probabilities
bpp(fc) # => 16×16 Matrix{Float64}
Boltzmann probability of a structure
prob_of_structure(fc, "(((((.....))))).") # => 0.5085737925408758
Ensemble defect
ensemble_defect(fc, "(((((.....))))).") # => 0.33085374128228884
Sample structures (probabilistic / stochastic backtrack)
Sample from Boltzmann ensemble of secondary structures
sample_structures(fc) # => [ "((((......)))).." ]
sample_structures(fc; options=:nonredundant,
num_samples=20) # => 20-element Vector{String}
Suboptimal structures
All suboptimal structures with energies delta above the MFE structure
subopt(fc; delta=4u"kcal/mol") # => Vector{Tuple{String, Quantity}}
Suboptimal structures with the method of Zuker
subopt_zuker(fc) # => Vector{Tuple{String, Quantity}}
Sliding window prediction of MFE
mfe_window
saves all the results in a Vector
.
seq = "G"^50 * "A"^4 * "C"^50
mfe_window(seq; window_size=30)
fc = FoldCompound(seq; options=[:default, :window], window_size=30)
mfe_window(fc) # => Vector{ResultWindowMFE}
mfe_window_channel
returns a Channel
that can be used to
iteratively process the results.
seq = "G"^50 * "A"^4 * "C"^50
chan = mfe_window_channel(seq; window_size=30)
take!(chan)
fc = FoldCompound(seq; options=[:default, :window], window_size=30)
chan = mfe_window_channel(fc)
take!(chan)
Neighboring structures
Move set to reach neighboring structures of a given structure
neighbors(fc, Pairtable("((.....))")) # => Vector{Vector{Tuple{Int,Int}}}
Basepair distance between secondary structures
bp_distance("....", "(())") # => 2
Tree edit distance between secondary structures
tree_edit_dist("(..)", "....") # => 4.0f0
Mean basepair distance
Mean basepair distance of all structures to each other, weighted by the structure's Boltzmann probabilities
mean_bp_distance(fc) # => 5.266430215905888
Centroid structure
Centroid structure of ensemble: structure with smallest sum of base-pair distances weighted by Boltzmann probabilities:
centroid(fc) # => ("(((((.....))))).", 4.799131457924728)
Maximum expected accuracy (MEA) structure
The gamma parameter trades off specificity (low gamma) and sensitivity (high gamma).
mea(fc; gamma=1.0) # => ("(((((.....))))).", 10.706348f0)
Heat capacity calculation
# starting temperature, end temperature, temperature increment
heat_capacity(fc, 10u"°C", 60u"°C") # => Vector{Tuple{Quantity,Quantity}}
heat_capacity(fc, 10u"°C", 60u"°C", 1u"°C"; mpoints=5)
Plotting structures
# plot coordinates of a secondary structure, returns two arrays with
# x and y coordinates
plot_coords("(((...)))") # => Tuple{Float32[], Float32[]}
See PlotRNA.jl for more secondary structure plotting functionality.
Inverse folding / sequence design
inverse_fold("AAAAAAA", "((...))") a# => ("GCAAAGC", 2.0f0)
inverse_pf_fold("AAAAAAA", "((...))") # => ("GCCAAGC", 2.0244526863098145 kcal mol^-1)
Seeding the random number generator
ViennaRNA.init_rand_seed(42)
Modified bases energy parameter presets
Energy parameters for modified bases can be used via ViennaRNA's soft constraints mechanism.
using ViennaRNA
fc = FoldCompound("AAACCCUUU")
partfn(fc) # -0.0025467473022687203 kcal mol^-1
ViennaRNA.sc_mod_pseudouridine!(fc, [7,8,9]) # modify positions 7, 8, 9
partfn(fc) # -0.004713416050703315 kcal mol^-1
These functions are currently available:
sc_mod_7DA!
sc_mod_dihydrouridine!
sc_mod_inosine!
sc_mod_m6A!
sc_mod_pseudouridine!
sc_mod_purine!
Please refer to the ViennaRNA section on modified bases for more details.
Reducing memory usage
When creating many FoldCompound
s, running finalize
manually will
avoid excessive memory buildup.
for i = 1:100_000
fc = FoldCompound("ACGU")
# do something with fc
finalize(fc)
end