polyester
Parallelize simulate_experiment()?
Hi,
Thanks for this useful package! I was wondering if there were any plans to parallelize read simulation?
I noticed that it might be possible to parallelize the outer for loop in sgseq(). I tried replacing the for loop with foreach, using the doMC package as the parallel backend.
It was a quick change and, although I didn't do extensive testing, it seems to speed things up significantly when more than one replicate or group is being simulated (see: kcha/polyester@7b6c31e60f6608f1b024d8f4be8833ce02d9e62f).
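To illustrate the general idea, here is a minimal sketch of swapping a for loop for foreach with a doMC backend. It is not the code from the commit above and not polyester's internals: simulate_one_replicate() is a hypothetical stand-in for the per-replicate work, and the core count of 2 is arbitrary.

```r
library(foreach)
library(doMC)  # fork-based backend for foreach; not available on Windows

# Hypothetical stand-in for the work done in one loop iteration.
# This is NOT polyester's actual simulation code.
simulate_one_replicate <- function(i, n_reads = 1000) {
  set.seed(i)  # per-task seed so the result does not depend on scheduling
  fragment_lengths <- rpois(n_reads, lambda = 100)
  mean(fragment_lengths)
}

registerDoMC(cores = 2)  # number of worker processes (arbitrary here)

n_reps <- 8

# Serial version: an ordinary for loop
serial <- numeric(n_reps)
for (i in seq_len(n_reps)) {
  serial[i] <- simulate_one_replicate(i)
}

# Parallel version: iterations are distributed across forked workers,
# and .combine = c collects the results into a vector in iteration order
parallel_res <- foreach(i = seq_len(n_reps), .combine = c) %dopar% {
  simulate_one_replicate(i)
}
```

This only works cleanly when iterations are independent of each other, for example when each replicate writes to its own output file.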
Interested in hearing your thoughts!
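library(polyester)
library(doMC)

# One row per transcript, one column per group; all 1s means no differential expression
fold_changes <- matrix(c(1, 1), nrow = 1)

# Time the same simulation with 1, 4, and 8 cores
for (n_cores in c(1, 4, 8)) {
  elapsed <- system.time(
    simulate_experiment('data/toy.fa',
                        readlen = 100,
                        reads_per_transcript = 10000,
                        fold_changes = fold_changes,
                        num_reps = c(4, 4),
                        outdir = 'simulated_reads/single',
                        distr = "empirical",
                        error_model = "illumina5",
                        paired = FALSE,
                        gzip = TRUE,
                        cores = n_cores)  # cores is the argument from the modified version linked above
  )
  print(paste("Cores:", n_cores))
  print(elapsed)
}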
[1] "Cores: 1"
   user  system elapsed
 27.032   0.974  28.075
[1] "Cores: 4"
   user  system elapsed
 22.472   0.842   7.969
[1] "Cores: 8"
   user  system elapsed
 49.123   2.340   7.094
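For reference, the speedup implied by these timings can be worked out from the elapsed values alone. The snippet below is just arithmetic on the numbers printed above (speedup relative to the 1-core run, and speedup divided by the number of cores); it comes from a single run, so treat it as a rough indication only.

```r
# Elapsed wall-clock times (seconds) copied from the output above
cores   <- c(1, 4, 8)
elapsed <- c(28.075, 7.969, 7.094)

# Speedup relative to the single-core run
speedup <- elapsed[1] / elapsed

# Parallel efficiency: speedup per core used
efficiency <- speedup / cores

print(data.frame(cores,
                 elapsed,
                 speedup = round(speedup, 2),
                 efficiency = round(efficiency, 2)))
```

With 4 replicates in each of 2 groups there are only 8 independent units of work, which may be part of why going from 4 to 8 cores adds little here.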
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.3 (El Capitan)
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] doMC_1.3.4 iterators_1.0.8 foreach_1.4.3 polyester_1.7.1
loaded via a namespace (and not attached):
[1] compiler_3.2.3 zlibbioc_1.14.0 limma_3.24.15
[4] IRanges_2.2.9 tools_3.2.3 XVector_0.8.0
[7] logspline_2.1.9 Biostrings_2.36.4 codetools_0.2-14
[10] S4Vectors_0.6.6 BiocGenerics_0.14.0 stats4_3.2.3