TreeToReads
Parallelization
I was wondering if you are interested in parallelizing? Maybe there are some Python packages that could help. I am simulating the genomes of a 1700-taxon tree and it's taking a very long time, but it wouldn't be so bad if I could simulate one genome per processor. I tried an xargs statement for the ART step; I'm not sure whether it would be helpful to you.
\ls *.fasta | xargs -P 12 -n 1 bash -c '
b=$(basename "$0" .fasta);
dir="tmp/$b";
prefix="$dir/$b";
mkdir -p "$dir";
art_illumina -1 /scicomp/home/gzu2/bin/ART/Illumina_profiles/EmpMiSeq250R1.txt -2 /scicomp/home/gzu2/bin/ART/Illumina_profiles/EmpMiSeq250R2.txt -na -sam -p -i "$0" -l 150 -f 40 -m 380 -s 10 -o "$prefix" && \
gzip -v "$dir"/*.fq && \
samtools view -bS -o "$prefix.bam" "$prefix.sam" && \
samtools sort "$prefix.bam" "$prefix.sorted.bam" && \
rm -v "$prefix.bam" "$prefix.sam"
'
It's a good idea! It is mostly the ART step that is slow, and that would be really straightforward to parallelize.
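For reference, here is a rough Python sketch of the same idea as the xargs one-liner: one `art_illumina` call per genome, run from a thread pool. This is not TreeToReads code. The function names (`build_art_command`, `simulate_one`, `simulate_all`), the `out_root` layout, and the injectable `run` parameter are all made up for illustration, and the ART flags are simply copied from the command posted above. Threads should be enough here because the real work happens in the external process.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_art_command(fasta, out_root="tmp", profile_r1=None, profile_r2=None):
    """Build the art_illumina argument list for one genome.

    Flags mirror the command posted in this thread; the helper itself
    is hypothetical and not part of TreeToReads.
    """
    name = Path(fasta).stem
    prefix = Path(out_root) / name / name
    cmd = ["art_illumina"]
    if profile_r1 and profile_r2:
        # Custom read-quality profiles, as in the posted command.
        cmd += ["-1", str(profile_r1), "-2", str(profile_r2)]
    cmd += ["-na", "-sam", "-p", "-i", str(fasta),
            "-l", "150", "-f", "40", "-m", "380", "-s", "10",
            "-o", str(prefix)]
    return cmd


def simulate_one(fasta, out_root="tmp", run=subprocess.run, **profiles):
    """Create the per-genome output directory, then run ART once."""
    cmd = build_art_command(fasta, out_root, **profiles)
    prefix = cmd[-1]
    Path(prefix).parent.mkdir(parents=True, exist_ok=True)
    run(cmd, check=True)  # raises CalledProcessError if ART fails
    return prefix


def simulate_all(fastas, workers=12, out_root="tmp", run=subprocess.run, **profiles):
    """Run one ART process per genome, at most `workers` at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = pool.map(lambda f: simulate_one(f, out_root, run, **profiles), fastas)
        return list(jobs)  # output prefixes, in input order
```

Something like `simulate_all(sorted(Path(".").glob("*.fasta")), workers=12)` would then correspond to the `-P 12` in the xargs version. The gzip and samtools steps are left out of the sketch.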