TreeToReads Parallelization

Open • lskatz opened this issue 7 years ago • 1 comment

I was wondering whether you would be interested in parallelizing TreeToReads? Maybe there are some Python packages that could help. I am simulating the genomes of a 1700-taxon tree and it is taking a very long time, but it wouldn't be so bad if I could simulate one genome per processor. I tried an xargs statement for the ART step, though I'm not sure whether it would be helpful to you.

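# Run ART and post-process each genome, up to 12 jobs at a time.
# The trailing "_" fills $0, so each FASTA path from xargs arrives as $1.
# "samtools sort -o" is the samtools >= 1.3 form; older releases took an output prefix instead.
\ls *.fasta | xargs -P 12 -n 1 bash -c '
  b=$(basename "$1" .fasta);
  dir="tmp/$b";
  prefix="$dir/$b";
  mkdir -p "$dir";
  art_illumina -1 /scicomp/home/gzu2/bin/ART/Illumina_profiles/EmpMiSeq250R1.txt -2 /scicomp/home/gzu2/bin/ART/Illumina_profiles/EmpMiSeq250R2.txt -na -sam -p -i "$1" -l 150 -f 40 -m 380 -s 10 -o "$prefix" && \
  gzip -v "$dir"/*.fq && \
  samtools view -bS -o "$prefix.bam" "$prefix.sam" && \
  samtools sort -o "$prefix.sorted.bam" "$prefix.bam" && \
  rm -v "$prefix.bam" "$prefix.sam"
' _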

Nov 13 '17 18:11 lskatz

It's a good idea! It is mostly the ART step that is slow, and that would be really straightforward to parallelize.

Nov 13 '17 19:11 snacktavish
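
For anyone wanting the same one-genome-per-worker idea from Python rather than xargs, here is a minimal sketch using only the standard library. It is not TreeToReads code: the function names `art_command` and `simulate_all`, the `tmp/<name>/<name>` output layout, and the placeholder profile file names are assumptions for illustration, and the ART flags are simply copied from the xargs example above. Threads are used because each worker only waits on an external `art_illumina` process.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def art_command(fasta, out_root="tmp",
                profile_r1="R1_profile.txt", profile_r2="R2_profile.txt"):
    """Build one art_illumina call with the flags from the xargs example.

    The profile paths are placeholders (assumption); point them at real
    ART quality profiles before running.
    """
    name = Path(fasta).stem
    prefix = Path(out_root) / name / name  # e.g. tmp/genome1/genome1
    return [
        "art_illumina", "-1", profile_r1, "-2", profile_r2,
        "-na", "-sam", "-p", "-i", str(fasta),
        "-l", "150", "-f", "40", "-m", "380", "-s", "10",
        "-o", str(prefix),
    ]


def simulate_all(fastas, workers=12, run=subprocess.run, **kwargs):
    """Run one ART job per genome, at most `workers` at a time.

    `run` is injectable so the scheduling can be tried without ART installed.
    Returns the output prefixes in the same order as `fastas`.
    """
    def one(fasta):
        cmd = art_command(fasta, **kwargs)
        prefix = cmd[-1]
        # Create the per-genome output directory before ART writes into it.
        Path(prefix).parent.mkdir(parents=True, exist_ok=True)
        run(cmd, check=True)  # raises CalledProcessError if ART fails
        return prefix

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one, fastas))
```

A call such as `simulate_all(sorted(Path(".").glob("*.fasta")), workers=12)` would then mirror `xargs -P 12`; the gzip and samtools steps could be appended inside `one` in the same way.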
