usegalaxy-playbook

Increase memory and create a dynamic rule for seqtk_sample

Open natefoo opened this issue 6 years ago • 4 comments

From @bwlang on Gitter:

@natefoo: is it reasonable to increase the allowable RAM for seqtk sample to, say, 10G on usegalaxy.org? With a fixed number of reads, it needs RAM proportional to the targeted number of reads. I’m going to send a tool patch to enable 2-pass mode, which is less memory intensive, as well...

TODO:

  • [x] Increase static memory allocation to 16GB
  • [ ] Allocate memory dynamically based on input size

@bwlang could you comment with some details on the proportion needed?

natefoo, May 21 '19 15:05

16GB allocated in 4deaf5539c03563d96142d8a1cec6c0548c6d3d8.

natefoo, May 21 '19 16:05

I did a quick experiment...

for reads in 100 1000 10000 100000 1000000; do /usr/bin/time -v -a -o log seqtk sample 9.2.fastq.gz "$reads" > /dev/null; done
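To turn the appended `log` file from the loop above into numbers, the peak memory of each run can be pulled out of the GNU `time -v` output. This is only a sketch: it assumes GNU time's usual `Maximum resident set size (kbytes): N` line, and the helper name `max_rss_kbytes` is made up for illustration.

```python
import re

# GNU time -v prints one such line per run; with "-a -o log" the runs
# are appended to the same file in the order they were executed.
_MAX_RSS = re.compile(r"Maximum resident set size \(kbytes\):\s*(\d+)")


def max_rss_kbytes(log_text):
    """Return the peak RSS in KB for each run found in a GNU time -v log."""
    return [int(match.group(1)) for match in _MAX_RSS.finditer(log_text)]
```

Reading the file with `open("log").read()` and passing the text in should give one value per loop iteration, in the same order as the read counts.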

I fit that data (forcing the intercept to 0):

Kbytes = 0.475991 * Num Reads
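For reference, a fit with the intercept forced to 0 can be reproduced with ordinary least squares through the origin, where the slope is sum(x*y) / sum(x*x). Which fitting tool was actually used here is not stated, so treat this as one plausible way to get such a slope; `fit_through_origin` is a hypothetical helper name.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope of y = slope * x, with the intercept fixed at 0."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    sum_xx = sum(x * x for x in xs)
    if sum_xx == 0:
        raise ValueError("at least one x must be non-zero")
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return sum_xy / sum_xx
```

Here `xs` would be the requested read counts and `ys` the measured peak memory in KB for each run.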

image

bwlang, May 21 '19 16:05

Note that this is not a function of input data size; it is a function of the number of output reads specified. E.g., by the fit above, sampling 100 reads from 1000000000000 reads still requires only about 48 KB.

However, sampling 100000 reads from 1000000000000 needs about 47,600 KB (roughly 48 MB).
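Since memory tracks the requested number of output reads, a dynamic rule could size the job from that parameter rather than from the input file. The sketch below only applies the fit reported in this thread (Kbytes = 0.475991 * Num Reads). The function name, the fixed overhead, the safety factor, and the 16 GB cap (taken from the static allocation mentioned above) are all assumptions for illustration, and how the rule would read the parameter from the Galaxy job is deliberately left out.

```python
import math

# Slope from the fit reported in this thread, in KB per requested read.
KB_PER_READ = 0.475991


def seqtk_sample_mem_mb(num_reads, overhead_mb=256, safety_factor=1.5,
                        max_mb=16 * 1024):
    """Estimate a memory request in MB for sampling `num_reads` reads.

    overhead_mb, safety_factor and max_mb are illustrative defaults,
    not values taken from the playbook.
    """
    if num_reads < 0:
        raise ValueError("num_reads must be non-negative")
    fitted_mb = KB_PER_READ * num_reads / 1024  # KB -> MB
    wanted_mb = math.ceil(overhead_mb + safety_factor * fitted_mb)
    return min(wanted_mb, max_mb)
```

With these defaults, a request for 1000000 reads comes out at 954 MB, and very large requests are clamped to the 16 GB cap.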

bwlang, May 21 '19 16:05

Perfect, thanks!

natefoo, May 21 '19 19:05