biobambam2: Any option to run bamsort in parallel?

For the "bamsort" command, I cannot find an option for using multiple threads in the sorting process (I don't think inputthreads and outputthreads are meant for this purpose). When I look at the source code, I can see an option with the KEY "sortthreads" that doesn't show up in the --help output. Can I use it, or is there a reason it is hidden? Maybe it is not ready to be used because it can cause errors?

Jan 21 '18 13:01 spikeliu

It works quite nicely. See https://github.com/ablab/spades/issues/67#issuecomment-359267438 for three shell lines and an example of its performance on LustreFS. The best approach is to have a ramdisk on the machine, write the sorted BAM file to it, and then move the final BAM and its index to the real storage filesystem.
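A minimal sketch of the ramdisk workflow, assuming bash and a tmpfs mounted at /dev/shm. It is not a copy of the three lines in the linked comment; the function name sort_via_ramdisk, the RAMDIR variable and all paths are hypothetical. It relies on bamsort reading the BAM from standard input and writing to standard output, together with its sortthreads, tmpfile, index and indexfilename options.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: sort a BAM into a RAM-backed directory, then move the
# sorted BAM and its index to the destination filesystem in one step.
# RAMDIR defaults to /dev/shm/<user> (assumed to be a tmpfs mount).
sort_via_ramdisk() {
  local in_bam=$1 dest_dir=$2 threads=$3
  local ramdir=${RAMDIR:-/dev/shm/${USER:-bamsort}}
  local name
  name=$(basename "$in_bam" .bam).sorted.bam

  mkdir -p "$ramdir" "$dest_dir"
  # Temporary sort files, the sorted BAM and its index are all written to RAM.
  bamsort sortthreads="$threads" \
    tmpfile="$ramdir/$name.tmp" \
    index=1 indexfilename="$ramdir/$name.bai" \
    < "$in_bam" > "$ramdir/$name"
  # Only the finished files touch the shared (e.g. Lustre) filesystem.
  mv "$ramdir/$name" "$ramdir/$name.bai" "$dest_dir/"
}
```

For example, sort_via_ramdisk input.bam /lustre/project/results 8 would leave input.sorted.bam and input.sorted.bam.bai in the (hypothetical) results directory. Make sure the ramdisk is large enough for the temporary files as well as the final BAM.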

Jan 21 '18 18:01 mmokrejs

The sortthreads option should work. I will add it to the documentation in the next version.

Jan 24 '18 08:01 gt1

So how does sortthreads relate to inputthreads and outputthreads if I use all three on the command line? In what ratio should I distribute the available cores among these three?

Jan 24 '18 09:01 mmokrejs

@mmokrejs: the issue with bamsort is that it does not use pooled threading throughout the program. The input, output and sort threads may all run at the same time. You can use a tool like cpuset to limit the number of real cores used by the program and set all three options to that number of threads. If you want a program that uses exactly a given number of threads for processing at any time, please look at bamsormadup, which was designed for this.
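A sketch of that suggestion, assuming Linux with util-linux's taskset, used here as a simpler stand-in for cpuset: bamsort is pinned to N cores and each of its three thread pools gets N threads, so even when all pools are busy the kernel runs at most N of those threads at once. The function name run_bamsort_pinned is hypothetical, and the core list 0..N-1 assumes those are the CPU IDs you want to use.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: restrict bamsort to CPUs 0..N-1 and size every thread
# pool to N. Extra arguments are passed through to bamsort unchanged.
run_bamsort_pinned() {
  local ncores=$1
  shift
  if (( ncores < 1 )); then
    echo "need at least one core" >&2
    return 1
  fi
  taskset -c "0-$(( ncores - 1 ))" bamsort \
    sortthreads="$ncores" inputthreads="$ncores" outputthreads="$ncores" "$@"
}
```

Usage would be along the lines of run_bamsort_pinned 8 < input.bam > sorted.bam. The threads still compete for the same 8 cores, so the load stays bounded by the pinning rather than by the thread counts.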

Jan 24 '18 10:01 gt1

@gt1 Are you saying that if I run bamsort sortthreads=$phys_cores inputthreads=$phys_cores outputthreads=$phys_cores I will end up with a load of 300?

Should I divide the number of available cores by 3 to ensure the load stays at most 100?

But aren't one decompression thread and one compression thread enough? So: bamsort sortthreads=$phys_cores-2 inputthreads=1 outputthreads=1?
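The arithmetic behind the two proposals can be sketched as follows, assuming bash and that P is the number of physical cores. The function names are hypothetical. Both splits keep the sum of the three pools at P, which is the worst case when all pools are busy at once; they differ only in where the cores go.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: split P cores evenly across the three pools;
# any remainder goes to the sort pool.
split_even() {
  local p=$1
  (( p >= 3 )) || { echo "need at least 3 cores" >&2; return 1; }
  local each=$(( p / 3 ))
  echo "sortthreads=$(( each + p % 3 )) inputthreads=$each outputthreads=$each"
}

# Hypothetical helper: one thread each for decompression and compression,
# the remaining P-2 cores for sorting.
split_sort_heavy() {
  local p=$1
  (( p >= 3 )) || { echo "need at least 3 cores" >&2; return 1; }
  echo "sortthreads=$(( p - 2 )) inputthreads=1 outputthreads=1"
}
```

The output is meant to be word-split into bamsort's key=value arguments, e.g. bamsort $(split_sort_heavy "$phys_cores") < in.bam > out.bam (note that the literal $phys_cores-2 in a shell command would not be evaluated as arithmetic).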

Jan 24 '18 10:01 mmokrejs

@mmokrejs This could happen, although it is rather unlikely. In my experience, unless you set level=0 for uncompressed output, output compression is rather compute-heavy, so you may want to allocate more threads to it.
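A sketch of that advice, assuming bash: with level=0 there is almost no compression work, so one output thread should suffice; otherwise the cores left after one input thread are shared between sorting and compression. The function name and the roughly half-and-half share given to output threads are illustrative assumptions, not figures from the biobambam2 authors; benchmark on your own data before settling on a ratio.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: build bamsort thread options for P cores and a given
# compression level. The ~50/50 sort/output split is an arbitrary example.
split_for_level() {
  local p=$1 level=$2
  (( p >= 3 )) || { echo "need at least 3 cores" >&2; return 1; }
  if (( level == 0 )); then
    # Uncompressed output: little compression work, one output thread.
    echo "level=0 sortthreads=$(( p - 2 )) inputthreads=1 outputthreads=1"
  else
    # Compressed output: after one input thread, about half the rest compress.
    local out=$(( (p - 1) / 2 ))
    echo "level=$level sortthreads=$(( p - 1 - out )) inputthreads=1 outputthreads=$out"
  fi
}
```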

Jan 24 '18 11:01 gt1