biobambam2
Improving documentation and commandline argument names/behavior
Hi, I am new to this tool (2.0.69) and find it odd that command-line arguments are not shared systematically across the tools, and that some applications lack arguments that are supposedly common to all of them.
bamsormadup uses reference="$ref" while bamsort uses calmdnmreference="$ref".
bamsormadup uses threads="$threads" while bamsort uses inputthreads="$input_threads" outputthreads="$output_threads". Could bamsort also accept just threads and figure out on its own how to split them between input and output?
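A minimal sketch of the kind of splitting being asked for, written as a wrapper outside the tool. The helper names `split_threads` and `bamsort_thread_args` are hypothetical, and the one-third/two-thirds policy is only an assumption (deflate compression on output usually costs more than decompression on input); it is not how biobambam2 itself distributes threads.

```python
def split_threads(threads: int) -> dict:
    """Split one thread budget into input and output shares.

    Hypothetical policy, not biobambam2 behavior: roughly one third of the
    threads go to input, the rest to output, and each side gets at least one.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    input_threads = max(1, threads // 3)
    output_threads = max(1, threads - input_threads)
    return {"inputthreads": input_threads, "outputthreads": output_threads}


def bamsort_thread_args(threads: int) -> list:
    """Render the split as key=value arguments in the style quoted above."""
    return [f"{key}={value}" for key, value in split_threads(threads).items()]


if __name__ == "__main__":
    print(" ".join(bamsort_thread_args(8)))
```

Note that with a budget of a single thread this sketch still hands one thread to each side, so the total can exceed the requested number in that edge case.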
Both tools use SO=coordinate, but it would be clearer if SI={coordinate,queryname,hash} were also possible. If the input BAM header conflicts with the declared value, the tool should just exit.
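The proposed check could look roughly like the sketch below. It assumes the header text has already been extracted (for example with `samtools view -H`) and relies only on the SAM convention that the sort order is stored in the `SO` tag of the `@HD` line. The function names are hypothetical, `SI` is the option proposed here rather than an existing one, and a real tool would exit instead of raising.

```python
def header_sort_order(header_text: str) -> str:
    """Return the SO value from the @HD line, or "unknown" if it is absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return "unknown"


def check_declared_order(header_text: str, declared: str) -> None:
    """Raise if the header names a sort order different from the declared one.

    A header without SO (or with SO:unknown) is treated as non-conflicting.
    """
    actual = header_sort_order(header_text)
    if actual not in ("unknown", declared):
        raise ValueError(
            f"header says SO:{actual} but SI={declared} was given"
        )
```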
bamsormadup has no option to restrict memory usage, while bamsort uses blockmb, but in 1 MB units. Weird. Couldn't it be more user-friendly, so that I could specify memory=22G or memory=22g?
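Until something like that exists, the conversion can be done in a small wrapper. The sketch below turns a human-friendly size into a whole number of megabytes suitable for blockmb. The `memory_to_mb` name and the accepted suffixes are assumptions made for illustration, and whether blockmb bounds the total memory used by the process is not something this post establishes.

```python
import re

# Multipliers expressed in megabytes, using binary units (1G = 1024M).
_UNITS_IN_MB = {"m": 1, "g": 1024, "t": 1024 * 1024}


def memory_to_mb(text: str) -> int:
    """Convert strings such as "22G", "22g" or "512MB" to megabytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([mgt])b?\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"cannot parse memory size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS_IN_MB[unit.lower()]


if __name__ == "__main__":
    # Produces a key=value argument in the style quoted above.
    print(f"blockmb={memory_to_mb('22G')}")
```

A bare number without a suffix is rejected on purpose, so the unit is never guessed.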
bamsormadup lacks an index=1 option, although the same is probably achieved through indexfilename="$prefix".bam.bai.
bamsort understands index=1, and I probably do not have to specify indexfilename="$prefix".bam.bai (that should be the default, shouldn't it?).
The bamsormadup -h text is silent on whether I="$infile" O="$outfile" is accepted. I would at least expect so, given the command-line options of bamsort.
It is not clear to me why the bamsormadup docs merely advise me to use an SSD or a ramdisk to store TMP files. Couldn't it be done in memory? I do not have an SSD, so I will most likely create a ramdisk with an ext3 filesystem without a journal. I wonder whether an in-memory store wouldn't be better from the start, without the filesystem overhead.
While picard MarkDuplicates seems to be superior to samtools rmdup, I wonder how bamsormadup compares. I was certainly tempted to misuse bamsormadup to sort and index my BAM files in parallel without using it to mark duplicate reads. Could that be done? This basically stems from the fact that bamsort asks me to specify input and output threads separately. Why in such a complicated way?
I know, "patches are welcome". ;) Thank you anyway for your current efforts.