
Improving documentation and commandline argument names/behavior

Open mmokrejs opened this issue 7 years ago • 0 comments

Hi, I am new to this tool (2.0.69) and find it odd that commandline arguments are not shared systematically across the applications, and that some applications lack arguments supposedly common to your tools.

bamsormadup uses reference="$ref" while bamsort uses calmdnmreference="$ref"

bamsormadup uses threads="$threads" while bamsort uses inputthreads="$input_threads" outputthreads="$output_threads". Could bamsort also accept just threads and work out how to split them between input and output on its own?

Both tools use SO=coordinate, but it would be clearer if SI={coordinate,queryname,hash} were also possible. If the input BAM header gives conflicting information, then just exit.

bamsormadup has no option to restrict memory usage, while bamsort has blockmb, but in 1 MB units. Weird. Couldn't it be more user-friendly, so that I could specify memory=22G or memory=22g?
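
As a rough illustration of the requested behavior, here is a minimal Python sketch that turns a value such as 22G or 22g into a whole number of megabytes, which could then be passed on as blockmb. The memory_to_mb helper and its unit table are hypothetical and not part of biobambam2; the sketch assumes binary units (1G = 1024 MB) and treats a bare number as MB, matching the unit blockmb already uses.

```python
import re

# Multipliers relative to 1 MB; binary units are an assumption of this sketch.
_UNITS_MB = {"m": 1, "g": 1024, "t": 1024 * 1024}


def memory_to_mb(value: str) -> int:
    """Convert a size such as '22G', '22g', '512M' or '512' to megabytes.

    Hypothetical helper: a bare number is interpreted as MB.
    """
    match = re.fullmatch(r"(\d+)\s*(?:([mgt])b?)?", value.strip().lower())
    if match is None:
        raise ValueError(f"cannot parse memory size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS_MB[unit or "m"]


if __name__ == "__main__":
    # Build the argument in the key=value style the tools already use.
    print(f"blockmb={memory_to_mb('22G')}")  # prints blockmb=22528
```

Accepting both upper and lower case and rejecting anything unparseable keeps the option forgiving without silently guessing at a unit.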

bamsormadup lacks an index=1 option, though probably the same is achieved through indexfilename="$prefix".bam.bai. bamsort understands index=1, and I probably do not have to specify indexfilename="$prefix".bam.bai (that should be the default, shouldn't it?).

The bamsormadup -h text does not say whether I="$infile" O="$outfile" is accepted. I would at least hope so, given the commandline options of bamsort.

It is not clear to me why the bamsormadup docs merely guide me to use an SSD drive or a ramdisk to store TMP files. Couldn't it be done in memory? I do not have an SSD drive, so I will most likely create a ramdisk with an ext3 filesystem without a journal. I wonder whether an in-memory store would be better outright, without the filesystem overhead.

While picard MarkDuplicates seems superior to samtools rmdup, I wonder how bamsormadup compares. I was certainly tempted to misuse bamsormadup to sort and index my BAM files in parallel without marking duplicate reads. Could that be done? This basically stems from the fact that bamsort asks me to specify input and output threads separately; why in such a complicated way?

I know, "patches are welcome". ;) Thank you anyway for your current efforts.

mmokrejs, Feb 20 '17 17:02