Guacamole command-line help page for a caller should describe the algorithm

When we show help for a caller, we should include a description of the algorithm used instead of just the arguments it takes. We should:

add a hook for Guacamole commands to specify help descriptions
write descriptions for our callers
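
One possible shape for such a hook, as a rough sketch only: each command exposes a name and a prose description, and the help printer writes the description before the option listing that the argument parser already produces. Everything named here (`DescribedCommand`, `ExampleCaller`, `HelpPrinter`) is hypothetical and not part of Guacamole, and the description string is a placeholder rather than a real summary of any caller's algorithm. It is written in plain Java so it does not depend on any particular parser API.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical hook: every command supplies a name and a prose description.
interface DescribedCommand {
    String name();
    String description();
}

// Hypothetical caller; the description is a placeholder, not a real algorithm summary.
final class ExampleCaller implements DescribedCommand {
    public String name() { return "example-caller"; }
    public String description() {
        return "Placeholder: one or two sentences describing how this caller works.";
    }
}

final class HelpPrinter {
    // Writes "<name>: <description>", a blank line, then the option listing.
    // optionUsage stands in for whatever text the argument parser generates.
    static void printHelp(DescribedCommand command, String optionUsage, PrintStream out) {
        out.println(command.name() + ": " + command.description());
        out.println();
        out.print(optionUsage);
    }

    // Convenience for tests: renders the help text into a String.
    static String render(DescribedCommand command, String optionUsage) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true);
        printHelp(command, optionUsage, out);
        out.flush();
        return buffer.toString();
    }
}
```

Keeping the description on the command itself means a new caller cannot be registered without one, which covers the second item in the list above.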

Example:

$ scripts/guacamole uniformbayes -h
Using most recently modified jar: target/guacamole-0.0.1.jar
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
--> [Thu Oct 30 20:11:04 EDT 2014]: Guacamole starting.
 -chr VAL                                                        : Chromosome to filter to
 -debug                                                          : If set, prints a higher level of debug output.
 -debug-genotype-filters                                         : Print count of genotypes after each filtering step
 -emit-ref                                                       : Output homozygous reference calls.
 -exclude-indel                                                  : Exclude indel variants in comparison
 -exclude-snv                                                    : Exclude SNV variants in comparison
 -filterMultiAllelic                                             : Filter any pileups > 2 bases considered
 -h (-help, --help, -?)                                          : Print help
 -loci VAL                                                       : Loci at which to call variants. Either 'all' or contig:start-end,contig:start-end,.
                                                                   ..
 -max-genotypes X                                                : Maximum number of genotypes to output. 0 (default) means output all genotypes.
 -maxMappingComplexity N                                         : Maximum percent of reads that can be mapped with low quality (indicative of a
                                                                   complex region
 -maxPercentAbnormalInsertSize N                                 : Filter pileups where % of reads with abnormal insert size is greater than
                                                                   specified (default: 100)
 -maxReadDepth N                                                 : Maximum number of reads for a genotype call
 -minAlignmentForComplexity N                                    : Minimum read mapping quality for a read (Phred-scaled) that counts towards poorly
                                                                   mapped for complexity (default: 1)
 -minAlternateReadDepth N                                        : Minimum number of reads with alternate allele for a genotype call
 -minEdgeDistance N                                              : Filter reads where the base in the pileup is closer than minEdgeDistance to the
                                                                   (directional) end of the read
 -minLikelihood N                                                : Minimum Phred-scaled likelihood. Default: 0 (off)
 -minMapQ N                                                      : Minimum read mapping quality for a read (Phred-scaled). (default: 1)
 -minReadDepth N                                                 : Minimum number of reads for a genotype call
 -no-sequence-dictionary                                         : If set, get contigs and lengths directly from reads instead of from sequence
                                                                   dictionary.
 -out VARIANTS_OUT                                               : Variant output path. If not specified, print to screen.
 -out-chunks X                                                   : When writing out to json format, number of chunks to coalesce the genotypes RDD
                                                                   into.
 -parallelism N                                                  : Num variant calling tasks. Set to 0 (default) to use the number of Spark
                                                                   partitions.
 -parquet_block_size N                                           : Parquet block size (default = 128mb)
 -parquet_compression_codec [UNCOMPRESSED | SNAPPY | GZIP | LZO] : Parquet compression codec
 -parquet_disable_dictionary                                     : Disable dictionary encoding
 -parquet_logging_level VAL                                      : Parquet logging level (default = severe)
 -parquet_page_size N                                            : Parquet page size (default = 1mb)
 -partition-accuracy N                                           : Num micro partitions to use per task in loci partitioning. Set to 0 to partition
                                                                   loci uniformly. Default: 250.
 -print_metrics                                                  : Print metrics to the log on completion
 -reads X                                                        : Aligned reads
 -truth truth                                                    : The truth ADAM or VCF genotypes file
        0.43 real         0.67 user         0.06 sys

Oct 31 '14 00:10 timodonnell

related: https://github.com/hammerlab/guacamole/issues/106

Oct 31 '14 00:10 timodonnell

another nit: if you give an incorrect argument, the error is reported on stdout, not stderr
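
A minimal sketch of the behavior I'd expect instead, assuming a hypothetical `ArgumentChecker` helper (not Guacamole code, and not how the current parser is wired up): the error message goes to the error stream that is passed in, and the caller gets a nonzero exit code back. The exit code 2 and the message wording are arbitrary choices for illustration.

```java
import java.io.PrintStream;
import java.util.Set;

final class ArgumentChecker {
    // Returns a process exit code: 0 if every flag is known, 2 otherwise.
    // The message is written to `err` (System.err in real use), never to stdout.
    static int check(String[] args, Set<String> knownFlags, PrintStream err) {
        for (String arg : args) {
            if (arg.startsWith("-") && !knownFlags.contains(arg)) {
                err.println("Unknown option: " + arg);
                return 2;
            }
        }
        return 0;
    }
}
```

In a real entry point this would be called with `System.err`, followed by `System.exit` with the returned code, so that stdout stays clean for variant output when `-out` is not given.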

Nov 06 '14 22:11 timodonnell