guacamole
cmdline help page for a caller should give a description of the algorithm
When we show help for a caller, we should include a description of the algorithm it uses, not just the arguments it takes. We should:
- add a hook for Guacamole commands to specify help descriptions
- write descriptions for our callers
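Here is a minimal sketch of what such a hook could look like. It is written in Java for illustration, while Guacamole itself is written in Scala. `DescribedCommand`, `ExampleCaller`, and `HelpPrinter` are hypothetical names, not part of Guacamole. The description text is a placeholder, and the option lines are assumed to come from the existing argument parser.

```java
import java.util.List;

// Hypothetical hook: each command supplies its name and a prose description
// of the algorithm it runs. These names are illustrative, not Guacamole's.
interface DescribedCommand {
    String name();
    String description();
}

// Hypothetical command. The description is a placeholder, not an account of
// what any real Guacamole caller does.
class ExampleCaller implements DescribedCommand {
    public String name() { return "examplecaller"; }
    public String description() {
        return "Placeholder: a short description of the calling algorithm goes here.";
    }
}

final class HelpPrinter {
    private HelpPrinter() {}

    // Builds the help text in three parts: a usage line, the algorithm
    // description, and the per-option lines the argument parser already produces.
    static String render(DescribedCommand cmd, List<String> optionLines) {
        StringBuilder sb = new StringBuilder();
        sb.append("Usage: guacamole ").append(cmd.name()).append(" [options]\n\n");
        sb.append(cmd.description()).append("\n\n");
        for (String line : optionLines) {
            sb.append(' ').append(line).append('\n');
        }
        return sb.toString();
    }
}
```

In this design the description appears before the option list, so a user reading `-h` output sees what the caller does before seeing how to tune it.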
Example:
$ scripts/guacamole uniformbayes -h
Using most recently modified jar: target/guacamole-0.0.1.jar
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
--> [Thu Oct 30 20:11:04 EDT 2014]: Guacamole starting.
-chr VAL : Chromosome to filter to
-debug : If set, prints a higher level of debug output.
-debug-genotype-filters : Print count of genotypes after each filtering step
-emit-ref : Output homozygous reference calls.
-exclude-indel : Exclude indel variants in comparison
-exclude-snv : Exclude SNV variants in comparison
-filterMultiAllelic : Filter any pileups > 2 bases considered
-h (-help, --help, -?) : Print help
-loci VAL : Loci at which to call variants. Either 'all' or contig:start-end,contig:start-end,...
-max-genotypes X : Maximum number of genotypes to output. 0 (default) means output all genotypes.
-maxMappingComplexity N : Maximum percent of reads that can be mapped with low quality (indicative of a
complex region
-maxPercentAbnormalInsertSize N : Filter pileups where % of reads with abnormal insert size is greater than
specified (default: 100)
-maxReadDepth N : Maximum number of reads for a genotype call
-minAlignmentForComplexity N : Minimum read mapping quality for a read (Phred-scaled) that counts towards poorly
mapped for complexity (default: 1)
-minAlternateReadDepth N : Minimum number of reads with alternate allele for a genotype call
-minEdgeDistance N : Filter reads where the base in the pileup is closer than minEdgeDistance to the
(directional) end of the read
-minLikelihood N : Minimum Phred-scaled likelihood. Default: 0 (off)
-minMapQ N : Minimum read mapping quality for a read (Phred-scaled). (default: 1)
-minReadDepth N : Minimum number of reads for a genotype call
-no-sequence-dictionary : If set, get contigs and lengths directly from reads instead of from sequence
dictionary.
-out VARIANTS_OUT : Variant output path. If not specified, print to screen.
-out-chunks X : When writing out to json format, number of chunks to coalesce the genotypes RDD
into.
-parallelism N : Num variant calling tasks. Set to 0 (default) to use the number of Spark
partitions.
-parquet_block_size N : Parquet block size (default = 128mb)
-parquet_compression_codec [UNCOMPRESSED | SNAPPY | GZIP | LZO] : Parquet compression codec
-parquet_disable_dictionary : Disable dictionary encoding
-parquet_logging_level VAL : Parquet logging level (default = severe)
-parquet_page_size N : Parquet page size (default = 1mb)
-partition-accuracy N : Num micro partitions to use per task in loci partitioning. Set to 0 to partition
loci uniformly. Default: 250.
-print_metrics : Print metrics to the log on completion
-reads X : Aligned reads
-truth truth : The truth ADAM or VCF genotypes file
0.43 real 0.67 user 0.06 sys
related: https://github.com/hammerlab/guacamole/issues/106
Another nit: if you give an incorrect argument, the error is reported on stdout instead of stderr.
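A minimal sketch of the intended behavior: argument errors go to stderr with a non-zero exit status, and stdout is kept for normal output. `ArgCheck` and its set of recognized flags are hypothetical and stand in for the real parser. The option listing above looks like args4j output. If that is what Guacamole uses, the equivalent fix would probably be to catch `CmdLineException` and write its message and the usage text to `System.err`.

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch only, not Guacamole's code: report bad arguments on stderr and
// return a non-zero exit status.
final class ArgCheck {
    private ArgCheck() {}

    // Hypothetical set of recognized flags, for illustration only.
    private static final Set<String> KNOWN =
        new HashSet<>(Arrays.asList("-h", "-debug", "-emit-ref"));

    static int run(String[] args, PrintStream out, PrintStream err) {
        for (String arg : args) {
            // Only flag-like arguments are checked. Positional values pass through.
            if (arg.startsWith("-") && !KNOWN.contains(arg)) {
                err.println("Unknown argument: " + arg);
                return 1; // non-zero so shell scripts can detect the failure
            }
        }
        out.println("arguments ok");
        return 0;
    }

    public static void main(String[] args) {
        System.exit(run(args, System.out, System.err));
    }
}
```

Passing the streams in as parameters keeps the routing testable without capturing the process-wide `System.out` and `System.err`.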