filter: Prefer "output sequences" over "output"?

Open victorlin opened this issue 1 year ago • 2 comments

augur filter allows --output, --output-sequences, and -o to be used interchangeably:

https://github.com/nextstrain/augur/blob/da1c89d4b3232aa977b69fb8df33d666532c9a56/augur/filter/__init__.py#L105

Because --output is listed first, argparse uses it to derive the default value of dest, so the option must be referenced internally as args.output.
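A minimal sketch of that behavior, assuming only standard argparse. The parser below is a hypothetical stand-in, not the actual declaration in augur filter; it only reuses the three option strings in the order described above.

```python
import argparse

# Hypothetical stand-in parser; only the option strings and their order
# are taken from the issue text.
parser = argparse.ArgumentParser(prog="filter-sketch")
parser.add_argument("--output", "--output-sequences", "-o")

# All three spellings store their value under the same attribute, which
# argparse names after the first long option string ("--output").
args = parser.parse_args(["--output-sequences", "filtered.fasta"])

print(args.output)
print(hasattr(args, "output_sequences"))
```

Without an explicit dest, the attribute name follows the first long option, so no args.output_sequences attribute exists even when the user types --output-sequences.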

"output" is ambiguous since this is just one of many output options. I would prefer the more specific name to align with other options and subcommands.

Two layers to this proposal:

  1. Prefer "output sequences" over "output" internally.

    • Use dest='output_sequences' and args.output_sequences.
  2. Prefer "output sequences" over "output" for users.

    • Reorder the options to '--output-sequences', '--output', '-o' so that the preferred name is displayed first. This would remove the need for an explicit dest.
    • A bigger change would be deprecating the --output/-o flags and removing in a major release, but maybe that's not necessary and would just be extra churn.
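The two non-breaking variants of the proposal could look roughly like the sketch below. It assumes plain argparse; the function names build_with_dest and build_reordered are hypothetical and exist only for illustration, and the real declaration in augur filter carries additional arguments that are omitted here.

```python
import argparse


def build_with_dest():
    """Layer 1: keep the option order, but set dest explicitly."""
    parser = argparse.ArgumentParser(prog="filter-sketch")
    parser.add_argument(
        "--output", "--output-sequences", "-o",
        dest="output_sequences",
    )
    return parser


def build_reordered():
    """Layer 2: list the preferred name first so the default dest matches."""
    parser = argparse.ArgumentParser(prog="filter-sketch")
    parser.add_argument("--output-sequences", "--output", "-o")
    return parser


# Both variants expose the value as args.output_sequences, and every
# existing spelling keeps working for users.
for build in (build_with_dest, build_reordered):
    for flag in ("--output-sequences", "--output", "-o"):
        args = build().parse_args([flag, "filtered.fasta"])
        assert args.output_sequences == "filtered.fasta"
```

Neither variant removes a flag, so existing command lines continue to parse; only the internal attribute name changes, along with the order in which the names are declared in the second variant.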

victorlin, Aug 27 '24 23:08

Thanks for documenting this so clearly, @victorlin. I'm definitely in favor of preferring --output-sequences for users and eventually deprecating --output.

huddlej, Aug 29 '24 18:08

It might be worth considering doing the same in augur index. Current usage:

usage: augur index [-h] --sequences SEQUENCES --output OUTPUT [--verbose]

Count occurrence of bases in a set of sequences.

options:
  -h, --help            show this help message and exit
  --sequences SEQUENCES, -s SEQUENCES
                        sequences in FASTA or VCF formats. Augur will summarize the content of FASTA sequences and only report the names of strains found in a given VCF. (default: None)
  --output OUTPUT, -o OUTPUT
                        tab-delimited file containing the number of bases per sequence in the given file. Output columns include strain, length, and counts for A, C, G, T, N, other valid IUPAC characters, ambiguous characters ('?' and '-'), and other invalid
                        characters. (default: None)
  --verbose, -v         print index statistics to stdout (default: False)

There, --output-sequences would need to be added first, since only --output and -o exist today.

victorlin, Sep 25 '24 23:09