Enable output-prefix on cli

Open julesjacobsen opened this issue 3 years ago • 0 comments

Issue

Given I run the same sample using two separate analyses, e.g. the genome and exome presets:

 --analysis genome.yml --sample examples/pfeiffer-phenopacket.yml --vcf examples/Pfeiffer.vcf.gz
 --analysis exome.yml --sample examples/pfeiffer-phenopacket.yml --vcf examples/Pfeiffer.vcf.gz

Exomiser will currently overwrite the results of the first with those of the second:

results/
├── Pfeiffer_exomiser.html
└── Pfeiffer_exomiser.json

This can be remedied by defining two job or output-option files:

# output-options-exomiser.yml
---
outputPrefix: results/pfeiffer-exomiser

and

# output-options-genomiser.yml
---
outputPrefix: results/pfeiffer-genomiser

 --analysis genome.yml --sample examples/pfeiffer-phenopacket.yml --vcf examples/Pfeiffer.vcf.gz --output output-options-genomiser.yml
 --analysis exome.yml --sample examples/pfeiffer-phenopacket.yml --vcf examples/Pfeiffer.vcf.gz --output output-options-exomiser.yml

would return the results:

results/
├── pfeiffer-genomiser.html
├── pfeiffer-genomiser.json
├── pfeiffer-exomiser.html
└── pfeiffer-exomiser.json

This is a better outcome, but not necessarily the easiest for the user, as they need to create a new file for each analysis just to specify the output options.

Solution

The simplest solution would be to add a new --output-prefix option which replaces the default:

  --preset genome --sample examples/pfeiffer-phenopacket.yml --vcf examples/Pfeiffer.vcf.gz --output-prefix results/genomiser/Pfeiffer
  --preset exome --sample examples/pfeiffer-phenopacket.yml --vcf examples/Pfeiffer.vcf.gz --output-prefix results/exomiser/Pfeiffer

which would produce output in two new directories:

  results/genomiser/Pfeiffer.html
  results/genomiser/Pfeiffer.json
  results/exomiser/Pfeiffer.html
  results/exomiser/Pfeiffer.json

or

  --preset genome --sample examples/pfeiffer-phenopacket.yml --vcf examples/Pfeiffer.vcf.gz --output-prefix results/Pfeiffer-genomiser
  --preset exome --sample examples/pfeiffer-phenopacket.yml --vcf examples/Pfeiffer.vcf.gz --output-prefix results/Pfeiffer-exomiser

which would produce output in the results directory:

  results/Pfeiffer-genomiser.html
  results/Pfeiffer-genomiser.json
  results/Pfeiffer-exomiser.html
  results/Pfeiffer-exomiser.json
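
The mapping from prefix to result files in the two examples above can be sketched as follows. This is only an illustration of the proposed behaviour, not Exomiser code: `output_paths` and the `EXTENSIONS` table are hypothetical names, and only the HTML and JSON extensions shown in the listings are assumed.

```python
from pathlib import PurePosixPath

# Assumed format-to-extension mapping; only HTML and JSON appear in the
# example listings, so no other formats are guessed at here.
EXTENSIONS = {"HTML": "html", "JSON": "json"}


def output_paths(output_prefix, formats=("HTML", "JSON")):
    """Return the result file paths expected for a given output prefix."""
    return [str(PurePosixPath(f"{output_prefix}.{EXTENSIONS[fmt]}")) for fmt in formats]


# A prefix with a new directory component, and one with only a new file name.
for prefix in ("results/genomiser/Pfeiffer", "results/Pfeiffer-genomiser"):
    for path in output_paths(prefix):
        print(path)
```

Because the prefix carries both the directory and the file stem, the same option covers both the "separate directories" and the "same directory, different names" layouts.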

Both --output and --output-prefix can be specified together like so:

# project-specific-output.yml
---
outputContributingVariantsOnly: true
numGenes: 10
minExomiserGeneScore: 0.7
#outputPrefix: results/exomiser-output
#out-format options: HTML, JSON, TSV_GENE, TSV_VARIANT, VCF (default: [HTML, JSON])
outputFormats: [ HTML, JSON, TSV_GENE ]

  --preset genome --sample examples/pfeiffer-phenopacket.yml --vcf examples/Pfeiffer.vcf.gz --output-prefix results/genomiser/Pfeiffer --output project-specific-output.yml
  --preset exome --sample examples/pfeiffer-phenopacket.yml --vcf examples/Pfeiffer.vcf.gz --output-prefix results/exomiser/Pfeiffer --output project-specific-output.yml

Here the --output-prefix would override anything specified in the project-specific-output.yml file.
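
The intended precedence could be sketched like this. It is a minimal illustration under the assumption that the output options are available as a simple key/value mapping; `resolve_output_options` is a hypothetical helper, not part of Exomiser.

```python
from typing import Optional


def resolve_output_options(file_options: dict, cli_output_prefix: Optional[str] = None) -> dict:
    """Merge file-based output options with an optional CLI prefix.

    The CLI value, when given, wins over any outputPrefix from the file;
    all other options from the file are left untouched.
    """
    resolved = dict(file_options)  # copy so the caller's mapping is not mutated
    if cli_output_prefix is not None:
        resolved["outputPrefix"] = cli_output_prefix
    return resolved


# Options as they would be read from project-specific-output.yml,
# here with the outputPrefix line uncommented to show the override.
project_options = {
    "outputContributingVariantsOnly": True,
    "numGenes": 10,
    "minExomiserGeneScore": 0.7,
    "outputPrefix": "results/exomiser-output",
    "outputFormats": ["HTML", "JSON", "TSV_GENE"],
}

print(resolve_output_options(project_options, "results/genomiser/Pfeiffer"))
```

With no CLI prefix supplied, the file's own outputPrefix (or the default, if the file has none) would be used unchanged.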

@damiansm @pnrobinson Does anyone have a strong feeling about being able to change other output options, besides the outputPrefix field? These could be specified on the CLI, like the output-prefix, to override any defaults. I don't think there would be any great need for this, as it's probably only the outputPrefix which will need to be changed for each analysis.

These can all be specified beforehand in a job.yml using a Python string Template, but sometimes a CLI option is more convenient.
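
For comparison, the templating approach might look something like the sketch below. It only renders the small output-options file shown earlier in this issue; a full job.yml would presumably be templated the same way, but its layout is not shown here so it is not guessed at. `render_output_options` is a hypothetical helper name.

```python
from string import Template

# Template for the output-options file used in the workaround above.
# ${name} is the only placeholder and is filled in once per analysis.
OUTPUT_OPTIONS_TEMPLATE = Template(
    "# output-options-${name}.yml\n"
    "---\n"
    "outputPrefix: results/pfeiffer-${name}\n"
)


def render_output_options(name: str) -> str:
    """Render the output-options YAML text for one analysis."""
    return OUTPUT_OPTIONS_TEMPLATE.substitute(name=name)


for analysis_name in ("exomiser", "genomiser"):
    print(render_output_options(analysis_name))
```

This works, but it still means generating one file per analysis, which is the overhead the --output-prefix option is meant to avoid.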

julesjacobsen, Jan 14 '22 14:01