ncov
ncov copied to clipboard
Improve / standardize the way that we pass through default parameters and build-specific parameters
Context
Currently, the workflow is a bit of mess when it comes to how one can specify parameters. There seems to be 3 different approaches:
1 – parameters.yaml only
For the frequencies
rule, the parameter pivot_interval
is retrieved via config["frequencies"]["pivot_interval"]
. This makes it grab exactly what's in parameters.yaml
under frequencies:pivot_interval
. If someone tries to specify a 4-week build-specific pivot_interval
via
frequencies:
global_all-time:
pivot_interval: 4
The workflow will secretly just keep using the pivot_interval
specified in paramaters.yaml
.
Most parameters in the workflow work like this.
2 – builds.yaml override with necessity of default
For the traits
rule, the parameter sampling_bias_correction
is retrieved via _get_sampling_bias_correction_for_wildcards
which is defined as
def _get_sampling_bias_correction_for_wildcards(wildcards):
if wildcards.build_name in config["traits"] and 'sampling_bias_correction' in config["traits"][wildcards.build_name]:
return config["traits"][wildcards.build_name]["sampling_bias_correction"]
else:
return config["traits"]["default"]["sampling_bias_correction"]
Ie it first looks for a build-specific list of traits:{build_name}:sampling_bias_correction
and if doesn't find it, it expects traits:default:sampling_bias_correction
in builds.yaml
or parameters.yaml
. This is described in the docs as
traits:
default:
sampling_bias_correction: 2.5
columns: ["country"]
washington:
# Override default sampling bias correction for
# "washington" build and continue to use default
# trait columns.
sampling_bias_correction: 5.0
This works, but requires parameters.yaml
to look like:
traits:
default:
sampling_bias_correction: 2.5
columns: ["country"]
with an extra default
key compared to other entries in parameters.yaml
.
This strategy is only used for the traits
rule.
3 – builds.yaml override without default
For the frequencies
rule, the parameter min_date
is retrieved via _get_min_date_for_frequencies
which is defined as
def _get_min_date_for_frequencies(wildcards):
if wildcards.build_name in config["frequencies"] and "min_date" in config["frequencies"][wildcards.build_name]:
return config["frequencies"][wildcards.build_name]["min_date"]
elif "frequencies" in config and "min_date" in config["frequencies"]:
return config["frequencies"]["min_date"]
else:
# If not explicitly specified, default to 1 year back from the present
min_date_cutoff = datetime.date.today() - datetime.timedelta(weeks=52)
return numeric_date(
min_date_cutoff
)
Ie it starts with trying to grab build-specific frequencies:{build_name}:min_date
from builds.yaml
. If this doesn't exist, it looks for frequencies:min_date
in builds.yaml
or parameters.yaml
and if this doesn't exist it directly returns a sensible default.
This strategy is only used for the frequencies
rule.
Description
I believe that we should replace the strategy 2 above used only in traits
rule with a setup like 3 above. This is what I did when I realized we weren't collecting narrow_bandwidth
properly. In PR https://github.com/nextstrain/ncov/pull/1130 I followed strategy 3 to specify narrow_bandwidth
as:
def _get_narrow_bandwidth_for_wildcards(wildcards):
# check if builds.yaml contains frequencies:{build_name}:narrow_bandwidth
if wildcards.build_name in config["frequencies"] and 'narrow_bandwidth' in config["frequencies"][wildcards.build_name]:
return config["frequencies"][wildcards.build_name]["narrow_bandwidth"]
# check if parameters.yaml contains frequencies:narrow_bandwidth
elif "frequencies" in config and "narrow_bandwidth" in config["frequencies"]:
return config["frequencies"]["narrow_bandwidth"]
# else return augur frequencies default value
else:
return 0.0833
We have the issue that if we swap _get_sampling_bias_correction_for_wildcards
to use strategy 3, we'll need to update parameters.yaml
to read
traits:
sampling_bias_correction: 2.5
columns: ["country"]
This will break custom profiles that external users are running. We could provide backwards compatibility however by looking first for traits:sampling_bias_correction
and then for traits:defaults:sampling_bias_correction
.
Does this seem reasonable?