Standardise parsing of missing / unknown metadata values

Open jameshadfield opened this issue 4 years ago • 0 comments

Context

Many scripts need to check if a value is "valid" before proceeding. These checks tend to be one-offs and therefore slightly different. This makes it hard to document / understand what to use for unknown / missing data. For instance, the construction of sequence recency uses conditionals:

if 'date_submitted' in d and d['date_submitted'] and d['date_submitted'] != "undefined":

whereas our diagnostic script uses a try/except approach for the same field:

try:
    return (datetime.strptime(x,"%Y-%m-%d") - timedelta(weeks=minus_weeks)).toordinal()
except:
    return np.nan

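This results in different behavior if, for example, we had a date submission value of "?". The conditional treats "?" as a valid value (it is non-empty and not "undefined"), whereas the try/except approach fails to parse it and returns NaN.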
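To make the divergence concrete, here is a small sketch that runs both styles of check over the same values. The function names `passes_conditional` and `to_ordinal_or_nan` are made up for illustration. The `minus_weeks` offset is dropped for brevity, `math.nan` stands in for `np.nan` so the snippet needs only the standard library, and the bare `except` is narrowed to the exceptions `strptime` raises on bad input.

```python
import math
from datetime import datetime


def passes_conditional(d):
    # Mirrors the conditional used when constructing sequence recency.
    return bool(
        'date_submitted' in d
        and d['date_submitted']
        and d['date_submitted'] != "undefined"
    )


def to_ordinal_or_nan(x):
    # Mirrors the try/except approach from the diagnostic script
    # (without the minus_weeks offset).
    try:
        return datetime.strptime(x, "%Y-%m-%d").toordinal()
    except (ValueError, TypeError):
        return math.nan


# Compare how each check treats the same raw metadata values.
for value in ["2021-12-09", "undefined", "", "?"]:
    accepted = passes_conditional({'date_submitted': value})
    parsed = to_ordinal_or_nan(value)
    print(repr(value), "conditional accepts:", accepted, "parsed:", parsed)
```

For "?" the conditional reports the value as usable while the parser returns NaN, which is the inconsistency described above.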

Potential solutions

The first step would be to document the checks we use in our various scripts (including augur). Ideally there would be a reusable function which each script can use.
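As a starting point for discussion, a shared helper could look roughly like the sketch below. Everything in it is an assumption rather than existing ncov or augur code: the names `MISSING_VALUES`, `is_missing` and `parse_date` are hypothetical, and the set of placeholder strings is only a guess that should be replaced once the checks in the existing scripts have been documented.

```python
from datetime import datetime

# Hypothetical placeholder strings; the real list should come from
# documenting what the existing scripts (including augur) check for.
MISSING_VALUES = frozenset({"", "?", "undefined", "unknown", "none", "na", "n/a"})


def is_missing(value):
    """Return True if a metadata value should be treated as missing / unknown."""
    if value is None:
        return True
    if isinstance(value, float) and value != value:
        # NaN is the only float that is not equal to itself.
        return True
    return str(value).strip().lower() in MISSING_VALUES


def parse_date(value, fmt="%Y-%m-%d"):
    """Return a datetime for a valid date string, or None if missing or unparseable."""
    if is_missing(value):
        return None
    try:
        return datetime.strptime(str(value).strip(), fmt)
    except ValueError:
        return None
```

With a helper like this, both scripts above would call `parse_date(d.get('date_submitted'))` and get `None` for "?", "undefined" and an absent field alike, so the definition of "missing" lives in one place.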

jameshadfield, Dec 09 '21 21:12