seasonal-flu
Use metadata from NA segment in joined metadata when HA segment isn't available
Current behavior
Our current approach to joining segment-level metadata records into isolate-level metadata records is HA-centric: when an NA record has no matching HA record, none of the NA record's metadata is carried into the isolate-level record.
Expected behavior
When HA records are missing, we still want to know as much as possible about the NA record including the isolate id, the collection date, etc. We will use this information in segment-level analyses such as the flu_frequencies workflow where we estimate NA-specific clade frequencies and want to use all available NA records.
Possible solution
One solution could be to update the join_metadata script to explicitly define all segment-specific columns (e.g., "passage_category" should be segment-specific) and then fill each remaining isolate-level column (e.g., date, region, country) from the first segment record in which that column is present.
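To make the idea concrete, here is a minimal sketch of that join logic in plain Python. It is not the existing join_metadata implementation: the function name `join_segment_metadata`, the `isolate_id` key, the `accession` column, and the `<column>_<segment>` naming for segment-specific columns are all assumptions chosen for illustration.

```python
# Hypothetical column names; the real join_metadata script may use others.
SEGMENT_SPECIFIC_COLUMNS = {"accession", "passage_category"}
ISOLATE_ID_COLUMN = "isolate_id"


def join_segment_metadata(records_by_segment):
    """Join segment-level records into one record per isolate.

    `records_by_segment` maps a segment name (e.g., "ha", "na") to a list of
    dicts. Segments are visited in the mapping's insertion order, so earlier
    segments take priority when filling isolate-level columns.
    """
    isolates = {}
    for segment, records in records_by_segment.items():
        for record in records:
            isolate_id = record[ISOLATE_ID_COLUMN]
            isolate = isolates.setdefault(
                isolate_id, {ISOLATE_ID_COLUMN: isolate_id}
            )
            for column, value in record.items():
                if column == ISOLATE_ID_COLUMN:
                    continue
                if column in SEGMENT_SPECIFIC_COLUMNS:
                    # Keep one copy per segment, e.g., passage_category_na.
                    isolate[f"{column}_{segment}"] = value
                elif value not in (None, "") and column not in isolate:
                    # Isolate-level column: first non-empty value wins.
                    isolate[column] = value
    return list(isolates.values())


# Isolate "B" has an NA record but no HA record.
joined = join_segment_metadata({
    "ha": [
        {"isolate_id": "A", "date": "2024-01-05", "region": "Europe",
         "passage_category": "cell"},
    ],
    "na": [
        {"isolate_id": "A", "date": "2024-01-05", "region": "Europe",
         "passage_category": "egg"},
        {"isolate_id": "B", "date": "2024-02-10", "region": "Oceania",
         "passage_category": "unpassaged"},
    ],
})
```

With this approach, isolate "B" still gets its date and region from the NA record, while HA remains the preferred source for isolate "A" simply because it is listed first.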