Representation of uncertainty in JSONs
Currently uncertainty in a trait, e.g. location for node X, is represented in augur along the lines of:
/* traits.json */
{X: {location: "blue", location_confidence: {blue: 1.0}}}
/* v1 tree JSON */
{strain: "X", attr: {location: "blue", location_confidence: {blue: 1.0}}}
/* v2 JSON */
{name: "X", node_attrs: {location: {value: "blue", confidence: {blue: 1.0}}}}
Temporal confidence is formatted slightly differently but is conceptually identical. This representation is independent of the model employed.
Importantly, if node X had location "blue" (via metadata), then the output is indistinguishable from the output produced when the location was inferred as "blue" with 100% confidence.
For this example ☝️ all nodes would look like X above, and auspice wouldn't know whether to say "Node A: inferred as blue with 100% confidence" or "Node A: blue". This is even more problematic with tip sampling dates, where we have some code in auspice to try to guess the true meaning:
if (date && dateUncertainty && dateUncertainty[0] !== dateUncertainty[1]) {
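To make the ambiguity concrete, here is a minimal Python sketch of that heuristic. It is a port for illustration only, not Auspice's code; the function name looks_inferred is hypothetical, and the example assumes that an exact metadata date is exported with a zero-width confidence interval.

```python
def looks_inferred(num_date):
    """Guess whether a v2 `num_date` attr was inferred, mirroring the check above.

    `num_date` is a dict such as
    {"value": 2025.13, "confidence": [2025.027, 2025.13]}.
    """
    value = num_date.get("value")
    confidence = num_date.get("confidence")
    return bool(value) and bool(confidence) and confidence[0] != confidence[1]


# Assumed example nodes (not taken from a real dataset).
known_tip = {"value": 2020.5, "confidence": [2020.5, 2020.5]}
inferred_tip = {"value": 2025.13, "confidence": [2025.027, 2025.13]}
no_confidence = {"value": 2020.5}

for label, attr in [("known", known_tip), ("inferred", inferred_tip), ("bare", no_confidence)]:
    print(label, looks_inferred(attr))
```

The guess only works when the interval has non-zero width; a date inferred with a collapsed interval would be reported as if it came from metadata.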
Proposed solution
Modify augur traits and augur refine to produce output where non-inferred nodes do not have associated confidences. This will then be carried through augur export {v1,v2}. Auspice's v1->v2 JSON conversion function would implement the code above to remove confidence values for tips it believes aren't inferred.
The issue I see here is that time tree confidences are not always inferred (for performance reasons). But augur refine exports raw_date, and that could be compared to the inferred date. Similarly, traits could write the input value into the JSON if it exists. I would prefer this over signalling inference through the absence of confidence values.
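For reference, the original proposal (dropping confidences for nodes whose value came from metadata) could be sketched roughly as follows. This is not code from augur; the helper name drop_confidence_for_known, the shape of the metadata dict, and the set of "missing" markers are all assumptions made for illustration.

```python
# Values treated as "no metadata value"; this set is an assumption.
MISSING = (None, "", "?")


def drop_confidence_for_known(node_attrs, metadata, trait):
    """Return a copy of a traits.json-style dict in which nodes that have a
    metadata value for `trait` no longer carry `<trait>_confidence`."""
    cleaned = {}
    for name, attrs in node_attrs.items():
        attrs = dict(attrs)  # shallow copy so the input is left untouched
        known = metadata.get(name, {}).get(trait)
        if known not in MISSING:
            attrs.pop(f"{trait}_confidence", None)
        cleaned[name] = attrs
    return cleaned


node_attrs = {
    "X": {"location": "blue", "location_confidence": {"blue": 1.0}},
    "Y": {"location": "blue", "location_confidence": {"blue": 1.0}},
}
metadata = {"X": {"location": "blue"}, "Y": {"location": "?"}}

cleaned = drop_confidence_for_known(node_attrs, metadata, "location")
print(cleaned)
```

Here X (known from metadata) loses its confidence while Y (unknown in metadata, so inferred) keeps it, which is the signal Auspice would rely on under that proposal.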
Following up with two related (I think... 5 years later) requests:
- In the Auspice JSON an inferred num_date looks like "num_date": {"value": 2025.13, "confidence": [2025.027, 2025.13]}. We should add the underlying (metadata) date (in this example, 2025-XX-XX) to the exported JSON. Key name suggestions: raw_value, raw? This would preserve whatever values we allow in augur (e.g. see #1304).
  - (This would require changes to Auspice. A short-term solution would be to add it as a separate attr.)
- In parallel, but broader in scope, having an inferred: boolean key/value in the node attr would be immensely helpful. I think that's basically what the original issue here is talking about.
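As a rough illustration of the two requests above, a node attr carrying both keys might be built like this. The builder make_num_date_attr is hypothetical, and the key names raw_value and inferred are only the suggestions from this comment, not an agreed schema.

```python
import json


def make_num_date_attr(value, confidence=None, raw_value=None, inferred=None):
    """Build a v2-style `num_date` attr, adding optional keys only when given."""
    attr = {"value": value}
    if confidence is not None:
        attr["confidence"] = list(confidence)
    if raw_value is not None:
        attr["raw_value"] = raw_value  # suggested key name, not final
    if inferred is not None:
        attr["inferred"] = bool(inferred)  # suggested key name, not final
    return attr


attr = make_num_date_attr(
    2025.13,
    confidence=(2025.027, 2025.13),
    raw_value="2025-XX-XX",
    inferred=True,
)
print(json.dumps({"num_date": attr}))
```

Omitting the optional keys when they are not supplied keeps existing JSONs valid and lets a reader fall back to current behaviour.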
We should add the underlying (metadata) date (in this example, 2025-XX-XX) to the exported JSON
Sketching out what this may look like
Augur patch
diff --git a/augur/export_v2.py b/augur/export_v2.py
index 6484eca7..7e2245e4 100644
--- a/augur/export_v2.py
+++ b/augur/export_v2.py
@@ -859,6 +859,11 @@ def set_node_attrs_on_tree(data_json, node_attrs, additional_metadata_columns):
if is_valid(raw_data.get("num_date", None)): # it's ok not to have temporal information
node["node_attrs"]["num_date"] = {"value": format_number(raw_data["num_date"])}
node["node_attrs"]["num_date"].update(attr_confidence(node["name"], raw_data, "num_date"))
+ # We aim to know whether the date has been inferred via timetree. The following approach is
+ # temporary - ideally `augur refine` would add a `inferred: boolean` value.
+ original_value = raw_data.get("raw_date", "")
+ if original_value and not re.match(r"^\d{4}-\d{2}-\d{2}$", original_value):
+ node["node_attrs"]["num_date"]["raw_value"] = original_value
def _transfer_url_accession(node, raw_data):
for prop in ["url", "accession"]:
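The filter used in the Augur patch above can be exercised on its own. The sketch below pulls the same regular expression into a standalone helper (the name raw_value_to_export is hypothetical) so its behaviour on a few inputs is easy to check.

```python
import re

# Same pattern as in the patch: a fully specified YYYY-MM-DD date.
EXACT_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def raw_value_to_export(raw_date):
    """Return the raw date string if it should be exported as `raw_value`,
    or None when the date is missing or already fully specified."""
    if raw_date and not EXACT_DATE.match(raw_date):
        return raw_date
    return None


for candidate in ["2025-XX-XX", "2025-01-15", "2025", ""]:
    print(repr(candidate), "->", repr(raw_value_to_export(candidate)))
```

Note that the pattern only checks the shape of the string, so a value such as 2025-13-45 would also count as an exact date; that seems acceptable for a temporary approach, as the patch comment says.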
Auspice patch
diff --git a/src/components/tree/infoPanels/click.js b/src/components/tree/infoPanels/click.js
index 8a423499..bffe488c 100644
--- a/src/components/tree/infoPanels/click.js
+++ b/src/components/tree/infoPanels/click.js
@@ -177,12 +177,14 @@ const SampleDate = ({isTerminal, node, t}) => {
const date = getTraitFromNode(node, "num_date");
if (!date) return null;
+ const original = getTraitFromNode(node, "num_date", {raw: true});
const dateUncertainty = getTraitFromNode(node, "num_date", {confidence: true});
if (date && dateUncertainty && dateUncertainty[0] !== dateUncertainty[1]) {
return (
<>
{item(t(isTerminal ? "Inferred collection date" : "Inferred date"), numericToCalendar(date))}
{item(t("Date Confidence Interval"), `(${numericToCalendar(dateUncertainty[0])}, ${numericToCalendar(dateUncertainty[1])})`)}
+ {original && item(t("Raw date"), original)}
</>
);
}
diff --git a/src/util/treeMiscHelpers.js b/src/util/treeMiscHelpers.js
index ef71a66c..6960bd8c 100644
--- a/src/util/treeMiscHelpers.js
+++ b/src/util/treeMiscHelpers.js
@@ -25,10 +25,10 @@ james hadfield, nov 2019.
* NOTE: do not use this for "div", "vaccine" or other traits set on `node_attrs`
* which don't share the same structure as traits. See the JSON spec for more details.
*/
-export const getTraitFromNode = (node, trait, {entropy=false, confidence=false}={}) => {
+export const getTraitFromNode = (node, trait, {entropy=false, confidence=false, raw=false}={}) => {
if (!node.node_attrs) return undefined;
- if (!entropy && !confidence) {
+ if (!entropy && !confidence && !raw) {
if (!node.node_attrs[trait]) {
if (trait === strainSymbol) return node.name;
return undefined;
@@ -42,6 +42,9 @@ export const getTraitFromNode = (node, trait, {entropy=false, confidence=false}=
} else if (confidence) {
if (node.node_attrs[trait]) return node.node_attrs[trait].confidence;
return undefined;
+ } else if (raw) {
+ if (node.node_attrs[trait]) return node.node_attrs[trait].raw_value;
+ return undefined;
}
return undefined;
};
The underlying raw date has been added by the 2 PRs linked above: https://github.com/nextstrain/augur/pull/1760 and https://github.com/nextstrain/auspice/pull/1943
Is there anything else to do with regards to expressing temporal uncertainty in JSONs?
Temporal uncertainty is now done, but we still need to implement roughly the same approach for augur traits and allow Auspice to read those properties for any metadata key.
@jameshadfield This might be better for a future dev chat instead of GitHub comments, but what do you think about using that encoding of uncertainty in the confidence field to represent uncertainty of points with error bars in the scatterplot view?
I'm thinking of @trvrb's recent inclusion of MLR fitness per strain in the ncov builds where there is uncertainty from the MLR models that could be passed through to the visualization.
[...] what do you think about using that encoding of uncertainty in the confidence field to represent uncertainty of points with error bars in the scatterplot view?
Seems very reasonable for continuous traits. Is using the (existing JSON property) confidence: [lower, upper] enough, or would you want additional ones?
Lower/upper would cover most cases. It could be useful to have a way to represent "samples" from a distribution to be plotted instead of a range of values, in the same way Zoltar allows users to register forecasts as samples. Those Zoltar docs include other common representations of distributions that could maybe be handy, but I'm not sure how we'd implement them in Auspice. Even the "samples" approach could complicate the standard idea that each point in a scatterplot should represent a single sequence in the tree.
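To sketch how both options could feed an error bar, the helper below prefers the existing confidence: [lower, upper] property and otherwise summarises a list of samples as a 5th to 95th percentile range. The samples key, the error_bar name, and the choice of percentiles are assumptions for illustration; nothing like this exists in the JSON spec or in Auspice today.

```python
import statistics


def error_bar(attr):
    """Return (lower, upper) for a continuous trait attr, or None.

    Uses `confidence` when present; otherwise falls back to a hypothetical
    `samples` list, summarised as the 5th-95th percentile range.
    """
    if "confidence" in attr:
        lower, upper = attr["confidence"]
        return (lower, upper)
    samples = attr.get("samples")
    if samples and len(samples) >= 2:
        # 99 cut points; index 4 is the 5th percentile, index 94 the 95th.
        cuts = statistics.quantiles(samples, n=100, method="inclusive")
        return (cuts[4], cuts[94])
    return None


with_interval = {"value": 1.0, "confidence": [0.8, 1.3]}
with_samples = {"value": 50, "samples": list(range(101))}

print(error_bar(with_interval))
print(error_bar(with_samples))
```

Collapsing samples to a range keeps one point per sequence in the scatterplot, which sidesteps the complication mentioned above at the cost of hiding the shape of the distribution.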