Representation of uncertainty in JSONs

Open jameshadfield opened this issue 6 years ago • 8 comments

Currently uncertainty in a trait, e.g. location for node X, is represented in augur along the lines of:

/* traits.json */
{X: {location: "blue", location_confidence: {blue: 1.0}}}
/* v1 tree JSON */
{strain: "X", attr: {location: "blue", location_confidence: {blue: 1.0}}}
/* v2 JSON */
{name: "X", node_attrs: {location: {value: "blue", confidence: {blue: 1.0}}}}

Temporal confidence is formatted slightly differently, but is conceptually identical. This representation is independent of the model employed.

Importantly, if node X had location "blue" (via metadata), then the output is indistinguishable from what we would get if the location had been inferred as "blue" with 100% confidence.
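To make the ambiguity concrete, here is a minimal Python sketch. The helper `current_style_location` and its `known_from_metadata` parameter are hypothetical (they are not part of augur); the helper only mimics the v2-style attribute shown above, where nothing records where the value came from.

```python
def current_style_location(value, known_from_metadata):
    """Hypothetical helper mimicking the v2-style attribute shown above.

    `known_from_metadata` is accepted but deliberately unused: the current
    representation has no place to record where the value came from.
    """
    return {"location": {"value": value, "confidence": {value: 1.0}}}


# Tip X: location "blue" was supplied in the metadata (nothing was inferred).
from_metadata = current_style_location("blue", known_from_metadata=True)

# Another node: location "blue" was inferred with 100% confidence.
inferred = current_style_location("blue", known_from_metadata=False)

# The two outputs are identical, so a consumer such as Auspice
# cannot tell the two cases apart.
print(from_metadata == inferred)
```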

[image]

For this example ☝️ all nodes would look like X above, and auspice wouldn't know whether to say "Node A: inferred as blue with 100% confidence" or "Node A: blue". This is even more problematic with tip sampling dates, where we have some code in auspice to try to guess the true meaning:

if (date && dateUncertainty && dateUncertainty[0] !== dateUncertainty[1]) {

Proposed solution

Modify augur traits and augur refine to produce output where non-inferred nodes do not have associated confidences. This will then be carried through augur export {v1,v2}. Auspice's v1->v2 JSON conversion function would implement the code above to remove confidence values for tips it believes aren't inferred.
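A rough Python sketch of that clean-up step, assuming the v2 `num_date` layout (`{"value": ..., "confidence": [lower, upper]}`). The function name `strip_uninferred_date_confidence` is made up for illustration; it applies the same heuristic as the Auspice check above, treating an interval whose two endpoints are equal as "not inferred".

```python
def strip_uninferred_date_confidence(node_attrs):
    """Return a copy of node_attrs without a degenerate num_date confidence.

    Heuristic (an assumption, mirroring the Auspice check): if the lower and
    upper bounds are equal, the date was known rather than inferred.
    """
    num_date = node_attrs.get("num_date")
    if not isinstance(num_date, dict):
        return dict(node_attrs)

    confidence = num_date.get("confidence")
    if confidence is not None and len(confidence) == 2 and confidence[0] == confidence[1]:
        cleaned = {key: val for key, val in num_date.items() if key != "confidence"}
        return {**node_attrs, "num_date": cleaned}
    return dict(node_attrs)


# A tip with a known sampling date: the confidence interval has zero width.
tip = {"num_date": {"value": 2019.5, "confidence": [2019.5, 2019.5]}}
# An internal node whose date was inferred: the interval has real width.
internal = {"num_date": {"value": 2018.2, "confidence": [2017.9, 2018.4]}}

print(strip_uninferred_date_confidence(tip))
print(strip_uninferred_date_confidence(internal))
```

The input dictionaries are left untouched, so the function could be applied to every node while walking the tree.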

jameshadfield avatar Oct 25 '19 00:10 jameshadfield

The issue I see here is that time tree confidences are not always inferred (for performance reasons). But augur refine exports raw-date, and that could be compared to the inferred date. Similarly, traits could write the input value into the JSON if it exists. I would prefer this over using the presence or absence of confidence values to signal inference.

rneher avatar Oct 26 '19 10:10 rneher

Following up with two related (I think... 5 years later) requests:

  • In the Auspice JSON an inferred num_date looks like "num_date": {"value": 2025.13, "confidence": [2025.027, 2025.13]}. We should add the underlying (metadata) date (in this example, 2025-XX-XX) to the exported JSON. Key name suggestions: raw_value, raw? This would preserve whatever values we allow in augur (e.g. see #1304).

    • (This would require changes to Auspice. A short-term solution would be to add it as a separate attr.)
  • In parallel, but broader in scope, having an inferred: boolean key/value in the node attr would be immensely helpful. I think that's basically what the original issue here is talking about.
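As a sketch of how the two requests above could combine, the following Python builds a `num_date` attribute carrying both keys. Everything here is an assumption for illustration: the helper `num_date_attr` is hypothetical, `raw_value` is just one of the key names suggested above, and the rule "a date counts as inferred unless the raw date is a complete YYYY-MM-DD" is a guess at reasonable semantics, not settled behaviour.

```python
import re

# A fully specified calendar date such as 2025-02-17 (no XX placeholders).
FULL_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def num_date_attr(value, confidence=None, raw_date=""):
    """Hypothetical builder for a num_date attr with raw_value and inferred."""
    attr = {"value": value}
    if confidence is not None:
        attr["confidence"] = confidence

    # Assumption: no raw date (e.g. internal nodes) or an ambiguous raw date
    # means the numeric value had to be inferred.
    is_exact = bool(FULL_DATE.match(raw_date or ""))
    if raw_date and not is_exact:
        attr["raw_value"] = raw_date  # preserve the ambiguous metadata date
    attr["inferred"] = not is_exact
    return attr


ambiguous_tip = num_date_attr(2025.13, [2025.027, 2025.13], raw_date="2025-XX-XX")
exact_tip = num_date_attr(2025.13, raw_date="2025-02-17")
print(ambiguous_tip)
print(exact_tip)
```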

jameshadfield avatar Feb 17 '25 00:02 jameshadfield

We should add the underlying (metadata) date (in this example, 2025-XX-XX) to the exported JSON

Sketching out what this may look like

Augur patch
diff --git a/augur/export_v2.py b/augur/export_v2.py
index 6484eca7..7e2245e4 100644
--- a/augur/export_v2.py
+++ b/augur/export_v2.py
@@ -859,6 +859,11 @@ def set_node_attrs_on_tree(data_json, node_attrs, additional_metadata_columns):
       if is_valid(raw_data.get("num_date", None)): # it's ok not to have temporal information
           node["node_attrs"]["num_date"] = {"value": format_number(raw_data["num_date"])}
           node["node_attrs"]["num_date"].update(attr_confidence(node["name"], raw_data, "num_date"))
+            # We aim to know whether the date has been inferred via timetree. The following approach is
+            # temporary - ideally `augur refine` would add a `inferred: boolean` value.
+            original_value = raw_data.get("raw_date", "")
+            if original_value and not re.match(r"^\d{4}-\d{2}-\d{2}$", original_value):
+                node["node_attrs"]["num_date"]["raw_value"] = original_value

   def _transfer_url_accession(node, raw_data):
       for prop in ["url", "accession"]:

Auspice patch
diff --git a/src/components/tree/infoPanels/click.js b/src/components/tree/infoPanels/click.js
index 8a423499..bffe488c 100644
--- a/src/components/tree/infoPanels/click.js
+++ b/src/components/tree/infoPanels/click.js
@@ -177,12 +177,14 @@ const SampleDate = ({isTerminal, node, t}) => {
 const date = getTraitFromNode(node, "num_date");
 if (!date) return null;

+  const original = getTraitFromNode(node, "num_date", {raw: true});
 const dateUncertainty = getTraitFromNode(node, "num_date", {confidence: true});
 if (date && dateUncertainty && dateUncertainty[0] !== dateUncertainty[1]) {
   return (
     <>
       {item(t(isTerminal ? "Inferred collection date" : "Inferred date"), numericToCalendar(date))}
       {item(t("Date Confidence Interval"), `(${numericToCalendar(dateUncertainty[0])}, ${numericToCalendar(dateUncertainty[1])})`)}
+        {original && item(t("Raw date"), original)}
     </>
   );
 }
diff --git a/src/util/treeMiscHelpers.js b/src/util/treeMiscHelpers.js
index ef71a66c..6960bd8c 100644
--- a/src/util/treeMiscHelpers.js
+++ b/src/util/treeMiscHelpers.js
@@ -25,10 +25,10 @@ james hadfield, nov 2019.
* NOTE: do not use this for "div", "vaccine" or other traits set on `node_attrs`
* which don't share the same structure as traits. See the JSON spec for more details.
*/
-export const getTraitFromNode = (node, trait, {entropy=false, confidence=false}={}) => {
+export const getTraitFromNode = (node, trait, {entropy=false, confidence=false, raw=false}={}) => {
 if (!node.node_attrs) return undefined;

-  if (!entropy && !confidence) {
+  if (!entropy && !confidence && !raw) {
   if (!node.node_attrs[trait]) {
     if (trait === strainSymbol) return node.name;
     return undefined;
@@ -42,6 +42,9 @@ export const getTraitFromNode = (node, trait, {entropy=false, confidence=false}=
 } else if (confidence) {
   if (node.node_attrs[trait]) return node.node_attrs[trait].confidence;
   return undefined;
+  } else if (raw) {
+    if (node.node_attrs[trait]) return node.node_attrs[trait].raw_value;
+    return undefined;
 }
 return undefined;
};

jameshadfield avatar Feb 17 '25 01:02 jameshadfield

The underlying raw date has been added by the 2 PRs linked above: https://github.com/nextstrain/augur/pull/1760 and https://github.com/nextstrain/auspice/pull/1943

Is there anything else to do with regards to expressing temporal uncertainty in JSONs?

victorlin avatar Mar 18 '25 23:03 victorlin

Is there anything else to do with regards to expressing temporal uncertainty in JSONs?

Temporal uncertainty is now done, but we still need to implement the ~same approach for augur traits in Augur and allow Auspice to read those properties for any metadata key.

jameshadfield avatar Mar 18 '25 23:03 jameshadfield

@jameshadfield This might be better for a future dev chat instead of GitHub comments, but what do you think about using that encoding of uncertainty in the confidence field to represent uncertainty of points with error bars in the scatterplot view?

I'm thinking of @trvrb's recent inclusion of MLR fitness per strain in the ncov builds where there is uncertainty from the MLR models that could be passed through to the visualization.

huddlej avatar Mar 19 '25 15:03 huddlej

[...] what do you think about using that encoding of uncertainty in the confidence field to represent uncertainty of points with error bars in the scatterplot view?

Seems very reasonable for continuous traits. Is using the (existing JSON property) confidence: [lower, upper] enough, or would you want additional ones?

jameshadfield avatar Mar 19 '25 16:03 jameshadfield

Lower/upper would cover most cases. It could be useful to have a way to represent "samples" from a distribution to be plotted instead of a range of values, in the same way Zoltar allows users to register forecasts as samples. Those Zoltar docs include other common representations of distributions that could maybe be handy, but I'm not sure how we'd implement them in Auspice. Even the "samples" approach could complicate the standard idea that each point in a scatterplot should represent a single sequence in the tree.
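One way the "samples" idea could still fit the existing `confidence: [lower, upper]` property is to summarise the samples before export, so each point stays a single sequence. The Python sketch below is only an illustration under assumptions: the function names are made up, and the choice of a central 95% interval is arbitrary.

```python
from statistics import quantiles


def samples_to_confidence(samples):
    """Summarise samples as a central 95% interval [lower, upper].

    With n=40 the cut points are 2.5% apart, so the first and last cut
    points are the 2.5th and 97.5th percentiles.
    """
    cut_points = quantiles(samples, n=40, method="inclusive")
    return [cut_points[0], cut_points[-1]]


def error_bar_offsets(value, confidence):
    """Convert [lower, upper] into (below, above) offsets from the point."""
    lower, upper = confidence
    return (value - lower, upper - value)


# Made-up samples standing in for draws from a model's posterior.
samples = list(range(101))
interval = samples_to_confidence(samples)
print(interval, error_bar_offsets(50, interval))
```

The offsets form is what plotting libraries typically expect for asymmetric error bars, while the interval form matches the existing JSON property.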

huddlej avatar Mar 19 '25 18:03 huddlej