
Use ordinal scale for S1 mutations

Open jameshadfield opened this issue 3 years ago • 3 comments

Traits with integer values are better displayed by auspice using an ordinal scale. This will result in a legend with integer entries rather than our current situation of floats.

[Image] Left: this PR; right: current nextstrain.org

jameshadfield avatar Apr 29 '21 07:04 jameshadfield

What happens when viruses with 20 s1_mutations exist? Does the ordinal legend end up as 0, 1, ..., 19, 20 (with 21 entries)? I originally didn't make this ordinal because, in an ordinal scale, only rank matters. I was assuming that if the s1_mutations entries were just 0, 1, 2, 7, 8, an ordinal scale would produce a color ramp with the same gradation going from 1 to 2 as from 2 to 7.

I'd think that S1 mutations might be better covered as a continuous variable with defined intervals, as sketched out in https://github.com/nextstrain/auspice/pull/1340, but I'm open to your advice, James.
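
To make the rank-versus-interval concern concrete, here is a minimal Python sketch. It is not Auspice's implementation: `ordinal_positions` and `continuous_positions` are hypothetical helpers that only compute where each value would sit on a 0-to-1 color ramp under each assumption.

```python
def ordinal_positions(values):
    """Place each distinct value by rank only (assumed ordinal behaviour)."""
    distinct = sorted(set(values))
    if len(distinct) == 1:
        return {distinct[0]: 0.0}
    step = 1 / (len(distinct) - 1)
    return {v: i * step for i, v in enumerate(distinct)}


def continuous_positions(values):
    """Place each distinct value in proportion to its distance from the minimum."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return {lo: 0.0}
    return {v: (v - lo) / (hi - lo) for v in sorted(set(values))}


s1_mutations = [0, 1, 2, 7, 8]  # example values from the comment above
ordinal = ordinal_positions(s1_mutations)
continuous = continuous_positions(s1_mutations)
```

Under these assumptions the ordinal ramp moves by the same amount from 1 to 2 as from 2 to 7, whereas the continuous ramp puts 2 a quarter of the way along and 7 close to the top.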

trvrb avatar Apr 29 '21 17:04 trvrb

This change reminds me of the opposite change that just happened in the seasonal-flu repo for epitope mutations. :)

@trvrb's argument that ordinal values technically consider only rank, not intervals, is convincing (Vega uses this technical definition of ordinal). That said, I can imagine that we don't want to manually define bounds for each continuous whole-numbered variable. It would be nice if Auspice's automated segmentation of the data range used integer bounds when the inputs are integers. This feature would be similar to both the "nice" and "bins" options of Vega's quantitative scales.

On the flip side, a nice feature of using the ordinal values is that one can hover over a specific value in the legend and see the corresponding tips in the tree that match that exact value. With the continuous values, I can't tell as easily what values the highlighted tips have because they are in some range handled opaquely by the legend.

(Sorry this is a bit non-committal as a review, since I can see benefits to both approaches.)

huddlej avatar Apr 29 '21 22:04 huddlej

I think integer legend values are definitely preferable. We have been using ordinal to mean integer, and to me the most obvious solution would be to have auspice figure out integer bins that don't exceed 12. So if the range is 0 to 16, we would have bins of size 2...
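
A rough Python sketch of that binning rule, assuming the bin width is the smallest integer that keeps the number of bins at or below 12. `integer_bins` and `max_bins` are hypothetical names for illustration, not part of Auspice.

```python
import math


def integer_bins(lo, hi, max_bins=12):
    """Split the inclusive integer range [lo, hi] into at most max_bins bins.

    Each bin is an inclusive (start, end) pair with integer bounds. The bin
    width is the smallest integer that keeps the bin count <= max_bins.
    """
    if hi < lo:
        raise ValueError("hi must be >= lo")
    n_values = hi - lo + 1
    width = math.ceil(n_values / max_bins)
    return [(start, min(start + width - 1, hi)) for start in range(lo, hi + 1, width)]


bins_small = integer_bins(0, 8)    # few values: one bin per integer
bins_16 = integer_bins(0, 16)      # the 0-to-16 example: bins of width 2
bins_40 = integer_bins(0, 40)      # a wider range: bins of width 4
```

With this rule a range of 0 to 8 keeps one legend entry per integer, 0 to 16 gives nine bins of width 2 (the last holding only 16), and 0 to 40 gives eleven bins of width 4.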

rneher avatar May 02 '21 15:05 rneher

We are at about 40 S1 mutations now, way past the point where integers are useful.

rneher avatar Apr 07 '23 13:04 rneher