vega-lite

Small non-zero values for size scale are not visible

Open jakevdp opened this issue 6 years ago • 14 comments

In Vega-Lite 2.x, if you encode a field with size and some of its values are zero, the points with value zero do not appear on the plot with the default scale:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "data": {
    "url": "https://vega.github.io/vega-datasets/data/seattle-weather.csv"
  },
  "encoding": {
    "color": {
      "field": "weather",
      "type": "nominal"
    },
    "size": {
      "field": "precipitation",
      "type": "quantitative"
    },
    "x": {
      "field": "temp_min",
      "type": "quantitative"
    },
    "y": {
      "field": "temp_max",
      "type": "quantitative"
    }
  },
  "mark": "point"
}

[Screenshot: output of the spec above in Vega-Lite 2.x]

In Vega-Lite 1.x, this was not the case; here's the result of the above spec with VL1:

[Screenshot: output of the same spec in Vega-Lite 1.x]

In VL2, you can fix this by setting the size scale domain to start at a negative number:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "data": {
    "url": "https://vega.github.io/vega-datasets/data/seattle-weather.csv"
  },
  "encoding": {
    "color": {
      "field": "weather",
      "type": "nominal"
    },
    "size": {
      "field": "precipitation",
      "type": "quantitative",
      "scale": {"domain": [-1, 50]}
    },
    "x": {
      "field": "temp_min",
      "type": "quantitative"
    },
    "y": {
      "field": "temp_max",
      "type": "quantitative"
    }
  },
  "mark": "point"
}

[Screenshot: output with the size domain set to [-1, 50]]
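
A minimal Python sketch of why shifting the domain helps. It assumes a plain linear size scale whose range starts at 0 and tops out at a hypothetical 400 px² (not necessarily Vega-Lite's real default); `linear` is an illustrative helper, not Vega's scale implementation.

```python
# Illustrative linear scale (not Vega's code): maps domain [d0, d1]
# onto range [r0, r1], here interpreted as symbol area in px^2.
def linear(domain, rng):
    d0, d1 = domain
    r0, r1 = rng
    return lambda x: r0 + (x - d0) * (r1 - r0) / (d1 - d0)

# Assumed range [0, 400]; the actual Vega-Lite default may differ.
default = linear([0, 50], [0, 400])   # domain starts at zero
shifted = linear([-1, 50], [0, 400])  # workaround: domain starts at -1

print(default(0))   # 0.0 -> zero-area symbol, nothing is drawn
print(shifted(0))   # ~7.84 -> small but visible symbol
```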

I think that silently hiding valid data should be considered a bug, and that the domain should default to something like what is in the final panel.

jakevdp • Mar 08 '18

I also consider this a bug. We should map the range to [0..max] but 0 should map to a circle that doesn't have an empty area inside.

domoritz • Mar 08 '18

Instead of setting the domain, we should set the range.

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "width": 500,
  "height": 300,
  "data": {
    "url": "https://vega.github.io/vega-datasets/data/seattle-weather.csv"
  },
  "encoding": {
    "color": {
      "field": "weather",
      "type": "nominal"
    },
    "size": {
      "field": "precipitation",
      "type": "quantitative",
      "scale": {
        "range": [1,200]
      }
    },
    "x": {
      "field": "temp_min",
      "type": "quantitative"
    },
    "y": {
      "field": "temp_max",
      "type": "quantitative"
    }
  },
  "mark": "point"
}

[Screenshot: output with the size range set to [1, 200]]
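
A minimal sketch of the same kind of linear mapping with the range raised instead of the domain, assuming a hypothetical precipitation domain of [0, 50] (the real extent comes from the data). Zero now lands on the range minimum of 1 px² rather than 0.

```python
# Illustrative linear scale (not Vega's code) for symbol area in px^2.
def linear(domain, rng):
    d0, d1 = domain
    r0, r1 = rng
    return lambda x: r0 + (x - d0) * (r1 - r0) / (d1 - d0)

size = linear([0, 50], [1, 200])  # hypothetical domain, range from the spec
print(size(0))    # 1.0: zero still gets a 1 px^2 symbol
print(size(50))   # 200.0
```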

domoritz • Mar 08 '18

I wonder whether this is actually a Vega bug. With size 1, points are visible and even have a hole, while with size 0, they disappear completely. The documentation says that size sets the area of a point. I think a point with an area of 0 should still draw its stroke, though.

[Screenshot: Vega editor]

cc @jheer

domoritz • Mar 08 '18

Hmm, if we set the size of points to less than 1, we get thinner strokes.

[Screenshot (2018-04-21): points with size below 1 drawn with thinner strokes]

I wonder whether there is a way to get filled circles of radius 1 for the smallest circles instead of hollow circles.

@kanitw @jheer any ideas?

domoritz • Apr 22 '18

> I wonder whether there is a way to get filled circles of radius 1 for the smallest circles instead of hollow circles.

Yes, you can add a production rule (a conditional encoding) instead of making the scale inaccurate.

kanitw • Apr 22 '18

Can you show an example in Vega? I can implement it in Vega-Lite afterwards.

domoritz • Apr 22 '18

Just test whether the size value is too small (this needs some epsilon constant), and if so, apply a different size, fill, and stroke instead.
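
The rule can be sketched in Python as a per-datum function. The threshold and minimum size of 1 px² mirror the `fill` and `size` signal expressions in the Vega spec later in this thread; `point_style`, `PointStyle`, and the color value are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class PointStyle:
    size: float   # symbol area in px^2
    fill: str
    stroke: str

def point_style(scaled_size: float, color: str,
                threshold: float = 1.0, min_size: float = 1.0) -> PointStyle:
    # Below the threshold, a hollow symbol would vanish or become a faint
    # ring, so draw a small filled dot instead.
    if scaled_size < threshold:
        return PointStyle(size=min_size, fill=color, stroke=color)
    # Otherwise keep the scaled area and the usual hollow "point" look.
    return PointStyle(size=scaled_size, fill="transparent", stroke=color)

print(point_style(0.1, "#4c78a8"))
print(point_style(120.0, "#4c78a8"))
```

The scale itself stays exact; only the rendering of the tiniest marks changes, which is the trade-off kanitw describes.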

kanitw • Apr 22 '18

Idea: you could even use a different color for non-positive numbers when color/fill/stroke is not otherwise set. (This is a bit more controversial, so I'm not 100% sure we should do this.)

kanitw • Apr 22 '18

Something like

{
  "$schema": "https://vega.github.io/schema/vega/v3.0.json",
  "autosize": "pad",
  "padding": 5,
  "width": 500,
  "height": 500,
  "style": "cell",
  "data": [
    {
      "name": "source_0",
      "url": "https://vega.github.io/vega-datasets/data/seattle-weather.csv",
      "format": {
        "type": "csv",
        "parse": {
          "precipitation": "number",
          "temp_min": "number",
          "temp_max": "number"
        },
        "delimiter": ","
      },
      "transform": [
        {
          "type": "filter",
          "expr": "datum[\"precipitation\"] !== null && !isNaN(datum[\"precipitation\"]) && datum[\"temp_min\"] !== null && !isNaN(datum[\"temp_min\"]) && datum[\"temp_max\"] !== null && !isNaN(datum[\"temp_max\"])"
        }
      ]
    }
  ],
  "marks": [
    {
      "name": "marks",
      "type": "symbol",
      "style": ["point"],
      "from": {"data": "source_0"},
      "encode": {
        "update": {
          "opacity": {"value": 0.7},
          "fill": {
            "signal": "scale('size', datum.precipitation) < 1 ? scale('color', datum.weather) : 'transparent'"
          },
          "stroke": {"scale": "color", "field": "weather"},
          "x": {"scale": "x", "field": "temp_min"},
          "y": {"scale": "y", "field": "temp_max"},
          "size": {
            "signal": "scale('size', datum.precipitation) < 1 ? 1 : scale('size', datum.precipitation)"
          }
        }
      }
    }
  ],
  "scales": [
    {
      "name": "x",
      "type": "linear",
      "domain": {"data": "source_0", "field": "temp_min"},
      "range": [0, {"signal": "width"}],
      "nice": true,
      "zero": true
    },
    {
      "name": "y",
      "type": "linear",
      "domain": {"data": "source_0", "field": "temp_max"},
      "range": [{"signal": "height"}, 0],
      "nice": true,
      "zero": true
    },
    {
      "name": "color",
      "type": "ordinal",
      "domain": {"data": "source_0", "field": "weather", "sort": true},
      "range": "category"
    },
    {
      "name": "size",
      "type": "linear",
      "domain": {"data": "source_0", "field": "precipitation"},
      "range": [0.1, 361],
      "nice": false,
      "zero": true
    }
  ],
  "axes": [
    {
      "scale": "x",
      "orient": "bottom",
      "title": "temp_min",
      "labelFlush": true,
      "labelOverlap": true,
      "tickCount": {"signal": "ceil(width/40)"},
      "zindex": 1
    },
    {
      "scale": "x",
      "orient": "bottom",
      "grid": true,
      "tickCount": {"signal": "ceil(width/40)"},
      "gridScale": "y",
      "domain": false,
      "labels": false,
      "maxExtent": 0,
      "minExtent": 0,
      "ticks": false,
      "zindex": 0
    },
    {
      "scale": "y",
      "orient": "left",
      "title": "temp_max",
      "labelOverlap": true,
      "tickCount": {"signal": "ceil(height/40)"},
      "zindex": 1
    },
    {
      "scale": "y",
      "orient": "left",
      "grid": true,
      "tickCount": {"signal": "ceil(height/40)"},
      "gridScale": "x",
      "domain": false,
      "labels": false,
      "maxExtent": 0,
      "minExtent": 0,
      "ticks": false,
      "zindex": 0
    }
  ],
  "legends": [
    {
      "stroke": "color",
      "title": "weather",
      "encode": {"symbols": {"update": {"opacity": {"value": 0.7}}}}
    },
    {
      "size": "size",
      "title": "precipitation",
      "encode": {"symbols": {"update": {"opacity": {"value": 0.7}}}}
    }
  ],
  "config": {"axisY": {"minExtent": 30}}
}

[Screenshot: output of the Vega spec above]

domoritz • Apr 22 '18

@jakevdp

I think the workaround of setting the domain to [-1, 50] is easy but can be a bit inaccurate, since the size ratio won't be preserved. (A data point with value 50 won't be exactly 2x the size of a point with value 25.)

I think it is actually better to give both the domain and the range a non-zero minimum and clamp the scale:

"scale": {"domain": [0.125, 50], "range":[1, 400], "clamp": true, "zero": false}

Because the range is the domain multiplied by 8 (0.125 × 8 = 1 and 50 × 8 = 400), the ratio is still preserved, while the minimum size is 1. With clamping, values below 0.125, including zero, map to size 1.
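
A minimal Python sketch of the clamped scale, compared with the [-1, 50] workaround. `linear` is a simplified stand-in for a linear scale with clamping, not Vega's implementation, and the [0, 400] range used for the workaround is a hypothetical choice for comparison.

```python
# Simplified linear scale with optional clamping (not Vega's code).
def linear(domain, rng, clamp=False):
    d0, d1 = domain
    r0, r1 = rng
    def scale(x):
        if clamp:
            # Pin inputs to the domain so outputs stay within the range.
            x = min(max(x, min(d0, d1)), max(d0, d1))
        return r0 + (x - d0) * (r1 - r0) / (d1 - d0)
    return scale

clamped = linear([0.125, 50], [1, 400], clamp=True)  # from the suggestion
shifted = linear([-1, 50], [0, 400])                 # hypothetical range

print(clamped(0))                  # 1.0 (clamped to the domain minimum)
print(clamped(50) / clamped(25))   # 2.0, ratio preserved
print(shifted(50) / shifted(25))   # ~1.96, ratio not preserved
```

Inside the domain the clamped mapping reduces to size = 8 × value, which is why ratios survive; clamping only affects values below 0.125.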

Note that

  1. you shouldn't have to set zero to false; I'm fixing that in https://github.com/vega/vega-lite/pull/3691
  2. Setting the domain to [-1, 50] is not that bad either, since people can't distinguish sizes very well anyway, but I feel it's better to write a more accurate specification.

Full spec:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "data": {
    "url": "https://vega.github.io/vega-datasets/data/seattle-weather.csv"
  },
  "encoding": {
    "color": {
      "field": "weather",
      "type": "nominal"
    },
    "size": {
      "field": "precipitation",
      "type": "quantitative",
      "scale": {"domain": [0.125, 50], "range":[1, 400], "clamp": true}
    },
    "x": {
      "field": "temp_min",
      "type": "quantitative"
    },
    "y": {
      "field": "temp_max",
      "type": "quantitative"
    }
  },
  "mark": "point"
}

kanitw • May 03 '18

Bars with very small non-zero values can have a similar issue; we should consider whether we want to apply the same trick (https://github.com/vega/vega-lite/issues/255).

kanitw • May 09 '18

~~Another potentially better solution is to see if we can augment Vega scales to have a bump scale.~~

Actually, a bump scale like https://github.com/vega/vega-lite/issues/255 could be slightly different, in the sense that users may want to keep zero at 0px but have a minimum size for non-zero lengths.
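
A minimal sketch of such a "bump" mapping, assuming non-negative bar values: zero keeps a length of 0 px, while every non-zero value gets at least a minimum length. `bump`, `min_length`, and the [0, 1000] to [0, 300] px scale are hypothetical and not part of Vega or Vega-Lite.

```python
# Illustrative linear scale (not Vega's code).
def linear(domain, rng):
    d0, d1 = domain
    r0, r1 = rng
    return lambda x: r0 + (x - d0) * (r1 - r0) / (d1 - d0)

def bump(scale, min_length):
    # Hypothetical bump: exact zero stays 0 px; any other (non-negative)
    # value is lengthened to at least min_length px.
    def bumped(x):
        if x == 0:
            return 0.0
        return max(min_length, scale(x))
    return bumped

length = bump(linear([0, 1000], [0, 300]), min_length=2.0)
print(length(0))    # 0.0
print(length(1))    # 2.0 (would be 0.3 px without the bump)
print(length(500))  # 150.0
```

Unlike raising the range minimum, this keeps zero distinguishable from small non-zero values, at the cost of a discontinuity near zero.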

kanitw • May 24 '18

Actually, thinking about it more, it's probably more maintainable to just modify the default scale range to start at a small non-zero value. (It'll be slightly less precise than conditional encoding, but it will add far less complexity to the system.)

The relevant code is here: https://github.com/vega/vega-lite/blob/next/src/compile/scale/range.ts#L453

The question is: what should the property names be for these minimum sizes when zero is included?

Note that we already have config.scale.minBandSize/minStrokeWidth/minFontSize/minSize to represent the minimum range value for size scales when zero is false.

So we're adding their counterparts to handle the minimum range value when zero is true.

Design Decision 1: What's the name for the new config?

a) config.scale.minBandSizeWithZero
b) config.scale.minBandSizeForDomainWithZero
c) config.scale.zeroBandSize
d) config.scale.bandSizeForZero

(Same pattern for strokeWidth/fontSize/size)

I like c) for brevity.

Design Decision 2: Do we rename the existing config for when zero = false?

a) Do 1a and rename config.scale.minBandSize to config.scale.minBandSizeWithoutZero, with a backward-compatible migration (remove the old property from the type, but add a migration step).
b) Do 1b and rename config.scale.minBandSize to config.scale.minBandSizeForDomainWithoutZero, with a backward-compatible migration.
c) Do 1c and rename config.scale.minBandSize to config.scale.minNonZeroBandSize.
d) Keep the existing name (no breaking change).

If we want to rename, I think c) is probably the best, but honestly it's not worth the breaking change, so probably d).

Design Decision 3: Determine min size for bandSize/strokeWidth/fontSize/size

  • I'll defer to @yhoonkim to determine.

cc: @domoritz @yhoonkim Any preference for decision 1-2?

kanitw • Aug 09 '22

So, first off, it's important to know that size encoding is not perceptually linear. This means that people cannot accurately compare ratios of sizes. Therefore, I think it's totally okay to have symbol sizes start at some non-zero value.

How about we do 1b) and the new config (minBandSizeForDomainWithZero) overrides the existing config (minBandSize) for the case where the domain includes zero? Then we don't break backwards compatibility. If you prefer 1c), I am okay with that but would still not introduce a breaking change to the existing property.
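
A minimal sketch of the proposed fallback logic for choosing the minimum of a size range. The parameter names are snake_case stand-ins for the config properties discussed above, and treating an unset zero-specific option as 0 is an assumption (the actual defaults are Design Decision 3, still open here); this is not the code in range.ts.

```python
from typing import Optional

def size_range_min(domain_includes_zero: bool,
                   min_size: float,
                   min_size_for_domain_with_zero: Optional[float] = None) -> float:
    """Hypothetical: pick the lower end of a size scale's range."""
    if domain_includes_zero:
        # The new option overrides the existing one when zero is in the domain.
        # Assumption: when it is unset, the range keeps starting at 0.
        if min_size_for_domain_with_zero is not None:
            return min_size_for_domain_with_zero
        return 0.0
    # Existing behavior: min_size applies when zero is false.
    return min_size

print(size_range_min(False, 9.0))       # 9.0
print(size_range_min(True, 9.0))        # 0.0
print(size_range_min(True, 9.0, 4.0))   # 4.0
```

Because the existing property keeps its meaning, this variant introduces no breaking change, matching the preference expressed above.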

domoritz • Aug 19 '22