vega-lite
Small non-zero values for size scale are not visible
In Vega-Lite 2.x, if you encode a quantity with size and that quantity has elements with value zero, those points do not appear on the plot with the default scale:
{
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"data": {
"url": "https://vega.github.io/vega-datasets/data/seattle-weather.csv"
},
"encoding": {
"color": {
"field": "weather",
"type": "nominal"
},
"size": {
"field": "precipitation",
"type": "quantitative"
},
"x": {
"field": "temp_min",
"type": "quantitative"
},
"y": {
"field": "temp_max",
"type": "quantitative"
}
},
"mark": "point"
}
In Vega-Lite 1.x, this was not the case; here's the result of the above spec with VL1:
In VL2, you can fix this by setting the size scale domain to start at a negative number:
{
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"data": {
"url": "https://vega.github.io/vega-datasets/data/seattle-weather.csv"
},
"encoding": {
"color": {
"field": "weather",
"type": "nominal"
},
"size": {
"field": "precipitation",
"type": "quantitative",
"scale": {"domain": [-1, 50]}
},
"x": {
"field": "temp_min",
"type": "quantitative"
},
"y": {
"field": "temp_max",
"type": "quantitative"
}
},
"mark": "point"
}
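To see why shifting the domain brings the zero-valued points back, here is a minimal sketch of a linear scale. The range `[0, 361]` and the upper domain bound of 50 are assumptions for illustration only; the actual defaults depend on the Vega-Lite version and the data.

```typescript
// Minimal linear scale: maps a value from `domain` onto `range`, no clamping.
function linearScale(
  domain: [number, number],
  range: [number, number]
): (v: number) => number {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  const slope = (r1 - r0) / (d1 - d0);
  return (v: number) => r0 + (v - d0) * slope;
}

// Assumed size range, for illustration only.
const assumedRange: [number, number] = [0, 361];

// Domain starting at zero: a value of 0 maps to a symbol area of 0.
const zeroWithDefaultDomain = linearScale([0, 50], assumedRange)(0);

// Domain starting at -1: a value of 0 maps to a small positive area (361 / 51).
const zeroWithShiftedDomain = linearScale([-1, 50], assumedRange)(0);

console.log(zeroWithDefaultDomain, zeroWithShiftedDomain);
```

Under these assumptions the first result is 0 (nothing to draw), while the second is a small positive area, which is why the points reappear.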
I think that silently hiding valid data should be considered a bug, and that the domain should default to something like what is in the final panel.
I also consider this a bug. We should map the range to [0..max] but 0 should map to a circle that doesn't have an empty area inside.
Instead of setting the domain, we should set the range:
{
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"width": 500,
"height": 300,
"data": {
"url": "https://vega.github.io/vega-datasets/data/seattle-weather.csv"
},
"encoding": {
"color": {
"field": "weather",
"type": "nominal"
},
"size": {
"field": "precipitation",
"type": "quantitative",
"scale": {
"range": [1,200]
}
},
"x": {
"field": "temp_min",
"type": "quantitative"
},
"y": {
"field": "temp_max",
"type": "quantitative"
}
},
"mark": "point"
}
I wonder whether this is actually a Vega bug. With size 1, points are visible and even have a hole, while with size 0, they completely disappear. The documentation says that size sets the area of a point. I think an area of 0 should still have a stroke, though.
cc @jheer
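As a rough illustration of the point about area: assuming size is the area of a circular symbol in square pixels (as the documentation quoted above suggests), the radius follows from area = πr². This is only a sketch of the geometry, not Vega's rendering code.

```typescript
// Radius (in pixels) of a circle whose area equals `size` (in square pixels).
function circleRadius(size: number): number {
  return Math.sqrt(size / Math.PI);
}

// Sizes 0, 1, and 100 give radii of 0, about 0.56, and about 5.64 pixels.
const radii = [0, 1, 100].map(circleRadius);
console.log(radii);
```

A size of 0 gives a radius of exactly 0, so there is no outline left to stroke, which may be why those points vanish while size 1 still renders.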
Hmm, if we set the size of points to less than 1, we get thinner strokes.
I wonder whether there is a way to get filled circles of radius 1 for the smallest circles instead of hollow circles.
@kanitw @jheer any ideas?
> I wonder whether there is a way to get filled circles of radius 1 for the smallest circles instead of hollow circles.

Yes, you can add a production rule instead of making the scale inaccurate.
Can you show an example in Vega? I can implement it in Vega-Lite afterwards.
Just test whether the scaled size is too small (this needs some epsilon constant), then apply a different size, fill, and stroke instead.
Idea: You could even use a different color for non-positive numbers if color/fill/stroke is not set. (This is a bit more controversial, so I'm not 100% sure we should do this.)
Something like:
{
"$schema": "https://vega.github.io/schema/vega/v3.0.json",
"autosize": "pad",
"padding": 5,
"width": 500,
"height": 500,
"style": "cell",
"data": [
{
"name": "source_0",
"url": "https://vega.github.io/vega-datasets/data/seattle-weather.csv",
"format": {
"type": "csv",
"parse": {
"precipitation": "number",
"temp_min": "number",
"temp_max": "number"
},
"delimiter": ","
},
"transform": [
{
"type": "filter",
"expr": "datum[\"precipitation\"] !== null && !isNaN(datum[\"precipitation\"]) && datum[\"temp_min\"] !== null && !isNaN(datum[\"temp_min\"]) && datum[\"temp_max\"] !== null && !isNaN(datum[\"temp_max\"])"
}
]
}
],
"marks": [
{
"name": "marks",
"type": "symbol",
"style": ["point"],
"from": {"data": "source_0"},
"encode": {
"update": {
"opacity": {"value": 0.7},
"fill": {
"signal": "scale('size', datum.precipitation) < 1 ? scale('color', datum.weather) : 'transparent'"
},
"stroke": {"scale": "color", "field": "weather"},
"x": {"scale": "x", "field": "temp_min"},
"y": {"scale": "y", "field": "temp_max"},
"size": {
"signal": "scale('size', datum.precipitation) < 1 ? 1 : scale('size', datum.precipitation)"
}
}
}
}
],
"scales": [
{
"name": "x",
"type": "linear",
"domain": {"data": "source_0", "field": "temp_min"},
"range": [0, {"signal": "width"}],
"nice": true,
"zero": true
},
{
"name": "y",
"type": "linear",
"domain": {"data": "source_0", "field": "temp_max"},
"range": [{"signal": "height"}, 0],
"nice": true,
"zero": true
},
{
"name": "color",
"type": "ordinal",
"domain": {"data": "source_0", "field": "weather", "sort": true},
"range": "category"
},
{
"name": "size",
"type": "linear",
"domain": {"data": "source_0", "field": "precipitation"},
"range": [0.1, 361],
"nice": false,
"zero": true
}
],
"axes": [
{
"scale": "x",
"orient": "bottom",
"title": "temp_min",
"labelFlush": true,
"labelOverlap": true,
"tickCount": {"signal": "ceil(width/40)"},
"zindex": 1
},
{
"scale": "x",
"orient": "bottom",
"grid": true,
"tickCount": {"signal": "ceil(width/40)"},
"gridScale": "y",
"domain": false,
"labels": false,
"maxExtent": 0,
"minExtent": 0,
"ticks": false,
"zindex": 0
},
{
"scale": "y",
"orient": "left",
"title": "temp_max",
"labelOverlap": true,
"tickCount": {"signal": "ceil(height/40)"},
"zindex": 1
},
{
"scale": "y",
"orient": "left",
"grid": true,
"tickCount": {"signal": "ceil(height/40)"},
"gridScale": "x",
"domain": false,
"labels": false,
"maxExtent": 0,
"minExtent": 0,
"ticks": false,
"zindex": 0
}
],
"legends": [
{
"stroke": "color",
"title": "weather",
"encode": {"symbols": {"update": {"opacity": {"value": 0.7}}}}
},
{
"size": "size",
"title": "precipitation",
"encode": {"symbols": {"update": {"opacity": {"value": 0.7}}}}
}
],
"config": {"axisY": {"minExtent": 30}}
}
@jakevdp
I think the workaround of setting the domain to [-1, 50] is easy but can be a bit inaccurate, since the size ratio won't be preserved. (A data point with value 50 won't have exactly 2x the size of another point with value 25.)
I think it is actually better to set both the domain and the range with a minimum value and clamp the scale:
"scale": {"domain": [0.125, 50], "range":[1, 400], "clamp": true, "zero": false}
By setting the range to be the domain multiplied by 8, we still preserve the ratio while having a minimum size of 1.
Note that:
- You shouldn't have to set zero to false; I'm fixing that in https://github.com/vega/vega-lite/pull/3691
- Setting the domain to [-1, 50] is not that bad either, since people can't distinguish sizes very well anyway, but I feel it's better to write a specification that's more accurate.
Full spec:
{
"$schema": "https://vega.github.io/schema/vega-lite/v2.json",
"data": {
"url": "https://vega.github.io/vega-datasets/data/seattle-weather.csv"
},
"encoding": {
"color": {
"field": "weather",
"type": "nominal"
},
"size": {
"field": "precipitation",
"type": "quantitative",
"scale": {"domain": [0.125, 50], "range":[1, 400], "clamp": true}
},
"x": {
"field": "temp_min",
"type": "quantitative"
},
"y": {
"field": "temp_max",
"type": "quantitative"
}
},
"mark": "point"
}
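The ratio argument can be checked numerically. The sketch below compares the two workarounds with a small clamped linear scale; the range `[0, 361]` used for the [-1, 50] variant is an assumption for illustration, and the helper is not Vega's scale implementation.

```typescript
// Linear scale that clamps its input to the domain before mapping.
function clampedLinearScale(
  domain: [number, number],
  range: [number, number]
): (v: number) => number {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  const slope = (r1 - r0) / (d1 - d0);
  return (v: number) => {
    const clamped = Math.min(Math.max(v, d0), d1);
    return r0 + (clamped - d0) * slope;
  };
}

// Workaround 1: shifted domain, assumed range [0, 361].
const shiftedScale = clampedLinearScale([-1, 50], [0, 361]);
// Workaround 2: range is the domain multiplied by 8, so size = 8 * value.
const proportionalScale = clampedLinearScale([0.125, 50], [1, 400]);

// Ratio of sizes for values 50 and 25.
const shiftedRatio = shiftedScale(50) / shiftedScale(25); // 51 / 26, about 1.96
const proportionalRatio = proportionalScale(50) / proportionalScale(25); // 2

// Zero falls below the domain and is clamped to the minimum size of 1.
const proportionalSizeAtZero = proportionalScale(0);

console.log(shiftedRatio, proportionalRatio, proportionalSizeAtZero);
```

With the shifted domain the 50:25 ratio comes out slightly under 2, while the proportional domain and range keep it at 2 and still give zero-valued points a size of 1 through clamping.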
There can be a similar issue for bars with very small non-zero values; we should consider whether we want to apply the same trick (see https://github.com/vega/vega-lite/issues/255).
~~Another potentially better solution is to see if we can augment Vega scales to have a bump scale.~~
Actually, a bump scale like the one in https://github.com/vega/vega-lite/issues/255 could be slightly different, in the sense that users may want to keep zero at 0px but have a minimum size for non-zero lengths.
Actually, thinking about it more, it's probably more maintainable if we just modify the default scale range to include small non-zero values. (It'll be slightly less precise than conditional encoding, but it will add far less complexity to the system.)
The relevant code is here: https://github.com/vega/vega-lite/blob/next/src/compile/scale/range.ts#L453
The question is what the property names for these minimum sizes for zero should be.
Note that we already have config.scale.minBandSize/minStrokeWidth/minFontSize/minSize to represent the minimum range for a size scale when zero is false.
So we're adding their counterparts to handle the minimum range when zero is true.
Design Decision 1: What's the name for the new config?
a) config.scale.minBandSizeWithZero
b) config.scale.minBandSizeForDomainWithZero
c) config.scale.zeroBandSize
d) config.scale.bandSizeForZero
(Same pattern for strokeWidth/fontSize/size)
I like c) for brevity.
Design Decision 2: Do we rename the existing config for when zero = false?
a) Do 1a and rename config.scale.minBandSize to config.scale.minBandSizeWithoutZero + make a backward-compatible migration (remove the old prop from the type, but add a step to migrate)
b) Do 1b and rename config.scale.minBandSize to config.scale.minBandSizeForDomainWithoutZero + make a backward-compatible migration
c) Do 1c and rename config.scale.minBandSize to config.scale.minNonZeroBandSize
d) Keep the existing name (don't introduce a breaking change)
I think if we want to rename, c) is probably the best, but honestly it's not worth the breaking change, so probably d).
Design Decision 3: Determine the minimum size for bandSize/strokeWidth/fontSize/size
- I'll defer to @yhoonkim to determine this.
cc: @domoritz @yhoonkim Any preference for decisions 1-2?
So, first off, it's important to know that size encoding is actually not perceptually linear. This means that people cannot accurately compare ratios of sizes. Therefore, I think it's totally okay to have symbol size start at some value.
How about we do 1b) and the new config (minBandSizeForDomainWithZero) overrides the existing config (minBandSize) for the case where the domain includes zero? Then we don't break backwards compatibility. If you prefer 1c), I am okay with that but would still not introduce a breaking change to the existing property.
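A minimal sketch of how that override could work, using the 1b naming. The interface, the function name `bandSizeRangeMin`, and the fallback to 0 when a property is unset are all assumptions for illustration; none of this is existing Vega-Lite code.

```typescript
// Hypothetical subset of the scale config discussed in this thread.
interface ScaleConfigSketch {
  // Existing property: minimum range value when the domain excludes zero.
  minBandSize?: number;
  // Proposed property (option 1b): minimum range value when the domain includes zero.
  minBandSizeForDomainWithZero?: number;
}

// Pick the lower bound of the band size range.
// Assumption: an unset property falls back to 0.
function bandSizeRangeMin(
  config: ScaleConfigSketch,
  domainIncludesZero: boolean
): number {
  if (domainIncludesZero) {
    // The new property takes over here, so minBandSize keeps its old meaning.
    return config.minBandSizeForDomainWithZero !== undefined
      ? config.minBandSizeForDomainWithZero
      : 0;
  }
  return config.minBandSize !== undefined ? config.minBandSize : 0;
}

const sketchConfig: ScaleConfigSketch = {
  minBandSize: 2,
  minBandSizeForDomainWithZero: 1
};
const minWithZero = bandSizeRangeMin(sketchConfig, true);
const minWithoutZero = bandSizeRangeMin(sketchConfig, false);
console.log(minWithZero, minWithoutZero);
```

Because the existing property is only read on the branch where the domain excludes zero, specs that set minBandSize today would behave the same under this sketch.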