Make it easier to work with offsets (e.g. by adding a jitter mark)
The new offset channels make it convenient to create charts with categorical offsets, such as this one:

However, when trying to use a random jittered offset, I need to use a transform and the default appearance is not an effective visualization since the plots from different categories overlap:
I have to manually change the domain to make it look decent:
And if I want both a jittered and a categorical offset, it seems I need a ternary expression for the offsets, which is difficult to discover and inconvenient for many categories. I also have to increase the chart width manually:
The specs for these operations are quite verbose, particularly the last one:
{
"data": {
"url": "https://cdn.jsdelivr.net/npm/[email protected]/data/barley.json"
},
"mark": "point",
"encoding": {
"color": {"field": "year", "type": "nominal"},
"x": {"field": "site", "type": "ordinal"},
"xOffset": {
"field": "offset",
"scale": {"domain": [0, 10]},
"type": "quantitative"
},
"y": {"field": "yield", "type": "quantitative"}
},
"transform": [
{
"calculate": "datum.year == 1932 ? 5 + random() : 0 + random()",
"as": "offset"
}
],
"width": {"step": 40},
"$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json"
}
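As a side note on the "many categories" problem, here is a minimal sketch of one way to avoid chaining ternaries: look up each category's position with the Vega expression function `indexof` and shift by a fixed band width. This is only an illustration, not a tested replacement. It assumes that `year` can be coerced to a number, that the relative path `data/barley.json` resolves (as it does in the Vega Editor), and that a band width of 5 is wanted; the scale domain still has to be set by hand to 5 times the number of categories.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json",
  "description": "Sketch only: assumes numeric years and a band width of 5 per category.",
  "data": {"url": "data/barley.json"},
  "transform": [
    {
      "calculate": "indexof([1931, 1932], toNumber(datum.year)) * 5 + random()",
      "as": "offset"
    }
  ],
  "mark": "point",
  "width": {"step": 40},
  "encoding": {
    "x": {"field": "site", "type": "ordinal"},
    "xOffset": {
      "field": "offset",
      "type": "quantitative",
      "scale": {"domain": [0, 10]}
    },
    "y": {"field": "yield", "type": "quantitative"},
    "color": {"field": "year", "type": "nominal"}
  }
}
```

Adding a category then means extending the array and the domain rather than nesting another ternary, but the spec is still verbose, which is the point of this request.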
It would be convenient if there was a jitter macro / composite mark that made some assumptions and automatically adjusted some of the parameters above so that it was easier to make these types of charts.
I have a proposal for how a jitter mark macro could work. It is already possible to achieve this behavior with a text mark, since it exposes two different ways of moving the mark along the x-axis: the xOffset encoding channel and the dx mark property. This means we can put the random jitter in the dx mark property and the categorical dodging in the xOffset encoding, like so:
{
"data": {
"url": "https://cdn.jsdelivr.net/npm/[email protected]/data/barley.json"
},
"mark": {
"type": "text",
"size": 25,
"dx": {"expr": "random() * 5"}
},
"encoding": {
"text": {"datum": "∘", "type": "nominal"},
"color": {"field": "year", "type": "nominal"},
"x": {"field": "site", "type": "ordinal"},
"xOffset": {
"field": "year",
"type": "nominal"
},
"y": {"field": "yield", "type": "quantitative"}
},
"$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json"
}
Adding a jitter macro would require adding the dx/dy properties to the point/circle/etc. mark types and then including "dx": {"expr": "random() * 5"} by default. The jitter macro could take an additional parameter to scale the extent of the jitter (the 5 in the previous expression), but other than that I don't think much needs to be added. Maybe a parameter that controls whether the jitter is in the x or y direction, and maybe a parameter that controls the shape of the jitter (uniform, swarm, kde, etc.), but these could be later additions. The final spec for a chart like the one above could look like this:
{
"data": {
"url": "https://cdn.jsdelivr.net/npm/[email protected]/data/barley.json"
},
"mark": {
"type": "jitter"
},
"encoding": {
"color": {"field": "year", "type": "nominal"},
"x": {"field": "site", "type": "ordinal"},
"xOffset": {"field": "year", "type": "nominal"},
"y": {"field": "yield", "type": "quantitative"}
},
"$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json"
}
Any thoughts on this proposal?
Let's decouple the issue discussed into two topics.
Jittering syntax
- I agree that jitter support is currently a bit verbose, requiring:
  - a calculate transform:
    "transform": [{"calculate": "random()", "as": "random"}]
  - a reference to the generated field:
    "yOffset": {"field": "random", "type": "quantitative"}
- I think jittering is a positional operator rather than a mark type. So one thought would be to introduce a jitter operator somehow.
- That said, taking a step back, I think if we support inline calculate instead of field, then we can support both jitter and other similar calculations. There is an open question whether this calculate happens before or after aggregation if the encoding block has aggregation.
{
"data": {"url": "data/cars.json"},
"height": {"step": 50},
"mark": "point",
"encoding": {
"x": {"field": "Horsepower", "type": "quantitative"},
"y": {"field": "Cylinders", "type": "ordinal"},
"yOffset": {"calculate": "random()", "type": "quantitative"}
}
}
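For comparison, here is a sketch of the same chart using only what is supported today, assembled from the two fragments quoted above (the calculate transform plus a field reference in yOffset). The `$schema` version is an assumption carried over from the specs earlier in this thread.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json",
  "description": "Currently supported form: jitter via a calculate transform and a field reference.",
  "data": {"url": "data/cars.json"},
  "transform": [{"calculate": "random()", "as": "random"}],
  "height": {"step": 50},
  "mark": "point",
  "encoding": {
    "x": {"field": "Horsepower", "type": "quantitative"},
    "y": {"field": "Cylinders", "type": "ordinal"},
    "yOffset": {"field": "random", "type": "quantitative"}
  }
}
```

The inline calculate proposal would fold the transform and the intermediate "random" field into the yOffset definition itself.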
Support both grouping and jittering at the same time
Re: adding dx, given that dx and xOffset should be pretty much the same thing, it's probably better to analyze why xOffset doesn't work. Conceptually, I could see extending xOffset to allow an array of fields for nested offsets. However, making the nesting automatically correct will be quite complicated.
(FWIW, it was already super complicated to make x+xOffset grouped bar work if we look at the original xOffset PR.)
Prioritization-wise, I think one can add a facet outside instead of doing nested x/yOffset, so the ROI for implementing this (plus the cost of the potential bugs it may cause) is probably low.
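To make the facet alternative concrete, here is a rough sketch, assuming the barley data from earlier in the thread is reachable at the relative path `data/barley.json`. The outer grouping (site) moves to a column facet, the inner grouping (year) becomes the x channel, and the jitter goes into xOffset. The field name "jitter" is made up for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json",
  "description": "Sketch: facet by site, group by year on x, and jitter with xOffset.",
  "data": {"url": "data/barley.json"},
  "transform": [{"calculate": "random()", "as": "jitter"}],
  "mark": "point",
  "width": {"step": 40},
  "encoding": {
    "column": {"field": "site", "type": "ordinal"},
    "x": {"field": "year", "type": "ordinal"},
    "xOffset": {"field": "jitter", "type": "quantitative"},
    "y": {"field": "yield", "type": "quantitative"},
    "color": {"field": "year", "type": "nominal"}
  }
}
```

This avoids nested offsets entirely, at the cost of using up the column channel for the outer grouping.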
Thanks for your comments @kanitw !
> That said, taking a step back, I think if we support inline calculate instead of field, then we can support both jitter and other similar calculations.
I think this sounds useful! It sounds like there might be some overlap with the effort to support datum with expressions everywhere, as discussed here? I'm thinking an alternative to the calculate transform would be to do something like "yOffset": {"datum": {"expr": "random()"}, "type": "quantitative"}.
> Support both grouping and jittering at the same time
I don't know what is easiest implementation-wise, but I do think it is important to be able to use a categorical offset together with jittered points. Whether that happens through a dx channel or by passing an array to the offset channel doesn't matter much to me. One advantage of a jitter mark would be that a default jitter is already set in the mark, which might be closer to what people are used to than passing an offset array. Although it is possible to use faceting for the grouping, I think it would still be nice to be able to combine grouped, jittered points with faceting.