Make it easier to work with offsets (e.g. by adding a jitter mark)

Open joelostblom opened this issue 3 years ago • 3 comments

The new offset channels make it convenient to create charts with categorical offsets, such as this one:

image

However, when trying to use a random jittered offset, I need to use a transform, and the default result is not an effective visualization because the points from different categories overlap:

image Open the Chart in the Vega Editor

I have to manually change the domain to make it look decent: image Open the Chart in the Vega Editor

And if I want both a jittered and a categorical offset, it seems I need a ternary expression for the offsets, which is difficult to discover and inconvenient for many categories, and I also have to increase the chart width manually: image Open the Chart in the Vega Editor

The specs for these operations are quite verbose, particularly the last one:

{
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/[email protected]/data/barley.json"
  },
  "mark": "point",
  "encoding": {
    "color": {"field": "year", "type": "nominal"},
    "x": {"field": "site", "type": "ordinal"},
    "xOffset": {
      "field": "offset",
      "scale": {"domain": [0, 10]},
      "type": "quantitative"
    },
    "y": {"field": "yield", "type": "quantitative"}
  },
  "transform": [
    {
      "calculate": "datum.year == 1932 ? 5 + random() : 0 + random()",
      "as": "offset"
    }
  ],
  "width": {"step": 40},
  "$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json"
}
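For more than two categories, one way to avoid a growing chain of ternaries might be to look up each category's position with the Vega expression function indexof. This is only a sketch under assumptions: it assumes year is stored as a number, that the array lists every category (an unlisted value would give index -1 and a negative offset), and it uses a relative data path for brevity. The spacing of 5 and the domain are carried over from the spec above.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json",
  "description": "Sketch only: assumes `year` is numeric and that the indexof array lists every category.",
  "data": {"url": "data/barley.json"},
  "transform": [
    {
      "calculate": "5 * indexof([1931, 1932], datum.year) + random()",
      "as": "offset"
    }
  ],
  "mark": "point",
  "width": {"step": 40},
  "encoding": {
    "color": {"field": "year", "type": "nominal"},
    "x": {"field": "site", "type": "ordinal"},
    "xOffset": {
      "field": "offset",
      "type": "quantitative",
      "scale": {"domain": [0, 10]}
    },
    "y": {"field": "yield", "type": "quantitative"}
  }
}
```

The category list and the scale domain still have to be kept in sync by hand, so this shortens the expression but does not remove the manual tuning.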

It would be convenient if there was a jitter macro / composite mark that made some assumptions and automatically adjusted some of the parameters above so that it was easier to make these types of charts.

joelostblom avatar May 17 '22 03:05 joelostblom

I have a proposal for how a jitter mark macro could work. It is already possible to achieve this behavior with a text mark, since it exposes two different ways of moving the mark along the x-axis: xOffset and dx. This means we can put the random jitter in the dx mark parameter and the categorical dodging in the xOffset encoding, like so:

{
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/[email protected]/data/barley.json"
  },
  "mark": {
    "type": "text",
    "size": 25,
    "dx": {"expr": "random() * 5"}
  },
  "encoding": {
    "text": {"datum": "∘", "type": "nominal"},
    "color": {"field": "year", "type": "nominal"},
    "x": {"field": "site", "type": "ordinal"},
    "xOffset": {
      "field": "year",
      "type": "nominal"
    },
    "y": {"field": "yield", "type": "quantitative"}
  },
  "$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json"
}

image

Adding a jitter macro would require adding the dx/dy channels to the point/circle/etc. mark types and then including "dx": {"expr": "random() * 5"} by default. The jitter macro could take an additional parameter to scale the extent of the jitter (the 5 in the previous expression), but other than that I don't think much needs to be added. Maybe a parameter that controls whether the jitter is in the x or y direction, and maybe a parameter that controls the shape of the jitter (uniform, swarm, kde, etc.), but these could be later additions. The final spec for a chart like the one above could look like this:

{
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/[email protected]/data/barley.json"
  },
  "mark": {
    "type": "jitter"
  },
  "encoding": {
    "color": {"field": "year", "type": "nominal"},
    "x": {"field": "site", "type": "ordinal"},
    "xOffset": {"field": "year", "type": "nominal"},
    "y": {"field": "yield", "type": "quantitative"}
  },
  "$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json"
}

Any thoughts on this proposal?

joelostblom avatar Sep 22 '23 15:09 joelostblom

Let's decouple the issue into two topics.

Jittering syntax.

  • I agree that jitter support is currently a bit verbose, requiring:
  1. a calculate transform: "transform": [{"calculate": "random()", "as": "random"}]
  2. a reference to the generated field: "yOffset": {"field": "random", "type": "quantitative"}

  • I think jittering is a positional operator rather than a mark type. So one thought would be to introduce a jitter operator somehow.
  • That said, taking a step back, I think if we support inline calculate instead of field, then we can support both jitter and other similar calculations. There is still an open question of whether this calculate happens before or after aggregation if the encoding block has aggregation.
{
  "data": {"url": "data/cars.json"},
  "height": {"step": 50},
  "mark": "point",
  "encoding": {
    "x": {"field": "Horsepower", "type": "quantitative"},
    "y": {"field": "Cylinders", "type": "ordinal"},
    "yOffset": {"calculate": "random()", "type": "quantitative"}
  }
}
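For comparison, the two-step form that works today can be written out in full. This is a minimal sketch that pairs the calculate transform and the field reference listed above with the same cars data; the field name random comes from those snippets, and the $schema version is an assumption.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json",
  "description": "Sketch of the current two-step jitter: a calculate transform plus a yOffset field reference.",
  "data": {"url": "data/cars.json"},
  "height": {"step": 50},
  "transform": [{"calculate": "random()", "as": "random"}],
  "mark": "point",
  "encoding": {
    "x": {"field": "Horsepower", "type": "quantitative"},
    "y": {"field": "Cylinders", "type": "ordinal"},
    "yOffset": {"field": "random", "type": "quantitative"}
  }
}
```

The only difference from the proposed inline form is that the random value has to be materialized as a named field before the encoding can refer to it.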

Support both grouping and jittering at the same time

Re: adding dx, given that dx and xOffset should be pretty much the same thing, it's probably better to analyze why xOffset doesn't work. Conceptually, I could see that one might extend xOffset to allow an array of fields for nested offsets. However, making the nesting work correctly and automatically will be quite complicated.
(FWIW, it was already super complicated to make x+xOffset grouped bar work if we look at the original xOffset PR.)

Prioritization-wise, I think one can add a facet outside instead of doing nested x/yOffset, so the ROI for implementing this (plus the cost of the potential bugs it may cause) is probably low.
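A minimal sketch of the facet route, assuming the barley data used earlier in the thread: site moves to a column facet, year takes the x position, and a random offset supplies the jitter. The field name jitter and the relative data path are illustrative assumptions, not part of any existing spec in this issue.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json",
  "description": "Sketch only: grouping via a column facet, jitter via a quantitative xOffset. The field name `jitter` is hypothetical.",
  "data": {"url": "data/barley.json"},
  "transform": [{"calculate": "random()", "as": "jitter"}],
  "mark": "point",
  "width": {"step": 40},
  "encoding": {
    "column": {"field": "site", "type": "ordinal"},
    "x": {"field": "year", "type": "ordinal"},
    "xOffset": {"field": "jitter", "type": "quantitative"},
    "color": {"field": "year", "type": "nominal"},
    "y": {"field": "yield", "type": "quantitative"}
  }
}
```

This keeps a single offset channel per view, at the cost of giving each site its own panel and axis rather than a shared x-axis.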

kanitw avatar Sep 29 '23 22:09 kanitw

Thanks for your comments @kanitw !

That said, taking a step back, I think if we support inline calculate instead of field, then we can support both jitter and other similar calculations

I think this sounds useful! It sounds like there might be some overlap with the effort to support datum with expressions everywhere, as discussed here? I'm thinking an alternative to the calculate transform would be to do something like "yOffset": {"datum": {"expr": "random()"}, "type": "quantitative"}.

Support both grouping and jittering at the same time

I don't know which is easiest implementation-wise, but I do think it is important to be able to use a categorical offset together with jittered points. Whether that happens through a dx channel or by passing an array to xOffset/yOffset doesn't matter much to me. One advantage of having a jitter mark would be that a default jitter is already set in the mark, which might be closer to what people are used to than passing an array of offsets. Although faceting can serve as the grouping, I think it would still be nice to be able to combine this with faceting.

joelostblom avatar Sep 29 '23 23:09 joelostblom