
Move shared encodings of layered charts to top level?

Open joelostblom opened this issue 2 years ago • 2 comments

I opened this as a feature request, but maybe it is more of a discussion, as a follow-up to https://github.com/altair-viz/altair/pull/2991. The overall question is whether Altair should reduce redundancy in layered specs by moving common encodings to the top level of the chart instead of repeating them in each layer. I think this is low priority, as it would not change any functionality when working in Altair, only the verbosity of the VL spec.

Currently, if we create an Altair spec like this, the code suggests that the encodings are part of the top-level chart (base) and shared between the marks:

import altair as alt
from vega_datasets import data

base = alt.Chart(data.wheat.url).encode(
    x='wheat:Q',
    y="year:O",
    text='wheat:Q'
)
base.mark_bar() + base.mark_text(align='left', dx=1)

However, what actually happens is that the encodings are redundantly copied into each layer of the Vega-Lite spec:

{
  "layer": [
    {
      "mark": {"type": "bar"},
      "encoding": {
        "text": {"field": "wheat", "type": "quantitative"},
        "x": {"field": "wheat", "type": "quantitative"},
        "y": {"field": "year", "type": "ordinal"}
      }
    },
    {
      "mark": {"type": "text", "align": "left", "dx": 1},
      "encoding": {
        "text": {"field": "wheat", "type": "quantitative"},
        "x": {"field": "wheat", "type": "quantitative"},
        "y": {"field": "year", "type": "ordinal"}
      }
    }
  ],
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/[email protected]/data/wheat.json"
  }
}

A more suitable translation would be to include a top-level encoding that can be shared by all the layered marks. This leads to a less verbose Vega-Lite spec that is more semantically similar to how the code is written in Altair:

{
  "encoding": {
    "text": {"field": "wheat", "type": "quantitative"},
    "x": {"field": "wheat", "type": "quantitative"},
    "y": {"field": "year", "type": "ordinal"}
  },
  "layer": [
    {"mark": {"type": "bar"}},
    {"mark": {"type": "text", "align": "left", "dx": 1}}
  ],
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/[email protected]/data/wheat.json"
  }
}

That VL spec can actually be generated in Altair if we use LayerChart directly:

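import altair as alt
from vega_datasets import data

# Build the layered chart directly so the encodings live at the top level
alt.LayerChart(
    data=data.wheat.url,
    encoding=alt.SharedEncoding(
        x=alt.X('wheat:Q'),
        y=alt.Y('year:O'),
        text=alt.Text('wheat:Q')
    ),
    layer=[
        alt.Chart().mark_bar(),
        alt.Chart().mark_text(align='left', dx=1)
    ]
)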
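
As a rough illustration of why the compact and verbose forms are interchangeable, the sketch below copies a top-level encoding into every layer of a plain spec dictionary. It assumes that a channel defined inside a layer takes precedence over the shared one. The helper name `expand_shared_encoding` and the shortened data URL are made up for this example and are not part of Altair.

```python
import copy


def expand_shared_encoding(spec):
    """Return a copy of a layered spec with the top-level encoding
    pushed down into each layer (hypothetical helper, not an Altair API)."""
    expanded = copy.deepcopy(spec)
    shared = expanded.pop("encoding", {})
    for layer in expanded.get("layer", []):
        # Assumption: layer-level channels override the shared ones.
        merged = {**shared, **layer.get("encoding", {})}
        if merged:
            layer["encoding"] = merged
    return expanded


compact_spec = {
    "encoding": {
        "text": {"field": "wheat", "type": "quantitative"},
        "x": {"field": "wheat", "type": "quantitative"},
        "y": {"field": "year", "type": "ordinal"},
    },
    "layer": [
        {"mark": {"type": "bar"}},
        {"mark": {"type": "text", "align": "left", "dx": 1}},
    ],
    "data": {"url": "data/wheat.json"},  # placeholder URL for illustration
}

expanded_spec = expand_shared_encoding(compact_spec)
```

After this call, every layer in `expanded_spec` carries its own copy of the three channels, which matches the shape of the verbose spec shown earlier.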

I wonder whether Altair should try to detect which encodings are shared between layers and move them to the top level. The practical benefit is small: in Altair's grammar, the encodings can still be thought of conceptually as belonging to the chart and shared by each mark that is added to it, even when the code is written as in the first example. That is just not how it works under the hood. I am not sure whether there are situations where this automatic move would be undesired.
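
To make the idea concrete, here is a minimal sketch of such a detection step, written against a plain Vega-Lite spec dictionary rather than Altair's internals. The function name `hoist_shared_encodings` is hypothetical, the data URL is a placeholder, and the sketch only treats a channel as shared when its definition is identical in every layer. It does not handle nested layers.

```python
import copy


def hoist_shared_encodings(spec):
    """Return a copy of a layered spec in which channels that are
    identical in every layer are moved to the top-level encoding.
    Hypothetical helper for illustration, not an Altair API."""
    result = copy.deepcopy(spec)
    layers = result.get("layer", [])
    if not layers:
        return result

    # A channel is shared only if every layer defines it identically.
    first = layers[0].get("encoding", {})
    shared = {
        channel: definition
        for channel, definition in first.items()
        if all(
            layer.get("encoding", {}).get(channel) == definition
            for layer in layers
        )
    }
    if not shared:
        return result

    top = result.setdefault("encoding", {})
    for channel, definition in shared.items():
        if channel in top and top[channel] != definition:
            continue  # keep conflicting channels inside the layers
        top[channel] = definition
        for layer in layers:
            del layer["encoding"][channel]

    # Drop encoding blocks that became empty.
    for layer in layers:
        if not layer.get("encoding"):
            layer.pop("encoding", None)
    return result


shared_xy = {
    "text": {"field": "wheat", "type": "quantitative"},
    "x": {"field": "wheat", "type": "quantitative"},
    "y": {"field": "year", "type": "ordinal"},
}
verbose_spec = {
    "layer": [
        {"mark": {"type": "bar"}, "encoding": dict(shared_xy)},
        {
            "mark": {"type": "text", "align": "left", "dx": 1},
            "encoding": dict(shared_xy),
        },
    ],
    "data": {"url": "data/wheat.json"},  # placeholder URL for illustration
}

hoisted_spec = hoist_shared_encodings(verbose_spec)
```

Here `hoisted_spec` ends up with a single top-level `encoding` and two layers that contain only their `mark`. Channels that differ between layers are left where they are, which is the conservative choice given the open question about when an automatic move might be undesired.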

joelostblom, Mar 27 '23 19:03

As a user, I really like this syntax, though this discussion may not require my opinion.

mcp292, Mar 27 '23 19:03

A bit related and a bit off-topic, but interesting nonetheless: https://github.com/queryverse/VegaLite.jl/pull/411.

mattijn, Apr 27 '23 19:04