vega-lite
Facet sort order not applied in presence of an aggregate
The facet sort order works as expected with non-aggregated data (editor):
{
  "data": {"url": "data/cars.json"},
  "mark": "bar",
  "encoding": {
    "row": {
      "type": "nominal",
      "field": "Origin",
      "sort": ["Japan", "Europe", "USA"]
    },
    "x": {"type": "quantitative", "field": "Horsepower"}
  }
}
But if an aggregation is added to the x-axis, the order is no longer respected (editor):
{
  "data": {"url": "data/cars.json"},
  "mark": "bar",
  "encoding": {
    "row": {
      "type": "nominal",
      "field": "Origin",
      "sort": ["Japan", "Europe", "USA"]
    },
    "x": {"type": "quantitative", "field": "Horsepower", "aggregate": "mean"}
  }
}
Reported in https://github.com/altair-viz/altair/issues/1683
Are there any updates on this issue?
Context: altair-viz/altair#3386
Vega Editor links for faceted bar charts of the barley yield example dataset, with the attempted ordering: non-aggregate || aggregate
As @joelostblom suggested, I've had a look at how the Vega spec is produced for both.
Intended sort order:
"sort": [ "Waseca", "Morris", "University Farm",
"Grand Rapids", "Crookston", "Duluth" ],
Observations
tl;dr: the sort order is defined in both the aggregate and non-aggregate charts (here, as column_site_sort_index), but anything that tries to use it is disregarded in the aggregate version.
The sort order appears to come from a data block (I don't know the actual term!) named data_0, starting at line 616. It contains a formula transform, which sets the intended ordering by site:
{
  "name": "data_0",
  "source": "data-093ece8c35bb2d41094cfb6138ec810b",
  "transform": [
    {
      "type": "formula",
      "expr": "datum[\"site\"]===\"Waseca\" ? 0 : datum[\"site\"]===\"Morris\" ? 1 : datum[\"site\"]===\"University Farm\" ? 2 : datum[\"site\"]===\"Grand Rapids\" ? 3 : datum[\"site\"]===\"Crookston\" ? 4 : datum[\"site\"]===\"Duluth\" ? 5 : 6",
      "as": "column_site_sort_index"
    },
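To make the formula easier to read, here is a minimal Python sketch of what that chained ternary computes. The helper names (`sort_index`, `build_expr`) are made up for illustration and are not part of Vega or Vega-Lite; the site list is the intended sort order quoted above.

```python
# Intended facet order, copied from the "sort" array quoted above.
SITE_ORDER = ["Waseca", "Morris", "University Farm",
              "Grand Rapids", "Crookston", "Duluth"]


def sort_index(site, order=SITE_ORDER):
    """Mimic the chained ternary: position in `order`, else len(order)."""
    return order.index(site) if site in order else len(order)


def build_expr(field, order=SITE_ORDER):
    """Rebuild the expression string (hypothetical helper, handy for
    comparing against the "expr" in a generated spec)."""
    parts = [f'datum["{field}"]==="{value}" ? {i}'
             for i, value in enumerate(order)]
    return " : ".join(parts) + f" : {len(order)}"


print(sort_index("Morris"))   # position of "Morris" in the intended order
print(build_expr("site"))     # same shape as the "expr" shown above
```

Any site that is not in the list falls through to the final index (6 here), so unlisted values would sort after the listed ones.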
The rest of the block differs: the non-aggregate version defines a stack, while the aggregate version defines an aggregate. The non-aggregate version also includes a sort object with an empty definition, "sort": {"field": [], "order": []}, which seems to have no effect on the rendered chart.
The next block is named column_domain, which seems to affect the ordering of the labels in the non-aggregate version. For example, changing ops: ["max"] to ops: ["exponential"] at line 647 changes the labels to the "unordered" version, while the columns themselves do not change:
https://github.com/vega/vega-lite/assets/7524620/2151d6f7-0b9e-4db3-80df-851ef8823310
It seems to have no effect on the aggregate version, as that is already 'unordered'.
Similarly, the column_header block (column headers, judging from the name?) changes the order of the labels when its sort order at line 699 is changed, e.g. from ascending to descending, but only for the non-aggregate version:
https://github.com/vega/vega-lite/assets/7524620/2b876eaa-3993-4c99-9820-a3eaf86dca51
The sort order property at line 752 affects the ordering of the columns, this time only the bars as opposed to the headings:
https://github.com/vega/vega-lite/assets/7524620/5746c437-2cfb-43ff-be2f-3fc99a3a542f
Theory
I am not familiar with Vega, so the following is a guess as to why the behaviour is as it seems:
Is Vega-Lite producing a Vega spec that defines a sort order for the underlying dataset, rather than the produced aggregate?
I am speculating based on: 1) the order in which the blocks appear in the Vega spec and 2) that the sources to which the ordering transform is applied have the same identifier, which looks like a hash: data-093ece8c35bb2d41094cfb6138ec810b.
So that source is sorted, and the non-aggregate version uses that sorted source (see marks at line 760, using yield_end and yield_start), whereas the aggregate version obviously uses the aggregate (["sum_yield"]).
This might be a red herring as both seem to use the same facet definition:
"from": {
  "facet": {
    "name": "facet",
    "data": "data_0",
    "groupby": ["site"],
    "aggregate": {
      "fields": ["column_site_sort_index"],
      "ops": ["max"],
      "as": ["column_site_sort_index"]
    }
  }
}
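As I read it, the facet definition above groups the rows of data_0 by site and takes the max of column_site_sort_index per group. The following Python sketch shows that reading with made-up rows. It is only an illustration of the groupby/max step, not a description of Vega's internals, and `facet_order` is a hypothetical helper name.

```python
from collections import defaultdict


def facet_order(rows, groupby="site", index_field="column_site_sort_index"):
    """Group rows by `groupby`, take max(index_field) per group,
    then return the group keys sorted by that max."""
    max_index = defaultdict(lambda: float("-inf"))
    for row in rows:
        key = row[groupby]
        max_index[key] = max(max_index[key], row[index_field])
    return sorted(max_index, key=max_index.get)


# Made-up rows for illustration; only the sort index matters here.
rows = [
    {"site": "Duluth", "column_site_sort_index": 5},
    {"site": "Waseca", "column_site_sort_index": 0},
    {"site": "Morris", "column_site_sort_index": 1},
    {"site": "Waseca", "column_site_sort_index": 0},
]
print(facet_order(rows))
```

Note that the sketch only works if every row still carries column_site_sort_index at the point where the facet reads data_0.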
However, if I remove the aggregate block and replace it with stack in the data block, and then change both the y and y2 definitions in marks, as well as the y scale definition, I get the correct ordering... albeit by fundamentally changing the chart!