vega-lite
vega-lite copied to clipboard
fix: facet missing data bug fix
PR Description
This is a suggested fix for https://github.com/vega/vega-lite/issues/5937 and potentially also https://github.com/vega/vega-lite/issues/8675 and https://github.com/vega/altair/issues/3588 and https://github.com/vega/altair/issues/3481
I included two examples demonstrating the bug. For instance the spec
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {"url": "data/cars.json"},
"facet": {
"column": {"field": "Origin", "sort": ["USA", "Europe", "Japan"]},
"row": {"field": "Cylinders", "sort": {"field": "Horsepower", "op": "mean"}}
},
"spec": {
"mark": "point",
"encoding": {
"x": {"field": "Horsepower", "type": "quantitative"},
"y": {"field": "Acceleration", "type": "quantitative"},
"color": {"field": "Origin", "type": "nominal"},
"shape": {"field": "Cylinders", "type": "nominal"}
}
}
}
results in
while it should look like this:
I think the bug is due to the following. The key part of the compiled Vega spec looks like
{
"name": "cell",
"type": "group",
"style": "cell",
"from": {
"facet": {
"name": "facet",
"data": "source_0",
"groupby": ["Cylinders", "Origin"],
"aggregate": {
"cross": true,
"fields": [
"mean_Horsepower_by_Cylinders",
"column_Origin_sort_index"
],
"ops": ["max", "max"],
"as": ["mean_Horsepower_by_Cylinders", "column_Origin_sort_index"]
}
}
},
}
As far as I understand this performs a groupby
according to cylinders and origin, and then for each group determines mean_Horsepower_by_Cylinders
and column_Origin_sort_index
used for sorting by aggregating over the data points in the facet and taking the max (it does not matter as the column sort index is the same for all points in the facet anyways, so one could also take e.g. the min). The problem arises when a facet is empty as then the aggregation cannot determine the valid sort index. As a result, some facets are not assigned a sort-index, see this screenshot from the Vega Editor -> Data Viewer -> cell:
For some reason the cells with missing sort-index result in some data points being placed in the wrong facet (not exactly sure what goes wrong here exactly).
My bug fix suggestion simply replaces the above spec by
{
"name": "cell",
"type": "group",
"style": "cell",
"from": {
"facet": {
"name": "facet",
"data": "source_0",
"groupby": [
"mean_Horsepower_by_Cylinders",
"column_Origin_sort_index"
],
"aggregate": {"cross": true}
}
},
}
i.e. it directly groups by the sort index, avoiding cells without assigned indices. This is simpler than the original code. For all examples I looked at, it worked. However, maybe I overlook some unintended side effects.
Out of all the tests a single test in test/compile/facet.test.ts
fails, namely line 507. As far as I can see this test directly checks the section of the compiled Vega spec outlined above, so in some sense I would expect it to fail.
Checklist
- [x] This PR is atomic (i.e., it fixes one issue at a time).
- [x] The title is a concise semantic commit message (e.g. "fix: correctly handle undefined properties").
- [x]
yarn test
runs successfully