mosaic
Axis scale includes null value counts
When plotting a binned bar chart, I noticed that the y axis scale is affected by the presence of nulls. Is this expected?
For example, the following chart does not include null values and the y axis has a max of 1:
await vg.coordinator().exec(vg.loadObjects("testData", [{ colA: 1 }, { colA: 2 }]));
vg.plot(
  vg.rectY(vg.from("testData"), {
    x: vg.bin("colA"),
    y: vg.count(),
    inset: 0.5,
  }),
  vg.height(200)
);
Whereas this data has null values and the y axis max is 6:
await vg.coordinator()
  .exec(
    vg.loadObjects("testData", [
      { colA: 1 },
      { colA: 2 },
      { colA: null },
      { colA: null },
      { colA: null },
      { colA: null },
      { colA: null },
      { colA: null },
      { colA: null },
    ])
  );
vg.plot(
  vg.rectY(vg.from("testData"), {
    x: vg.bin("colA"),
    y: vg.count(),
    inset: 0.5,
  }),
  vg.height(200)
);
I would expect these two plots to look the same, but the presence of nulls seems to affect the axis scale. One workaround I've found is to first create a view that filters out nulls before plotting, but I'm curious whether filtering them out should be the default behavior.
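For reference, the view-based workaround could look roughly like the sketch below. The helper name nonNullViewSQL and the view name testData_non_null are made up for illustration and are not part of Mosaic; the sketch also assumes that vg.coordinator().exec(...) accepts a raw SQL string, as it does for the output of vg.loadObjects above.

```javascript
// Hypothetical helper (not part of Mosaic): builds the SQL for a view
// that keeps only the rows where `column` is not null.
function nonNullViewSQL(table, column, view = `${table}_non_null`) {
  // Double-quote identifiers so names with special characters are handled.
  const q = (name) => `"${String(name).replace(/"/g, '""')}"`;
  return (
    `CREATE OR REPLACE VIEW ${q(view)} AS ` +
    `SELECT * FROM ${q(table)} WHERE ${q(column)} IS NOT NULL`
  );
}

const viewSQL = nonNullViewSQL("testData", "colA");
console.log(viewSQL);

// Assumed usage with the calls from the examples above:
//   await vg.coordinator().exec(viewSQL);
//   vg.plot(
//     vg.rectY(vg.from("testData_non_null"), { x: vg.bin("colA"), y: vg.count() }),
//     vg.height(200)
//   );
```

The downside is that every chart has to point at the filtered view explicitly, which is why a default or an opt-in option would be more convenient.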
One follow-up comment: I quite like that the default behavior for bars is to actually plot the nulls:
await vg
  .coordinator()
  .exec(
    vg.loadObjects("testData", [
      { colA: 1 },
      { colA: 2 },
      { colA: null },
      { colA: null },
      { colA: null },
      { colA: null },
      { colA: null },
      { colA: null },
    ])
  );
return vg.vconcat(
  vg.plot(
    vg.barX(vg.from("testData"), {
      x: vg.count(),
      y: "colA",
      order: "colA",
    }),
    vg.height(200)
  )
);
This makes more sense to me than the example above since the nulls are actually plotted. Is it reasonable to expect nulls to be plotted for bars but filtered out when they are not actually included in the plot?
The aggregate (bin/count) query must be returning an entry corresponding to the null values. I'm guessing the resulting x1 and x2 values that map to the x-axis are null. As a result, Observable Plot does not draw a corresponding bar but does include the count in the axis scale determination.
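To make that guess concrete, here is a plain JavaScript simulation of the suspected behavior. It is not Mosaic's or Observable Plot's actual code: binCount is a hypothetical stand-in for the bin/count aggregate, the bin step of 1 is an assumption, and the rows are the ones from the barX example (two values plus six nulls).

```javascript
// Hypothetical stand-in for the bin + count aggregate: one output row per
// bin, plus one row for null inputs whose x1/x2 extents are null.
function binCount(rows, field, step = 1) {
  const groups = new Map();
  for (const row of rows) {
    const v = row[field];
    const x1 = v == null ? null : Math.floor(v / step) * step;
    const key = String(x1);
    const group =
      groups.get(key) ?? { x1, x2: x1 == null ? null : x1 + step, count: 0 };
    group.count += 1;
    groups.set(key, group);
  }
  return [...groups.values()];
}

const rows = [
  { colA: 1 },
  { colA: 2 },
  ...Array.from({ length: 6 }, () => ({ colA: null })),
];
const bins = binCount(rows, "colA");

// A y scale computed from every aggregate row is dominated by the null group...
const yMaxAllRows = Math.max(...bins.map((b) => b.count));

// ...while only rows with defined x1/x2 can be drawn as rects.
const drawableBins = bins.filter((b) => b.x1 != null && b.x2 != null);
const yMaxDrawable = Math.max(...drawableBins.map((b) => b.count));

console.log({ yMaxAllRows, yMaxDrawable });
```

If this is what happens, the gap between the two maxima is exactly the mismatch reported above: the tallest visible bar has a count of 1 while the axis is sized for the undrawn null group.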
The question is where we might want null filtering/suppression to kick in. Should the underlying query suppress nulls? In general I don't think so, as your barX example suggests. But we might try to do this as part of the semantics of the bin transform? Or via an explicit push-down null filter option? Or something else?
I think it would be nice to have a separate bar for nulls, or an indicator of how many nulls there are in a histogram, similar to Tableau. Then the right thing would probably be to remove the nulls before passing the data to Plot but still query for them.
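A rough sketch of that idea in plain JavaScript: keep the null group in the query result, but split it off before the binned rows reach the plot so it can be shown separately. The function splitNullBin and the row shape { x1, x2, count } are assumptions for illustration, not an existing Mosaic API.

```javascript
// Hypothetical post-processing step: separate the null group from the
// drawable bins so it can feed a separate bar or a text indicator.
function splitNullBin(aggregatedRows) {
  const histogramBins = [];
  let nullCount = 0;
  for (const bin of aggregatedRows) {
    if (bin.x1 == null || bin.x2 == null) {
      nullCount += bin.count; // still queried, just not drawn as a rect
    } else {
      histogramBins.push(bin);
    }
  }
  return { histogramBins, nullCount };
}

// Illustrative aggregate result, assuming the query returns a null group.
const aggregated = [
  { x1: 1, x2: 2, count: 1 },
  { x1: 2, x2: 3, count: 1 },
  { x1: null, x2: null, count: 6 },
];

const { histogramBins, nullCount } = splitNullBin(aggregated);
console.log(histogramBins.length, nullCount);
```

With this split, the y scale would be derived from histogramBins only, and nullCount stays available for whatever null indicator the chart ends up using.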
@domoritz we can already do this with an ordinal domain, but with rectX/Y here we have a solely continuous domain. But the bin transform should already work with barX/Y for an ordinal scale including nulls.
I'm thinking of something like https://vega.github.io/vega/examples/histogram-null-values/.