Boxplots with many identical values or just one value are missing the median line

Open joelostblom opened this issue 2 years ago • 8 comments

For datasets with many identical values, it is understandable that no box is drawn between q1 and q3, but the median line at q2 should always be present. Currently only the outliers are shown, which is confusing because it suggests that the dataset contains only a few observations, when in fact many observations may be compressed at the same value.

image Open the Chart in the Vega Editor

My expectation would be to see the chart like it is shown in seaborn: image
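
To make the degenerate case concrete, here is a minimal sketch in plain Python (standard library only) of why the box disappears. The data is made up, the quartile method is not necessarily the one Vega uses, and the 1.5 × IQR fences are assumed to match the default boxplot extent:

```python
from statistics import quantiles

# Hypothetical data: 50 identical observations plus four stragglers.
values = [5] * 50 + [1, 2, 9, 10]

# Linearly interpolated quartiles. The exact method Vega uses may differ,
# but the result is the same whenever the middle of the data is constant.
q1, median, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Tukey-style fences at 1.5 * IQR (assumed to be the default extent).
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = sorted(v for v in values if v < lower or v > upper)

print(q1, median, q3)  # all three quartiles coincide, so the box has zero width
print(outliers)        # every value that differs from the median is an outlier
```

With a zero-width box and zero-length whiskers, the outlier points are the only marks left that do not depend on the median line being visible.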

joelostblom avatar Apr 25 '22 23:04 joelostblom

On a closer look, it appears that the median line is actually there, but since it is drawn in white, it is invisible unless the colored box is present or the chart background is dark:

image

It would be nice to add some logic that changes the color of this line to the color of the box/outliers when the box is not present (so blue in this case). This could be thought of as compressing the box (q1 and q3) to a line at q2 and drawing it on top of the median line.

Maybe also increasing the thickness slightly to 2 (only when there is no box), leading to this appearance, which I think makes it clear what is going on:

image
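
The rule proposed here can be written down as a small decision function. This is only a sketch of the logic, not Vega-Lite code: the function name, the `steelblue` box color, and the default thickness of 1 are illustrative assumptions.

```python
def median_line_style(q1, q3, box_color="steelblue"):
    """Pick a style for the median line (hypothetical helper, not a Vega-Lite API)."""
    box_collapsed = q1 == q3
    if box_collapsed:
        # No box to contrast against: reuse the box color and thicken the line.
        return {"color": box_color, "thickness": 2}
    # Normal case: a thin white line on top of the colored box.
    return {"color": "white", "thickness": 1}


print(median_line_style(5, 5))
print(median_line_style(3, 7))
```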

joelostblom avatar Apr 26 '22 00:04 joelostblom

Ahh, good catch. The tricky bit is that Vega-Lite never sees the data, so we have to build the logic into the Vega spec.

domoritz avatar Apr 26 '22 03:04 domoritz

Another scenario where the current behavior makes it hard to detect the median is when it is the same as one of the quartiles, as in this case:

image

An alternative to introducing logic for these special cases on the Vega side of things would be to change the default median line to a thicker black line (the same grey as the whiskers is hard to see):

image

This doesn't look quite as great as white in most cases, but it does solve both the edge cases I have reported here.
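
For anyone who wants this appearance today, the median part of a boxplot can be styled per chart through the mark definition. Below is a minimal sketch that builds such a spec as a Python dict and prints it as JSON. The data and the field name `x` are made up, and I am assuming that `thickness` is passed through to the underlying tick mark:

```python
import json

# Hypothetical data: many identical values plus two stragglers.
rows = [{"x": v} for v in [5] * 20 + [1, 9]]

spec = {
    "data": {"values": rows},
    "mark": {
        "type": "boxplot",
        # Override the median part of the composite mark.
        # "thickness" is assumed to apply to the underlying tick mark.
        "median": {"color": "black", "thickness": 2},
    },
    "encoding": {"x": {"field": "x", "type": "quantitative"}},
}

print(json.dumps(spec, indent=2))
```

The printed JSON can be pasted into the Vega Editor to compare it against the default white median.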

image

image Open the Chart in the Vega Editor

Another example:

image

image

joelostblom avatar Apr 30 '22 22:04 joelostblom

Could we add a colored outline around a white line?

domoritz avatar Apr 30 '22 23:04 domoritz

I tried that a little before, but it was difficult to get the top and bottom of the outline flush with the box, since the corners seem to be a bit rounded regardless of the cap style I choose:

image

If you are OK with the median line being contained within the box (rather than the current appearance of splitting the box in two), then I think it can work:

image Open the Chart in the Vega Editor (I am not sure whether some might consider it incorrect that a bit of the box seems to stick out under the median because of the outline, even though their values are exactly the same. The difference is very small, though.)

image Open the Chart in the Vega Editor

joelostblom avatar May 01 '22 01:05 joelostblom

I'll defer to @kanitw who might have a better idea.

domoritz avatar May 03 '22 22:05 domoritz

A boxplot with just one value also suffers from this problem.

I think another option to consider is to do conditional encoding (don't use white color if min === median === max)?
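
A rough sketch of what such a conditional encoding could look like, built as a Python dict and dumped to JSON. This is a hand-rolled median tick on top of an aggregate transform, not how the built-in boxplot is implemented. The field names `lo`, `mid`, `hi` and the colors are made up for illustration, and the test is the case where minimum, median and maximum all coincide:

```python
import json

# Expression evaluated at runtime, after the aggregate has been computed,
# so the spec itself never needs to see the data.
collapsed = "datum.lo === datum.mid && datum.mid === datum.hi"

spec = {
    "data": {"values": [{"x": 5}] * 20},  # hypothetical single-valued data
    "transform": [
        {
            "aggregate": [
                {"op": "min", "field": "x", "as": "lo"},
                {"op": "median", "field": "x", "as": "mid"},
                {"op": "max", "field": "x", "as": "hi"},
            ]
        }
    ],
    "mark": {"type": "tick"},
    "encoding": {
        "x": {"field": "mid", "type": "quantitative"},
        "color": {
            # Use the box color when everything collapses to one value,
            # otherwise keep the usual white median line.
            "condition": {"test": collapsed, "value": "steelblue"},
            "value": "white",
        },
    },
}

print(json.dumps(spec, indent=2))
```

The same test could be loosened to `datum.q1 === datum.q3` (with `q1` and `q3` aggregates) to also cover the case where only the box collapses.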

kanitw avatar Nov 14 '23 01:11 kanitw

Yes, that sounds like a good alternative too

joelostblom avatar Nov 14 '23 02:11 joelostblom