Add box mark

Open mwaskom opened this issue 2 years ago • 3 comments

This is a rectangular mark drawn between min/max values, sort of a cross between Bar and Range:

(
    so.Plot(tips, "day", "total_bill", color="sex")
    .add(so.Box(), so.Dodge())
)
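
To make the "drawn between min/max values" behavior concrete, here is a minimal standard-library sketch of the per-group extents such a mark would span when no stat is supplied. The rows and the `group_extents` helper are hypothetical illustrations, not seaborn API or data from `tips`.

```python
from collections import defaultdict

# Hypothetical (day, sex, total_bill) records standing in for the tips dataset.
rows = [
    ("Thur", "Male", 10.0),
    ("Thur", "Male", 25.5),
    ("Thur", "Female", 8.0),
    ("Thur", "Female", 19.0),
    ("Fri", "Male", 12.0),
    ("Fri", "Male", 40.0),
]

def group_extents(records):
    """Return {(day, sex): (min, max)}: the span one rectangle would cover."""
    groups = defaultdict(list)
    for day, sex, bill in records:
        groups[(day, sex)].append(bill)
    return {key: (min(vals), max(vals)) for key, vals in groups.items()}

extents = group_extents(rows)
```

Each key corresponds to one dodged rectangle: the position variable picks the category slot, and the color variable picks the position within it.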

I think it's very likely that there will be a standalone mark for drawing a full box-and-whisker plot in one layer — name and implementation strategy TBD — but until then this mark provides what we need to have an (outlier-less) "box plot":

(
    so.Plot(tips, "day", "total_bill", color="sex")
    .add(so.Box(alpha=.35), so.Perc([25, 75]), so.Dodge())
    .add(so.Range(), so.Perc([0, 25]), so.Dodge())
    .add(so.Range(), so.Perc([75, 100]), so.Dodge())
    .add(so.Dash(), so.Perc([50]), so.Dodge())
)
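
As a rough sketch of what the four layers above compute for a single group, the following standard-library code derives the same percentile pairs. It assumes linear interpolation between sorted values (the default percentile method in NumPy); `percentile` and `box_layers` are hypothetical helpers, not part of seaborn.

```python
import math

def percentile(values, q):
    """Percentile q (0-100) using linear interpolation between sorted values."""
    s = sorted(values)
    rank = q / 100 * (len(s) - 1)   # fractional index into the sorted data
    lo = math.floor(rank)
    hi = min(lo + 1, len(s) - 1)    # clamp so q=100 stays in bounds
    return s[lo] + (s[hi] - s[lo]) * (rank - lo)

def box_layers(values):
    """Map each layer of the layered box plot to the values it would draw."""
    p0, p25, p50, p75, p100 = (percentile(values, q) for q in (0, 25, 50, 75, 100))
    return {
        "box": (p25, p75),             # Box + Perc([25, 75])
        "lower_whisker": (p0, p25),    # Range + Perc([0, 25])
        "upper_whisker": (p75, p100),  # Range + Perc([75, 100])
        "median": p50,                 # Dash + Perc([50])
    }

layers = box_layers([1, 2, 3, 4, 5])
```

Because the whiskers run to the 0th and 100th percentiles, every observation falls inside them, which is why this construction has no outliers.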

Couple notes:

  • It took a long time to decide on the name; other options considered were Span, Rect, Plank, Beam, and Lath. The first two are more useful for other planned marks, and the last three are not obvious. This does commit us to using Boxplot or WhiskerBox or something like that for the proper "box plot" mark.
  • This uses the edge-clipping trick that Bar and Area marks use. I'm souring on this a bit and may revert it in a future release.

mwaskom, Nov 07 '22 01:11

Codecov Report

Merging #3127 (76a5abb) into master (d4d27ad) will increase coverage by 0.00%. The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           master    #3127    +/-   ##
========================================
  Coverage   98.41%   98.42%            
========================================
  Files          76       76            
  Lines       24076    24190   +114     
========================================
+ Hits        23695    23809   +114     
  Misses        381      381            
Impacted Files              Coverage Δ
seaborn/_marks/area.py      96.20% <100.00%> (-0.31%) :arrow_down:
seaborn/_marks/bar.py       100.00% <100.00%> (ø)
seaborn/_marks/base.py      98.51% <100.00%> (+0.11%) :arrow_up:
seaborn/objects.py          100.00% <100.00%> (ø)
tests/_marks/test_bar.py    100.00% <100.00%> (ø)

codecov[bot], Nov 07 '22 02:11

Observable Plot does this with the regular bar mark, but it has x1/x2 and y1/y2 position options. Vega-Lite similarly has x/x2 and y/y2 position channels.

That doesn't mean that Seaborn Objects should do the same, but it is interesting to see how other grammar of graphics libraries handle it.

jcmkk3, Nov 07 '22 02:11

The main practical difference would be the loss of the range calculation when xmin/xmax aren't specified; so that first example would become:

(
    so.Plot(tips, "day", "total_bill", color="sex")
    .add(so.Bar(), so.Perc([25, 75]), so.Dodge())
)

I do wonder a little bit whether the Box mark should have a median line when it computes a range. But should it have one when there are only two values, as in the Perc example? That's probably too hard to explain or predict.

One thing that would be a little awkward to do with this parameterization would be a candlestick chart when your data naturally has "open" and "close" columns. Having a y/y1 or y1/y2 parameterization is a little less opinionated there compared to ymin/ymax, although given the way the mark is implemented, you still get the expected result:

so.Plot(["a", "b"], ymin=[1, 2], ymax=[2, -1]).add(so.Box())
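
One way to see why the order of the two endpoints need not matter is to normalize the span before drawing. This is only an illustrative sketch under that assumption, not the PR's actual implementation; `span_to_rect` and the 0.8 width are hypothetical.

```python
def span_to_rect(center, width, y_a, y_b):
    """Return (left, bottom, width, height) for a rectangle spanning y_a..y_b.

    The endpoints may come in either order, as with "open"/"close" columns.
    """
    bottom = min(y_a, y_b)
    height = abs(y_b - y_a)
    return (center - width / 2, bottom, width, height)

# Same values as the example above: the second span has ymin > ymax.
ymin = [1, 2]
ymax = [2, -1]
rects = [span_to_rect(i, 0.8, lo, hi) for i, (lo, hi) in enumerate(zip(ymin, ymax))]
```

The second rectangle covers -1 to 2 even though its "min" exceeds its "max". A candlestick chart would additionally need the sign of the difference to color rising and falling periods.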

mwaskom, Nov 07 '22 13:11