Gadfly.jl icon indicating copy to clipboard operation
Gadfly.jl copied to clipboard

[WIP] Density geometry revamp

Open tlnagy opened this issue 6 years ago • 10 comments

Fixes #1152

Contributor checklist:

  • [x] I've updated the documentation to reflect these changes
  • [ ] I've added an entry to NEWS.md
  • [ ] I've added and/or updated the unit tests
  • [ ] I've run the regression tests
  • [ ] I've squash'ed or fixup'ed junk commits with git-rebase
  • [x] I've built the docs and confirmed these changes don't cause new errors

Implemented changes

  • This PR introduces a bunch of new features for Geom.density: custom kernels, custom scaling of densities, stacking, trimmed distributions, adjustments of the computed bandwidth, vertical and horizontal orientations
  • Geom.violin also gets all these features plus the ability to create split violins and ~~group by color~~ (EDIT: this is bigger than this PR) as well
  • Geom.violin and Geom.density now share the same DensityStatistic. ~~There is still some ugliness here because I infer whether the user wants a density or violin plot based on what variables are initialized in the aesthetic. I'm not wedded to this approach.~~ EDIT: I now rely on a manual setting on orientation
  • This PR also introduces a general way for grouping by color and by category. I intend to leverage it for Geom.boxplot and maybe even Geom.bar.

Here's a little taste of what's going to be possible soon:

using Distributions, DataFrames, Gadfly
srand(123);
data = vcat(rand(Beta(2,5), 10), 1-rand(Beta(2,5), 10), 0.25+rand(Beta(2,5), 10));
types = repeat([:one, :two], outer=15);
group = vcat(fill(1, 10), fill(0.5, 10), fill(2, 10)); df = DataFrame(data=data, group=group); p = plot(df, x=:group, y=:data, color=types, Geom.violin(split=true, trim=false))

image

TODO

  • [x] Rebuild render() for Geom.density
  • [x] Flesh out edge cases for Geom.violin render()
  • [x] Add support for conditional density estimates

tlnagy avatar May 28 '18 04:05 tlnagy

We now have the ability to generate conditional density distributions!

using RDatasets, Gadfly
df = dataset("ggplot2", "diamonds")
plot(df, x=:Carat, color=:Cut, Geom.density(scale=:count, position=:fill))

image

@Mattriks and @bjarthur do you all know how I can adjust the x and y extents from an apply_statistics function? I would like to trim off the excess space in the above plot somehow.

tlnagy avatar Jun 01 '18 01:06 tlnagy

https://github.com/GiovineItalia/Gadfly.jl/issues/781 is related

bjarthur avatar Jun 02 '18 11:06 bjarthur

There's a bug in KernelDensity.jl that will sometimes cause this PR to fail. It would probably be best if my fix (https://github.com/JuliaStats/KernelDensity.jl/pull/52) is merged and tagged prior to merging this PR.

tlnagy avatar Jun 03 '18 22:06 tlnagy

looking forward to having this functionality! in general it looks great.

my only big question is whether separate Geoms are necessary (density and violin) with their own typedefs and render functions, or whether Stat.density could be combined with Geom.line, and Geom.density and Geom.violin could simply be aliases.

sorry for all the hassle resolving the conflicts with #1116.

bjarthur avatar Jun 05 '18 11:06 bjarthur

also, does this break anything? is it backwards compatible with the former Geom.density and Geom.violin? if not, depwarns are warranted. and we should think about whether to merge it before or after fixing https://github.com/GiovineItalia/Gadfly.jl/issues/1130

bjarthur avatar Jun 05 '18 11:06 bjarthur

Thanks for the review. I won't have time to fix this for a bit and I want to make sure we get it right before merging.

also, does this break anything? is it backwards compatible with the former Geom.density and Geom.violin?

AFAIK, there shouldn't be any regressions or changes to the API. There are however some changes to defaults that I think are better than the current ones. The primary change is that density plots by default now clip to the range of the data (this the same as ggplot2) since it doesn't make sense to predict outside that range.

my only big question is whether separate Geoms are necessary (density and violin) with their own typedefs and render functions

This might be achievable. The primary difficulty is that these two geoms are grouped in very different ways.

  • Density plots are generally overlaid or stacked on top of each other (varying color), only x and y aesthetics are used.
  • Violin plots are generally dodged (or split) according to group and color, x, y, and width aesthetics are used.

They are both essentially Compose.polygons, but the offsets and spacing are different enough that gets a little hairy. If anything Geom.boxplot and Geom.violin are probably the most likely to share code.

I think in general we should think about separating out the positioning code into structs because a lot of it is boilerplate. The same positioning code should work for bars, boxplots, violins, etc. This is what ggplot2 does. See http://ggplot2.tidyverse.org/reference/index.html#section-layer-position-adjustment for details

tlnagy avatar Jun 07 '18 00:06 tlnagy

do you want to merge this before or after we support 0.7? there will likely be merge conflicts if we wait.

bjarthur avatar Jun 30 '18 12:06 bjarthur

I don't have time right now to revamp this PR so I'm okay with letting it get a bit stale and then finalize it a bit later. There are enough small edge cases that I need to handle that it's better to sit on this till it's ready.

tlnagy avatar Jul 01 '18 18:07 tlnagy

Split violin plot is a very useful function and it seems it's still not part of Gadfly. Any chance of possibility to revamp this PR? Let me know if I could help somehow.

wizofe avatar Feb 06 '22 12:02 wizofe

agreed! @tlnagy ??

bjarthur avatar Feb 06 '22 13:02 bjarthur