Gadfly.jl
Gadfly.jl copied to clipboard
[WIP] Density geometry revamp
Fixes #1152
Contributor checklist:
- [x] I've updated the documentation to reflect these changes
- [ ] I've added an entry to
NEWS.md
- [ ] I've added and/or updated the unit tests
- [ ] I've run the regression tests
- [ ] I've
squash
'ed orfixup
'ed junk commits with git-rebase - [x] I've built the docs and confirmed these changes don't cause new errors
Implemented changes
- This PR introduces a bunch of new features for
Geom.density
: custom kernels, custom scaling of densities, stacking, trimmed distributions, adjustments of the computed bandwidth, vertical and horizontal orientations -
Geom.violin
also gets all these features plus the ability to create split violins and ~~group by color~~ (EDIT: this is bigger than this PR) as well -
Geom.violin
andGeom.density
now share the sameDensityStatistic
. ~~There is still some ugliness here because I infer whether the user wants adensity
orviolin
plot based on what variables are initialized in the aesthetic. I'm not wedded to this approach.~~ EDIT: I now rely on a manual setting onorientation
- This PR also introduces a general way for grouping by color and by category. I intend to leverage it for
Geom.boxplot
and maybe evenGeom.bar
.
Here's a little taste of what's going to be possible soon:
using Distributions, DataFrames, Gadfly
srand(123);
data = vcat(rand(Beta(2,5), 10), 1-rand(Beta(2,5), 10), 0.25+rand(Beta(2,5), 10));
types = repeat([:one, :two], outer=15);
group = vcat(fill(1, 10), fill(0.5, 10), fill(2, 10)); df = DataFrame(data=data, group=group); p = plot(df, x=:group, y=:data, color=types, Geom.violin(split=true, trim=false))
TODO
- [x] Rebuild
render()
forGeom.density
- [x] Flesh out edge cases for
Geom.violin
render()
- [x] Add support for conditional density estimates
We now have the ability to generate conditional density distributions!
using RDatasets, Gadfly
df = dataset("ggplot2", "diamonds")
plot(df, x=:Carat, color=:Cut, Geom.density(scale=:count, position=:fill))
@Mattriks and @bjarthur do you all know how I can adjust the x and y extents from an apply_statistics
function? I would like to trim off the excess space in the above plot somehow.
https://github.com/GiovineItalia/Gadfly.jl/issues/781 is related
There's a bug in KernelDensity.jl
that will sometimes cause this PR to fail. It would probably be best if my fix (https://github.com/JuliaStats/KernelDensity.jl/pull/52) is merged and tagged prior to merging this PR.
looking forward to having this functionality! in general it looks great.
my only big question is whether separate Geoms are necessary (density and violin) with their own typedefs and render functions, or whether Stat.density could be combined with Geom.line, and Geom.density and Geom.violin could simply be aliases.
sorry for all the hassle resolving the conflicts with #1116.
also, does this break anything? is it backwards compatible with the former Geom.density and Geom.violin? if not, depwarns are warranted. and we should think about whether to merge it before or after fixing https://github.com/GiovineItalia/Gadfly.jl/issues/1130
Thanks for the review. I won't have time to fix this for a bit and I want to make sure we get it right before merging.
also, does this break anything? is it backwards compatible with the former Geom.density and Geom.violin?
AFAIK, there shouldn't be any regressions or changes to the API. There are however some changes to defaults that I think are better than the current ones. The primary change is that density plots by default now clip to the range of the data (this the same as ggplot2
) since it doesn't make sense to predict outside that range.
my only big question is whether separate Geoms are necessary (density and violin) with their own typedefs and render functions
This might be achievable. The primary difficulty is that these two geoms are grouped in very different ways.
- Density plots are generally overlaid or stacked on top of each other (varying color), only
x
andy
aesthetics are used. - Violin plots are generally dodged (or split) according to group and color,
x
,y
, andwidth
aesthetics are used.
They are both essentially Compose.polygons
, but the offsets and spacing are different enough that gets a little hairy. If anything Geom.boxplot
and Geom.violin
are probably the most likely to share code.
I think in general we should think about separating out the positioning code into structs because a lot of it is boilerplate. The same positioning code should work for bars, boxplots, violins, etc. This is what ggplot2
does. See http://ggplot2.tidyverse.org/reference/index.html#section-layer-position-adjustment for details
do you want to merge this before or after we support 0.7? there will likely be merge conflicts if we wait.
I don't have time right now to revamp this PR so I'm okay with letting it get a bit stale and then finalize it a bit later. There are enough small edge cases that I need to handle that it's better to sit on this till it's ready.
Split violin plot is a very useful function and it seems it's still not part of Gadfly
. Any chance of possibility to revamp this PR? Let me know if I could help somehow.
agreed! @tlnagy ??