StatsMakie.jl

Legend for grouped data

Open grero opened this issue 6 years ago • 9 comments

I am trying to add a legend for grouped data. This example works for simple line plots:

https://simondanisch.github.io/ReferenceImages/gallery//legend_1/index.html

but I am not sure how to modify this to display legends for a plot generated e.g. like this:

using StatsMakie, Makie, Random
x = randn(1000);
g = rand(1:3, 1000);
scene = plot(density, Group(g), x)

[Image: grouped_density_test]

grero, May 16 '19 07:05

Legend support is one of the big TODOs of StatsMakie. The snippet you link to is actually very helpful. I imagine one should create some helper function in StatsMakie that, after grouping, returns the settings needed for the legend. If you inspect scene[end] you should be able to find the colors that were used:

julia> [scene[end].plots[i].color[] for i in 1:3]
3-element Array{RGB{Float64},1} with eltype ColorTypes.RGB{Float64}:
 RGB{Float64}(0.9019607843137255,0.6235294117647059,0.0)                
 RGB{Float64}(0.33725490196078434,0.7058823529411765,0.9137254901960784)
 RGB{Float64}(0.0,0.6196078431372549,0.45098039215686275)  

So I suspect we need a helper function that inspects these plots and builds a legend from the colors it finds (and from any other attributes that vary by group). One open question: when grouping by two variables, should there be two separate legends or a single legend listing all the combinations?
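To make the two options concrete, here is a minimal sketch in plain Julia (no Makie calls). The columns `marker_group` and `color_group` are made-up illustration data, not anything StatsMakie exposes.

```julia
# Hypothetical grouping columns, one entry per observation.
marker_group = ["foo 1", "foo 1", "foo 2", "foo 2"]
color_group  = ["bar 1", "bar 2", "bar 1", "bar 2"]

# Option 1: two separate legends, one list of labels per grouping variable.
separate = (marker = sort(unique(marker_group)),
            color  = sort(unique(color_group)))

# Option 2: one common legend with an entry for every observed combination.
combined = sort(unique(zip(marker_group, color_group)))

println(separate)
println(combined)
```

With two levels per variable the separate legends need 2 + 2 entries and the common one needs 4; the common legend grows with the product of the level counts, the separate ones with the sum.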

piever, May 21 '19 15:05

I'd say you should probably have a common legend; that seems like the easiest to implement.

asinghvi17, May 21 '19 15:05

I'd say two separate legends; that is much clearer.

mkborregaard, May 21 '19 15:05

Fair enough. The question is how that can be implemented with Makie's legend interface.

asinghvi17, May 21 '19 16:05

I played around with this a little bit. I would like to share my attempt for anybody who wants to continue working on this. It works for simple examples, but it is very hacky. I just don't know the internals well enough to improve the code.

First, generate some data.

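using StatsMakie, Makie, Random

## sizes (assumed values; the original post does not state them)
NG1 = 3          # levels per grouping variable
NpG = 10         # observations per group
NG = NG1^2       # number of groups (combinations of g1 and g2)
N = NG * NpG     # total number of observations

## grouped cross sectional data
x = [i for i in 1:NG for j in 1:NpG]
y = randn(N)
g1 = ["foo $i" for i in 1:NG1 for j in 1:NpG * NG1]
g2 = ["bar $i" for j in 1:NG1 for i in 1:NG1 for k in 1:NpG]

## grouped time series data (9 series of length N, stacked into one vector)
ts = vec(cumsum(randn(N, 9), dims=1))
g01 = ["foo $i" for i in 1:3 for j in 1:N for k in 1:3]
g02 = ["bar $i" for j in 1:3 for i in 1:3 for k in 1:N]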

For simple grouping, everything works out of the box.

scn1 = scatter(Group(g1), x, y, markersize=0.5)
leg1 = legend(scn1[end].plots, unique(g1), camera=campixel!, raw=true)
vbox(scn1, leg1)

[Image: Auswahl_001]

For more than one grouping variable use this hacky function.

_is_group(x) = typeof(x.val) <: Group
_is_xyz(x) = typeof(x.val) <: Array

function grouped_legend!(scene)
  input_args = scene.plots[end].input_args
  ## Get group info
  i_grp = findfirst(_is_group.(input_args))
  grp = input_args[i_grp].val.columns
  ## Get data
  xyz = input_args[collect(_is_xyz.(input_args))]

  # For each grouping variable generate a separate legend
  # and combine it using hbox
  leg = mapreduce(hbox, zip(keys(grp), grp)) do n_grp
    nt = NamedTuple{(n_grp[1],)}((n_grp[2],))
    # Generate plot with just one grouping variable
    # (Case distinction for lines vs scatter)
    plt_type = typeof(scene[end].plots[1])
    if plt_type <: Lines
      scn = lines(Group(nt), xyz...)
    elseif plt_type <: Scatter
      scn = scatter(Group(nt), xyz...)
    else
      error("TODO: $plt_type not yet covered")  # throw here: @error only logs and would leave scn undefined
    end
    labels = unique(nt[1])
    # Generate legend
    legend(scn[end].plots, labels, camera=campixel!, raw=true)
  end

  vbox(scene, leg)
end

scn2 = scatter(Group(marker=g1, color=g2), x, y, markersize=0.5)
grouped_legend!(scn2)

[Image: Auswahl_002]

scene = lines(Group(color=g01, linestyle=g02), ts)
grouped_legend!(scene)

[Image: Auswahl_003]

Todo

  • [ ] legends for continuous styling (cf. zcolor in Plots.jl). As far as styling by continuous variables is concerned, coloring is easy (just add a colorlegend). A legend for markersize is harder, I guess; ggplot just uses a few bubbles as examples.

  • [ ] positioning

  • [ ] descriptions for each legend (titles)
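For the markersize item, the "few bubbles" idea could be sketched in plain Julia as picking a handful of representative sizes from the data range. The `sizes` vector and the choice of three evenly spaced examples are assumptions for illustration; this does not draw anything.

```julia
# Hypothetical continuous variable mapped to markersize.
sizes = [0.2, 0.5, 1.3, 0.9, 2.0]

# Pick three evenly spaced example sizes spanning the data range.
lo, hi = extrema(sizes)
examples = collect(range(lo, hi; length = 3))

# One label per example bubble for the legend.
bubble_labels = string.(round.(examples; digits = 2))

println(collect(zip(examples, bubble_labels)))
```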

greimel, Jul 10 '19 08:07

This raises an interesting question - currently, the legend recipe ingests plot objects natively. Should we make it more generic, so that you can create a legend from something like a LegendEntry struct?

A potential structure for that is here:

struct LegendEntry 
    plottype::Type{<: AbstractPlot} # lines & scatter implemented now - each plottype can overload this.
    label # label text
    padding # some tuple / struct
    attributes::Attributes # everything else
end

Then Legend could plot a list of LegendEntry values, and we could let argument conversion decompose plots into such a list.
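A rough, self-contained sketch of how such a list could be built. The types `AbstractPlot`, `Lines`, `Scatter`, and `Attributes` below are local stand-ins so the snippet runs without AbstractPlotting, and `legend_entries` is a hypothetical helper, not an existing function.

```julia
# Stand-ins for the real AbstractPlotting types (assumptions, for illustration only).
abstract type AbstractPlot end
struct Lines <: AbstractPlot end
struct Scatter <: AbstractPlot end
const Attributes = Dict{Symbol,Any}

struct LegendEntry
    plottype::Type{<:AbstractPlot}  # which kind of marker/line to draw
    label::String                   # label text
    padding::NTuple{2,Float64}      # placeholder for "some tuple / struct"
    attributes::Attributes          # everything else (color, linestyle, ...)
end

# Hypothetical conversion step: one entry per (label, color) pair.
function legend_entries(plottype, labels, colors)
    [LegendEntry(plottype, string(l), (0.0, 0.0), Attributes(:color => c))
     for (l, c) in zip(labels, colors)]
end

entries = legend_entries(Scatter, ["foo 1", "foo 2"], [:orange, :skyblue])
foreach(e -> println(e.label, " => ", e.attributes[:color]), entries)
```

The point of the design is that the legend only needs the entries, so a grouping helper could construct them directly instead of rendering throwaway plots.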

asinghvi17, Aug 09 '19 07:08

I'm just following up on this from a comment on Slack. I was concerned with making a legend that actually maps the correct values to their respective labels.


using StatsMakie, Makie
N = 1000
a = rand((1, 3, 6), N) # a discrete variable
x = randn(N) # a continuous variable
y = @. x + a + 0.8*randn() # a continuous variable
sc = Scene()
scatter!(Group(a), x, y, markersize = 0.2)
lgd = legend(sc.plots[2].plots, string.(unique(sort(a))))
vbox(sc, lgd)

[Image: legend_plot]

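If you leave out the sort call, the labels are mapped to the wrong groups, since unique alone returns the values in order of first appearance. Having to re-derive the order like this seems pretty inefficient, given that it should be known internally when grouping things. Perhaps the output of the previously referenced bit of code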

julia> [scene[end].plots[i].color[] for i in 1:3]
3-element Array{RGB{Float64},1} with eltype ColorTypes.RGB{Float64}:
 RGB{Float64}(0.9019607843137255,0.6235294117647059,0.0)                
 RGB{Float64}(0.33725490196078434,0.7058823529411765,0.9137254901960784)
 RGB{Float64}(0.0,0.6196078431372549,0.45098039215686275)  

should be a dictionary with the keys corresponding to the group labels.
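A minimal sketch of that idea in plain Julia, assuming (as the sort workaround suggests) that groups receive palette colors in sorted label order. The column `a` and the `palette` entries are placeholders, not values read from a StatsMakie scene.

```julia
# Hypothetical group column, deliberately not in sorted order.
a = [3, 1, 6, 3, 1]

# Placeholder palette; a real one would come from the plot's attributes.
palette = ["orange", "skyblue", "green"]

# Pair each label with its color once, so no caller has to re-derive the order.
labels = sort(unique(a))
label_to_color = Dict(zip(labels, palette))

println(unique(a))        # order of first appearance: [3, 1, 6]
println(label_to_color)   # lookup by label, independent of order
```

With a lookup like this, a legend helper could ask for the color of label 6 directly instead of relying on positions lining up.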

Tokazama, Dec 09 '19 10:12

Legend isn't integrated at all with StatsMakie, so that makes sense. However, I think this can be solved in AbstractPlotting by enforcing a label attribute on all series.

asinghvi17, Dec 09 '19 15:12

Fixed partially by MakieLayout.

asinghvi17, Feb 16 '20 13:02