Problem description

Currently, AlgebraOfGraphics does not really have a concept of "aesthetics" as in ggplot. Instead, the logic is based around shared keyword arguments and conventional use of positional arguments: arguments 1, 2 and 3 are expected to relate to the X, Y and Z axes. This does not hold for many plots, however. For example, HLines has only one argument, but it relates to the Y axis. BarPlot, RainClouds, Density, Violin, Errorbars, Rangebars and probably others have two different orientations, and which scales their arguments relate to depends on an attribute such as direction or orientation.

The only color attribute that's handled is color; others, like Scatter's strokecolor, are not. This is partly because the color handling assumes the existence of related attributes like colormap and colorrange, which help transform numbers to colors on Makie's side. For better or worse, these often do not exist for other color attributes like strokecolor. Currently, the only way to set those attributes is to pass a vector of colors manually.

Another problem with the current implementation is that all layers sharing some variable in their mappings are assumed to be connected. So if you have a line plot with mapping(color = :A) but also a scatter plot with mapping(color = :B), you will always get a merged legend with lines and scatters overlaid, even if the two plot disjoint sets of data and you'd prefer separate legends for scatters and lines.

Related issues

These issues are either fixed directly by this PR, or this PR introduces a new way of solving the problems described therein:

https://github.com/MakieOrg/AlgebraOfGraphics.jl/issues/75
https://github.com/MakieOrg/AlgebraOfGraphics.jl/issues/97
https://github.com/MakieOrg/AlgebraOfGraphics.jl/issues/262
https://github.com/MakieOrg/AlgebraOfGraphics.jl/issues/329
https://github.com/MakieOrg/AlgebraOfGraphics.jl/issues/365
https://github.com/MakieOrg/AlgebraOfGraphics.jl/issues/385
https://github.com/MakieOrg/AlgebraOfGraphics.jl/issues/427
https://github.com/MakieOrg/AlgebraOfGraphics.jl/issues/434
https://github.com/MakieOrg/AlgebraOfGraphics.jl/issues/463
https://github.com/MakieOrg/AlgebraOfGraphics.jl/issues/469
https://github.com/MakieOrg/AlgebraOfGraphics.jl/issues/473
https://github.com/MakieOrg/AlgebraOfGraphics.jl/issues/487
https://github.com/MakieOrg/AlgebraOfGraphics.jl/issues/491
https://github.com/MakieOrg/AlgebraOfGraphics.jl/issues/504

Implemented solution

This PR internally introduces the notion of an Aesthetic. Examples are AesX, AesY, AesColor, AesMarker and so on. These are decoupled from any specific keywords or argument positions and abstractly represent the visual effect of some plotting function argument. For example, the only argument of HLines has an effect on the AesY aesthetic.

Each plotting function now has to have a declared aesthetic_mapping. Here's an example for Violin, which flips the mapping of its positional arguments depending on the value of the orientation attribute. (Note that another new function mandatory_attributes is used to declare attributes that are strictly necessary to resolve the aesthetic mapping, so AlgebraOfGraphics requires these to be set statically and not pulled in via the theme, as the theme should not semantically change the plots.)

function aesthetic_mapping(::Type{Violin})
    dictionary([
        1 => :orientation => dictionary([
            :horizontal => AesX,
            :vertical => AesY,
        ]),
        2 => :orientation => dictionary([
            :horizontal => AesY,
            :vertical => AesX,
        ]),
        :color => AesColor,
    ])
end
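To illustrate how such a declaration could be resolved once the orientation is known, here is a minimal standalone sketch. It does not use AlgebraOfGraphics internals: the aesthetic types are redefined as stand-ins, Base.Dict replaces dictionary, and resolve_aesthetic is a hypothetical helper invented for this illustration.

```julia
# Stand-ins for the aesthetic types named above (assumption: plain marker types).
abstract type Aesthetic end
struct AesX <: Aesthetic end
struct AesY <: Aesthetic end
struct AesColor <: Aesthetic end

# Same shape as the Violin declaration above, with Base.Dict instead of `dictionary`.
violin_mapping = Dict{Union{Int,Symbol},Any}(
    1 => (:orientation => Dict(:horizontal => AesX, :vertical => AesY)),
    2 => (:orientation => Dict(:horizontal => AesY, :vertical => AesX)),
    :color => AesColor,
)

# `resolve_aesthetic` is hypothetical, not an AlgebraOfGraphics function.
# A plain aesthetic resolves to itself.
resolve_aesthetic(aes::Type{<:Aesthetic}, attributes) = aes

# An `attribute => options` rule is resolved by looking up the attribute value.
function resolve_aesthetic(rule::Pair, attributes)
    attribute, options = rule
    haskey(attributes, attribute) ||
        error("attribute $attribute must be set explicitly to resolve the aesthetic")
    return options[attributes[attribute]]
end

attrs = Dict(:orientation => :vertical)
resolved = Dict(key => resolve_aesthetic(rule, attrs) for (key, rule) in violin_mapping)
```

The error branch mirrors the idea behind mandatory_attributes: if orientation is not set statically, the mapping of the positional arguments cannot be decided.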

Internally, the fitting of categorical or continuous scales is now routed through these aesthetics. This means the orientation keyword for Violin now has the expected effect on the x and y axes:

data((; x = 1:4, y = ["A", "B", "C", "D"], z = ["U", "V", "W", "X"])) * mapping(:x, :y; color = :z) * visual(Violin, orientation = :horizontal) |> draw

data((; x = 1:4, y = ["A", "B", "C", "D"], z = ["U", "V", "W", "X"])) * mapping(:x, :y; color = :z) * visual(Violin, orientation = :vertical) |> draw

We can further combine the Violin plot with an HLines plot to mark certain positions of interest. However, when we add a color mapping to get a legend entry, the categories of Violin and HLines merge:

data((; x = 1:4, y = ["A", "B", "C", "D"], z = ["U", "V", "W", "X"])) *
    mapping(:x, :y; color = :z) *
    visual(Violin, orientation = :vertical) +
    data((; y = 1:4, type = fill("threshold", 4))) *
    mapping(:y, color = :type) *
    visual(HLines) |> draw

This can now be handled by separating the two color scales. For this purpose, the scale function can be used to define an identifier, which can then be associated with a mapped variable by extending the => mechanism with a fourth possible option. Note how the legend splits now that the HLines color is mapped to the :second_color scale identifier:

data((; x = 1:4, y = ["A", "B", "C", "D"], z = ["U", "V", "W", "X"])) *
    mapping(:x, :y; color = :z) *
    visual(Violin, orientation = :vertical) +
    data((; y = 1:4, type = fill("threshold", 4))) *
    mapping(:y, color = :type => scale(:second_color)) *
    visual(HLines) |> draw

While the legend is now adequately split, both color scales use the same default colormap. The old system, which relied on passing palettes to the palette keyword, keyed by plotting function arguments, cannot handle this problem. Therefore, a new option to draw called scales is introduced, which allows passing options keyed by the default or custom identifiers for each scale. Default identifiers are X, Y, Color, and others, capitalized to show that they do not directly mirror keywords like color but rather relate to abstract aesthetics.

Here we pass a one-element palette containing only the color red for our new second_color scale:

data((; x = 1:4, y = ["A", "B", "C", "D"], z = ["U", "V", "W", "X"])) *
    mapping(:x, :y; color = :z) *
    visual(Violin, orientation = :vertical) +
    data((; y = 1:4, type = fill("threshold", 4))) *
    mapping(:y, color = :type => scale(:second_color)) *
    visual(HLines) |>
    x -> draw(x; scales = (; second_color = (; palette = ["red"])))

Note that this mechanism also allows changing other attributes like scale labels. We can make use of that to define a label for the y axis, which is unlabelled because Violin and HLines plot different columns there. In principle we could have overridden the axis attribute ylabel here, but this new mechanism works the same across all scales, so it is preferable.

data((; x = 1:4, y = ["A", "B", "C", "D"], z = ["U", "V", "W", "X"])) *
    mapping(:x, :y; color = :z) *
    visual(Violin, orientation = :vertical) +
    data((; y = 1:4, type = fill("threshold", 4))) *
    mapping(:y, color = :type => scale(:second_color)) *
    visual(HLines) |>
    x -> draw(x; scales = (;
        second_color = (; palette = ["red"]),
        Y = (; label = "A custom Y label"),
    ))

The new implementation removes some hacks around the handling of unusual plot types like Heatmap, which uses its third positional argument for color. Aside from an aesthetic mapping that maps argument 3 to the AesColor aesthetic, this also required rewriting the pipeline to avoid early rescaling of input data. While AesColor columns are by default converted to a Vector{RGBAf}, Heatmap currently cannot handle this input, so the conversion has to be handled by passing colormap and colorrange keywords instead. Each plot type can define custom to_entry methods in order to compute the plot specification given the raw input data and fitted scales. By default, entries are passed the aesthetic-converted columns, which now makes it possible to use strokecolor in a mapping for Scatter, for example:

data((; x = 1:4, y = 5:8, z = ["A", "B", "C", "D"])) *
    mapping(:x, :y, strokecolor = :z) *
    visual(Scatter, strokewidth = 5, markersize = 30, color = :transparent) |>
    x -> draw(x; scales = (; Color = (; palette = :tab20)))

Another benefit of being able to address scales directly is the ability to override category values and labels. Currently, one can only use sorter and renamer in mapping to bring categorical values into a certain order and change their labels. However, this is more difficult if multiple mappings are merged and the merged categories cannot be sorted together, or if not all categories that are supposed to be shown are present in the data.

Now, there's a categories property with which one can override domain, ordering and labels in one go, while also accessing more flexible label types like LaTeXStrings or rich text:

data((; sex = repeat(["m", "f"], 10), weight = rand(20))) *
    mapping(:sex, :weight, color = :sex) *
    visual(Scatter) |> x -> draw(x; scales = (;
        X = (;
            categories = ["m" => "male", "d" => L"\sum{diverse}", "f" => rich("female", color = :green)],
        ),
        Color = (;
            categories = [ "f" => rich("female", color = :green), "m" => "male", "d" => L"\sum{diverse}"],
            palette = ["red", "green", "blue"]
        ),
    ))

For the x and y axes, the "palette" can be overridden, too, in order to introduce visual groupings:

data((; category = rand(["A1", "A2", "B1", "B2", "B3", missing], 1000))) *
    mapping(:category) *
    frequency() |>
    x -> draw(x; scales = (; X = (; palette = [1, 2, 3, 5, 6, 8])))
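The idea behind a positional "palette" can be sketched in plain Julia. This is only an illustration under the assumption that categories are put in order and then paired with palette entries one by one. The names categories, palette and positions are local to this sketch and are not AlgebraOfGraphics API.

```julia
# Assumption for this sketch: categories are sorted, then paired with
# palette entries in that order.
categories = sort(unique(["B2", "A1", "B1", "A2", "B3", "A1"]))

# Skipping position 3 leaves a visual gap between the A group and the B group.
palette = [1, 2, 4, 5, 6]

# Each category is assigned the numeric axis position from the palette.
positions = Dict(zip(categories, palette))
```

With these inputs the A categories sit at positions 1 and 2 and the B categories at 4, 5 and 6, so the empty slot at 3 separates the two groups.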

Discussion points

~It's maybe a bit confusing that scales = (; color = (; ... does not mean mapping(color = ...) but it means AesColor (there's a lookup happening internally from symbol to Aesthetic). The problem is that I wanted to keep the generic dict-like configuration structure, so symbols as keys. Maybe it could be scales = (; Color = ... to signify that it's something different.~
What about multiple signatures for plotting functions, like errorbars having either symmetrical or asymmetrical bars?
What about plotting continuous data on top of categorical? Should it be allowed in a "you're responsible" kind of way? It seems useful enough in some scenarios, for example plotting annotation text between categories.

TODOs

[x] decide what to do with continuous data plotted onto otherwise categorical scales (currently works but should it?)
[x] tighten interface around options passable to scales, currently invalid keywords will be ignored there
[x] think about binned scales and related problems, for example contourf doesn't fit into the current scheme
[x] fix old docs
[x] write new docs
[x] fix old tests
[x] add tests for new functionality

Jun 06 '24 11:06 jkrumbiegel

@piever and @greimel it would be great if you could spare some time to comment on the changes proposed here, as it's only the second time I've interacted with this code base, and then on quite a substantial PR that moves many parts around.

The biggest benefit of the new system is that many plots can only be used "correctly" with it, like HLines, Errorbars, or horizontal BarPlot, Violin and RainClouds. Also, splitting of related scales gives much more freedom to "layer" things without them interfering with each other in undesired ways. A drawback is that the new explicitness means AoG doesn't automatically work with custom recipes, but at least it throws informative errors when parts are missing. My personal opinion is that it's better if the basics work really well and some work has to be done for custom scenarios.

I'm pretty sure this cannot be the final iteration of the aesthetic_mapping mechanism, as some plots are even more complicated to deal with. One example is surface, which kind of combines color and z in its third argument unless you specify color (but you'd have to do so as a matrix, which we can't). But I'm inclined to move forward with this before it feels final, as nobody knows when that would be, and until then it would be nice to fix all those common issues already.

I haven't yet written new docs or tests as I wanted to gather some opinions first.

Jun 13 '24 06:06 jkrumbiegel

I don't have resources currently to thoroughly look at the code, though. Sorry!

I did, however, play around with this a little bit and I noticed that dodged barplots don't work anymore (see the docs preview). Otherwise this looks great!

I'll play around a bit more next week. If all features are still supported (with adapted code), I'd say go for it!

Jun 18 '24 09:06 greimel

> I did, however, play around with this a little bit and I noticed that dodged barplots don't work anymore (see the docs preview). Otherwise this looks great!

@greimel thanks, yes I noticed the barplots, too. I probably broke the global width adjustment so that each dodge group gets width calculated separately.

> If all features are still supported (with adapted code), I'd say go for it!

Yeah I'll try to get as much working as I can. It's likely though that I'd need a second iteration for complex problems like binned scales/contourf etc. Although it's a tradeoff because I consider the previous support for that more on the "happened to kind of work" level :) So I think it's somewhat fair if a more principled approach removes some of the possibilities you had before because the interface allowed for a lot of things to be passed through without proper handling.

Jun 18 '24 09:06 jkrumbiegel

Let me rephrase: It would be nice if at least all documented features keep working ;-)

Jun 18 '24 09:06 greimel

Here's one more thing I noticed: https://aog.makie.org/previews/PR505/gallery/gallery/scales/legend_merging/#Legend-merging

These legends are not merged automatically, even though color and marker use the same scale. Is this intentional?

I'd rather merge them by default, and allow splitting, if desired.

Jun 19 '24 08:06 greimel

> Here's one more thing I noticed: https://aog.makie.org/previews/PR505/gallery/gallery/scales/legend_merging/#Legend-merging
>
> These legends are not merged automatically, even though color and marker use the same scale. Is this intentional?
>
> I'd rather merge them by default, and allow splitting, if desired.

Good catch, I forgot about that. This was a side effect of refactoring the way that legends are constructed, I need to add the ability to merge scales back in.

Jun 19 '24 08:06 jkrumbiegel

my two cents, maybe missing functionality ? or, I need to read the docs more 🫨

Output here: https://beautiful.makie.org/dev/examples/aog/ablines

using AlgebraOfGraphics, GLMakie

p_1to1 = mapping([0], [1]) * visual(ABLines) # declare data-arguments and visual layer
# declare the dataset
p_not1to1  = data((; intercepts = [1,2,3], slopes=[1,1.5,2]))
# declare the arguments of the analysis
p_not1to1 *= mapping(:intercepts, :slopes, color=:intercepts => nonnumeric)
# define your visual layer, what kind of plot do you want?
p_not1to1 *= visual(ABLines, color = [:red, :blue, :orange], linestyle=:dash)

with_theme(theme_ggplot2(), size = (600,400)) do
    p_1to1 + p_not1to1 |> draw
end

Output here: https://beautiful.makie.org/dev/examples/aog/MarketData#stockchart

using MarketData, DataFrames
using AlgebraOfGraphics, GLMakie
using Statistics

df = DataFrame(ohlc)
pltd = data(df[200:280,:])
plt = pltd * mapping(:timestamp, :Open => "StockChart")
plt *= mapping(fillto=:Close, color = (:Open, :Close) => isless => "Open<Close")
plt *= visual(BarPlot)

with_theme(theme_dark(), size = (800,500)) do
    draw(plt, scales =(; Color =(; palette = [:deepskyblue, :firebrick3])))
end

and this one:

using AlgebraOfGraphics, GLMakie
using Random, DataFrames

Random.seed!(134)
## from this [post](https://discourse.julialang.org/t/how-to-make-this-plot-in-julia/75065/22).
d = DataFrame(name = repeat(["A","B","C","D","E","F"], inner=4), 
      time=repeat([0,1,3,6], outer=6), value = rand(24));

pSL = data(d)
pSL *= mapping(:time, :value, color = :name, text = :name => verbatim # now is not working :(
    )
pSL *= visual(ScatterLines) + visual(Makie.Text, align = (:center, :bottom))
with_theme(theme_ggplot2(), size = (600,400)) do
    draw(pSL)
end

Jun 26 '24 17:06 lazarusA

@lazarusA thanks, you're right, hadn't done those two, yet. Both added now

Jun 26 '24 18:06 jkrumbiegel

there is a third one 😄, visual(Makie.Text, align = (:center, :bottom)). It's supposed to reproduce this: https://discourse.julialang.org/t/how-to-make-this-plot-in-julia/75065/21

Jun 26 '24 19:06 lazarusA

@lazarusA not sure I understand, what is wrong?

Jun 26 '24 19:06 jkrumbiegel

The text along the lines is not showing :D

This is how it is supposed to look (but it does not, at the moment)

Jun 26 '24 19:06 lazarusA

@lazarusA ah I didn't post that one, it has to be rewritten slightly so the keywords are sent only to the right plotting functions:

Random.seed!(134)
## from this [post](https://discourse.julialang.org/t/how-to-make-this-plot-in-julia/75065/22).
d = DataFrame(name = repeat(["A","B","C","D","E","F"], inner=4), 
      time=repeat([0,1,3,6], outer=6), value = rand(24));

pSL = data(d)
pSL *= mapping(:time, :value)
pSL *= mapping(color = :name) * visual(ScatterLines) +
    mapping(color = :name, text = :name => verbatim) * visual(Makie.Text, align = (:center, :bottom))
with_theme(theme_ggplot2(), size = (600,400)) do
    draw(pSL)
end

Jun 26 '24 20:06 jkrumbiegel

> @piever and @greimel it would be great if you could spare some time to comment on the changes proposed here, as it's only the second time I've interacted with this code base, and then on quite a substantial PR that moves many parts around.

Hey @jkrumbiegel, really nice work! I confess that I unfortunately don't have the time for a full review, but here are my two cents.

I definitely prefer this approach compared to the previous one (trying to get everything to work but unreliably). I now understand what you meant in a few discussions we had in the past, and it makes a lot of sense. In particular, I think that requiring some extra work for custom recipes is an acceptable trade-off, especially since this feels like something that could be easy to add on a recipe-by-recipe basis. I imagine that in the future (when things stabilize) the aesthetic_mapping stub can live in Makie, and the docs can say that for AoG compatibility one should overload the aesthetic_mapping function for one's plot type.

Re: breakage, I also think that if the documentation runs without errors it's definitely a good sign, as it is pretty thorough (I imagine this will be a breaking release anyways).

> decide what to do with continuous data plotted onto otherwise categorical scales (currently works but should it?)

I would probably prefer to keep allowing it personally (it's easy to think of use cases for that). Maybe the feature can be kept and documented prominently?

One final comment that I have is about another pain point in AoG that maybe could be solved here (potentially in another iteration, the PR is already quite big). The whole dodge / stack implementation mostly tries to forward things to Makie, which doesn't quite work for all use cases (for example, a dodged barplot with error bars is tricky to do). Now that recipes "declare their aesthetics", I guess one could have a general dodge implementation here in AoG that would work with most recipes.

Jun 27 '24 11:06 piever

@piever thank you for your feedback :)

> I imagine in the future (when things stabilize) the aesthetic_mapping stub can live in Makie and one can add in the docs that for AoG compatibility one should overload aesthetic_mapping function for their plot type.

Yes something like this was my thinking, too. The aesthetic mappings might duplicate some information from the Makie pipeline a bit (I've now had to add some dispatch routes there for different numbers of positional arguments), although Makie doesn't care so much about the "semantics" of a plot, more about feeding types into the right conversions, whatever the visual end result may be.

> (I imagine this will be a breaking release anyways).

Yes it will be, I've tried to make the likely errors descriptive so that users that update unsuspectingly have some guidance.

> I would probably prefer to keep allowing [continuous data on categorical axes] personally (it's easy to think of use cases for that). Maybe the feature can be kept and documented prominently?

I agree it's useful. Maybe it would be nice to have an opt-in mechanism, so that doing it by accident is caught but if you really want, you can. For example something like mapping(:x, :y => pseudocategorical) and then it would be allowed to just proceed through the pipeline.

> I guess one could have a general dodge implementation here in AoG that would work with most recipes

Hm yeah, I think that could be done. In the entry-conversion functions, which are now dispatchable per plot type, I could modify the x/y data of the errorbars according to the dodge attribute, if present. I'll keep that in mind!
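A rough standalone sketch of the position arithmetic such a dodge step would need. Nothing here is AlgebraOfGraphics or Makie API: dodge_offsets and dodged_positions are hypothetical helpers, and the default width of 0.8 is an arbitrary assumption for the illustration.

```julia
# Hypothetical helper: split a slot of total `width` into `n_dodge` equal
# sub-slots and return the offset of each sub-slot center from the slot center.
function dodge_offsets(n_dodge::Integer; width::Real = 0.8)
    sub = width / n_dodge
    return [(i - (n_dodge + 1) / 2) * sub for i in 1:n_dodge]
end

# Hypothetical helper: shift each x position by the offset of its dodge group.
function dodged_positions(x, dodge; width::Real = 0.8)
    offsets = dodge_offsets(maximum(dodge); width = width)
    return [xi + offsets[di] for (xi, di) in zip(x, dodge)]
end

x = [1, 1, 2, 2]        # categorical positions after scale fitting
dodge = [1, 2, 1, 2]    # dodge group index per row
xs = dodged_positions(x, dodge)
```

If bars and error bars both went through the same shift, they would line up regardless of which recipe draws them, which is the point of doing it on the AoG side.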

Jun 27 '24 12:06 jkrumbiegel


Explicit aesthetics / scales
