
Is there a plan to add internal axvline/axhline support to seaborn.objects soon?

Open NickCH-K opened this issue 1 year ago • 8 comments

I am currently in the process of updating my book for its second edition, and am changing the Python plotting code I suggest to readers to use seaborn.objects where possible.

However, the plotting code I use in the book contains quite a few plt.axvline() and plt.axhline() calls! I am now changing these to the fairly laborious fig = plt.figure(); so.Plot(...).on(fig)...; fig.axes[0].axhline(...).

I know there have been several previous issues about adding axhline/axvline support, and I know you've expressed interest in supporting it and that there are barriers to doing so. My main question is whether there's a plan to implement this soon, i.e. whether I should plan to revise my book text to reflect the new implementation before I publish. If there's not, that's fine; I'm still planning to make the seaborn.objects switch. But this book will be in print for quite a while and I'd rather it not get out of date that fast.

Thank you!

NickCH-K avatar May 10 '24 21:05 NickCH-K

I don't think there is anything planned in the near future for this. I have my own code to do this if you like, as thankfully the objects API is modular:

from dataclasses import dataclass
import matplotlib as mpl
from seaborn._marks.base import (
    Mappable,
    MappableColor,
    MappableFloat,
    MappableString,
    Mark,
    resolve_color,
    resolve_properties,
)
from seaborn._stats.base import Stat
from seaborn._core.typing import Default

@dataclass
class StraightLine(Mark):
    """Mark drawing a horizontal or vertical line with Axes.axline.
    The position is read from the variable opposite to orient, so
    orient "x" reads "y" and draws a horizontal line at that value.
    """

    color: MappableColor = Mappable("C0")
    alpha: MappableFloat = Mappable(1)
    linewidth: MappableFloat = Mappable(rc="lines.linewidth")
    linestyle: MappableString = Mappable(rc="lines.linestyle")

    def _plot(self, split_gen, scales, orient):

        for keys, data, ax in split_gen():

            vals = resolve_properties(self, keys, scales)
            vals["color"] = resolve_color(self, keys, scales=scales)

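            artist_kws = self.artist_kws.copy()
            # The value to draw lives on the axis opposite to `orient`.
            value = {"x": "y", "y": "x"}[orient]
            # Expect exactly one value per group (e.g. an aggregate);
            # .item() avoids NumPy's deprecated float() on a 1-d array.
            position = data[value].to_numpy().item()
            xy1_dict = {value: position, orient: 0}
            xy2_dict = {value: position, orient: 1}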
            ax.axline(
                (xy1_dict["x"], xy1_dict["y"]),
                (xy2_dict["x"], xy2_dict["y"]),
                color=vals["color"],
                linewidth=vals["linewidth"],
                linestyle=vals["linestyle"],
                **artist_kws,
            )

    def _legend_artist(self, variables, value, scales):

        keys = {v: value for v in variables}
        vals = resolve_properties(self, keys, scales)
        vals["color"] = resolve_color(self, keys, scales=scales)

        artist_kws = self.artist_kws.copy()

        return mpl.lines.Line2D(
            [],
            [],
            color=vals["color"],
            linewidth=vals["linewidth"],
            linestyle=vals["linestyle"],
            **artist_kws,
        )

I only use this when I want to plot an aggregate value though (e.g. mean or median); otherwise I use the matplotlib axhline API directly, which is more convenient (I am not sure why you feel it is more laborious). Also, this relies on private seaborn modules, which may break in a future update.

thuiop avatar May 13 '24 11:05 thuiop

Thank you, good to know!

As for laboriousness, it's mostly just on a pedagogical level rather than a typing-lines level. If there were an so.axhline method, I could do it in the same way I do the rest of the graph. Since there's not, I need to introduce the whole fig thing and .on(fig) as concepts, which are different ways of approaching the graph-making.

NickCH-K avatar May 13 '24 11:05 NickCH-K

> If there were an so.axhline method, I could do it in the same way I do the rest of the graph.

The question is: what exactly is it supposed to do? The seaborn objects operate on the DataFrame; there is no reason to have one that plots an arbitrary value, and it is pretty weird semantically. This is why I only use my custom object when plotting an aggregate value, which makes sense because seaborn can compute those using Stat objects (avoiding the need for extra computations outside). Also, I personally use .on(fig) or .on(ax) all the time; it would make sense to me to introduce it, as you basically need it if you want to interact with your figure in any way afterwards.

thuiop avatar May 13 '24 11:05 thuiop

That's interesting - given the so.Plot().add() structure, I have thought of the conceptual framing of .add() being that we are operating on the plot itself, rather than the data frame. In that context it would make intuitive sense that you could put things into .add() that are not reliant on the data itself.

NickCH-K avatar May 13 '24 12:05 NickCH-K

The tricky thing, as I recall it, is that sometimes you would want horizontal or vertical rules to be dependent on the data, e.g. you might want to show a distribution and then draw a vertical line at its mean, and then do that when grouping by a color variable, etc.

I do also think that using matplotlib objects directly is an anti-pattern (currently necessary here, but the sign of a feature gap).

mwaskom avatar May 13 '24 13:05 mwaskom

Yes, definitely, I think reference lines based on statistics like means (and potentially within groupings) and based on constants that are given meaning by the data but not necessarily calculated based on the data are both very common. In my experience using seaborn.objects it's by far the most common thing I go back to matplotlib to add.

This is just an offhand thought but I wonder if a syntax-compatible way of doing this would be to treat the use of constants like this as a kind of aggregation/Stat. Like if I could do so.Plot().add(so.Line(), so.Agg('constant', 0)). After all, I picked 0 based on its meaning relative to the data, even if the formula I used to do that in my head isn't something it would make sense to write a Python function to calculate - it's easier just to give it the answer.

NickCH-K avatar May 13 '24 13:05 NickCH-K

To be clear, I don't think there are any conceptual problems with so.Rule (which is the terminology I was planning on using), just some annoying things about supporting a mark that typically has a scalar parameterization (e.g. so.Rule(y=0)) but sometimes will have one or more values derived from the data.

mwaskom avatar May 14 '24 01:05 mwaskom

Also, interaction with things like facet may be very weird.

thuiop avatar May 14 '24 07:05 thuiop

I guess the answer here was no :)

mwaskom avatar Jan 26 '25 15:01 mwaskom