AlgebraOfGraphics.jl icon indicating copy to clipboard operation
AlgebraOfGraphics.jl copied to clipboard

Name clash with `data`

Open KronosTheLate opened this issue 3 years ago • 9 comments
trafficstars

AlgebraOfGraphics.jl exports data. But it is one of the most common variable names, leading to frequent name clashes. I do not see what the best solution is, but as I find this troublesome, I wanted to raise the issue.

KronosTheLate avatar Feb 10 '22 12:02 KronosTheLate

I was convinced there was already an issue for this but could not find it. It could make sense to also rename data as part of the breaking changes discussed at #345. Maybe with(df) could make sense? It is analogous to DataFramesMeta.@with, and with is probably less likely to be a variable.

piever avatar Feb 10 '22 17:02 piever

Agreed. I actually think that with is a pretty great alternative.

I can also think of the very explicit, but a little long, 'with_data' or 'withdata'. I am not sure it is better than 'with', but wanted to throw it out there.

KronosTheLate avatar Feb 10 '22 18:02 KronosTheLate

Yes this was reported before in https://github.com/JuliaPlots/AlgebraOfGraphics.jl/issues/207

I think table(...) would fit well, but it's more likely than with to be a variable. On the other hand I would not be surprised if a with function is defined by packages to be used with the do block syntax.

knuesel avatar Feb 10 '22 18:02 knuesel

Another option: since data is rather redundant (df is of course the data) it could be left out entirely:

plt = df * mapping(:a, :b) * visual(Lines)

So AoG would define Base.:*(df, l::Layer) = data(df) * l, etc. and data would be unexported.

knuesel avatar Feb 11 '22 08:02 knuesel

I confess I would be a little hesitant to overload that method, but I was wondering whether another option could be to make Layer callable, so that one would do eg

df |> mapping(:a, :b) * lines |> draw

piever avatar Feb 11 '22 09:02 piever

I confess I would be a little hesitant to overload that method, but I was wondering whether another option could be to make Layer callable, so that one would do eg

df |> mapping(:a, :b) * lines |> draw

This is powerful, expressive and concise! I also prefer this API. Would this work in every DataFrame or just the ones named exclusively df?

storopoli avatar Feb 11 '22 10:02 storopoli

I still have to read through https://github.com/JuliaPlots/AlgebraOfGraphics.jl/discussions/345 but to me, it's confusing that you can draw a plot without data (pregrouped) by supplying the data directly to the mapping.

Therefore, I'd think either that functionality should move out of mapping for example by doing data(pregrouped_vectors) * mapping(1, 2) for specifying the vectors via indices. Or the data functionality could move inside of mapping because a mapping only really works by referencing a data source right? So to me it feels natural to pull the two together.

mapping(:x, :y, data = df) * visual(...)

jkrumbiegel avatar Feb 11 '22 10:02 jkrumbiegel

@piever for the overloading of * do you mean because it feels weird? (From a type piracy point of view it's legal as long as AoG owns one of the * argument types...)

Anyway I love the pipe version! I was first hesitant because currently the datasets are regular players in the algebra, for example we can do

(data(df1)*mapping(:a, :b) + data(df2)*mapping(:x, :y)) * visual(markersize=10)

but actually it also works with pipes:

((df1 |> mapping(:a, :b)) + (df2 |> mapping(:x, :y))) * visual(markersize=10)

(This takes a few more parentheses but for such a rare use case it's no big deal.)

I think the semantics of piping are a great fit here, and callable layers make sense even without pipes:

line_plot = mapping(:a, :b) * visual(Lines, linewidth=3)
plt = line_plot(df)  # Can be called on different datasets

If the pipe solution is adopted, it might be nice to add a "curried" version of draw to allow things like

df |> mapping(:a, :b) * lines |> draw(axis=(300, 200))

@storopoli the variable name doesn't matter (it's only for macros that the variable name can make a difference. Edit: and some odd cases like f(; a) and (; a) = x).

knuesel avatar Feb 11 '22 14:02 knuesel

for the overloading of * do you mean because it feels weird? (From a type piracy point of view it's legal as long as AoG owns one of the * argument types...)

It is legal, but I suspect it is at least discouraged, in that it is a bit "punny" and bound to create method errors.

I was first hesitant because currently the datasets are regular players in the algebra

Yes, that's the reason why it was not done via piping. But, as you point out, it's still nice to work with different datasets, eg

(mapping(:a, :b)(df1) + mapping(:x, :y)(df2)) * visual(markersize=10)

And in general, I think layer1(df1) + layer2(df2) reads reasonably well.

I wonder whether it could give rise to confusion if users wrote mapping(:a) * mapping(:b)(df) and thought df would only apply to mapping(:b). Maybe it's enough to document this and error if users try passing several datasets in the same layer, eg mapping(:a)(df1) * mapping(:b)(df2). These are all edgecases anyway, as I imagine the most common would be df |> layers |> draw.

I still have to read through https://github.com/JuliaPlots/AlgebraOfGraphics.jl/discussions/345 but to me, it's confusing that you can draw a plot without data (pregrouped) by supplying the data directly to the mapping.

Doing pregrouped data from a dataframe will definitely be supported (whereas it isn't right now). There is some discussion in #345 (we should probably continue this particular discussion there) as to whether plotting from vectors, eg mapping(rand(10), rand(10)) |> draw, should we supported with mapping or with another name, but I think we should allow it for users who do not generally work with tables.

piever avatar Feb 11 '22 14:02 piever