altair icon indicating copy to clipboard operation
altair copied to clipboard

WIP: add method-based attribute setting to encoding classes

Open jakevdp opened this issue 6 years ago • 27 comments

Do not merge until Altair 4.0.

This is a new prototype of how to set encoding attributes via methods rather than arguments. So rather than

alt.Chart(data).mark_point().encode(
    alt.X('column:Q', axis=alt.Axis(None), scale=alt.Scale(zero=False), title='my title'),
    alt.Color('color:N', legend=alt.Legend(orient='bottom'), scale=alt.Scale(scheme='viridis'))
)

you could instead do

alt.Chart(data).mark_point().encode(
    alt.X('column:Q').axis(None).scale(zero=False).title('my title'),
    alt.Color('color:N').legend(orient='bottom').scale(scheme='viridis')
)

note that the old way still works as well.

The tradeoff is that attributes of encoding classes can no longer be accessed directly via getattr, but this was not a very common pattern anyway.

Still TODO:

  • [ ] add useful documentation and keyword argument completion for these functions.
  • [ ] tests ensuring equivalence of old and new syntax.

If you'd like to try this in its current state, you can run

pip install git+https://github.com/jakevdp/altair.git@attr-access

jakevdp avatar Jul 26 '19 04:07 jakevdp

I think this is awesome! I wanted to raise a few questions/concerns to make sure I fully understand the proposal.

Do we support the method-style API for nested properties or just encodings?

One thing I am worried about is how people access nested properties. For example, if I want to customize the scheme.

This is how I set the Scale's scheme property.

alt.Chart(data).mark_point().encode(
    alt.Color('color:Q').scale(scheme='viridis')
)

If I now want to customize the scheme, I need to switch from the method style to the argument style. I hope it's not going to be too surprising for people.

alt.Chart(data).mark_point().encode(
    alt.Color('color:Q').scale(scheme=alt.Scheme(name='viridis', extent=[0, 0.9]))
)

I assume that Altair supports this method syntax for everything, not just encodings, right?

alt.Chart(data).mark_point().encode(
    alt.Color('color:Q').scale(alt.Scheme().name('viridis').extent([0, 0.9]))
)

Otherwise, I worry about making a special syntax for encodings.

Formatting long specs

The other thing I think about is formatting. Is this valid Python?

alt.Chart(data).mark_point().encode(
    alt.X('column:Q')
		.axis(None)
		.scale(zero=False)
		.title('my title'),
    alt.Color('color:N')
		.legend(orient='bottom')
		.scale(scheme='viridis')
)

In the current syntax, one can write this.

alt.Chart(data).mark_point().encode(
    alt.X(
		'column:Q',
		axis=alt.Axis(None),
		scale=alt.Scale(zero=False),
		title='my title'),
    alt.Color(
		'color:N',
		legend=alt.Legend(orient='bottom'),
		scale=alt.Scale(scheme='viridis')
	)
)

Help people use the new way

Another concern is that Altair now has two ways to express the same thing. I definitely think that we want to be backward compatible but I wonder how we can help people transition to the new API without too much confusion. How difficult would it be to write a converter? Kind of like 2to3 for Python 3?

domoritz avatar Jul 26 '19 08:07 domoritz

Thanks @domoritz – a few thoughts:

Do we support the method-style API for nested properties or just encodings?

Up until now we've supported limited method-style APIs for top-level objects (e.g. alt.Chart().mark_point().encode().configure_scale()...etc.) The proposal here is to extend that kind of syntax only to encoding channel objects. It's not clear to me that extending that to further nested objects would be useful. In your example of the scheme, how would you propose a syntax for nesting method-based setting? scale() returns the parent encoding object to allow chaining. So maybe you could do

alt.Color('column', 
    scale=alt.Scale().scheme('viridis', extent=[0, 0.9])
)

but it's not clear to me that's better than

alt.Color('column').scale(
    scheme=alt.Scheme('viridis', extent=[0, 0.9])
)

particularly since nested settings like this are so much less common than their unnested counterparts. A question: how does the JS API deal with nested settings like this?

Formatting long specs

Roughly, in Python whitespace doesn't matter if you're inside a parentheses block. So for example this is not valid Python

alt.Chart()
    .mark_point()
    .encode(x='foo')

but it's really nice, which is why sometimes you come across code written like this, which is valid Python:

(alt.Chart()
    .mark_point()
    .encode(x='foo'))

Because these encoding properties generally appear within parentheses, they can take advantage of this, and this is indeed valid Python:

alt.Chart(data).mark_point().encode(
    alt.X('column:Q')
		.axis(None)
		.scale(zero=False)
		.title('my title'),
    alt.Color('color:N')
		.legend(orient='bottom')
		.scale(scheme='viridis')
)

Helping people use the new way

The old way would of course still be valid, so most code will not break (the exception is, e.g., if someone tries to access chart.encoding.x.scale and expects it to be an alt.Scale object). But I'd change all examples and demos to use the new syntax, in order to encourage its adoption.

jakevdp avatar Jul 26 '19 15:07 jakevdp

Would it make sense to consider using “magic underscore” style arguments for nested properties beyond the method’s depth?

The Plotly.py library recently implemented that syntax based on successful usage in the Julia version of Plotly. Read more about the Plotly.py implementation in this issue.

That would turn:

alt.Chart(data).mark_point().encode(
    alt.Color('color:Q').scale(scheme=alt.Scheme(name='viridis', extent=[0, 0.9]))
)

into:

alt.Chart(data).mark_point().encode(
    alt.Color('color:Q').scale(scheme_name='viridis', scheme_extent=[0, 0.9])
)

jcmkk3 avatar Jul 26 '19 15:07 jcmkk3

Maybe:

alt.Chart(data).mark_point().encode(
    alt.Color('color:Q', alt.Scale(alt.Scheme(name='viridis', extent=[0, 0.9]))
)

iliatimofeev avatar Jul 29 '19 13:07 iliatimofeev

@iliatimofeev that sort of specification was available in Altair v1, but we abandoned it in later versions.

jakevdp avatar Jul 29 '19 13:07 jakevdp

@jakevdp for encoding channels it still works in your prototype :). As I understand the goal is to avoid doubling semantics like scale = alt.Scale()
Chaining methods look very cool on the second layer but breaks on the third. But scale = alt.Scale( scheme=alt.Scheme()) is quite regular case.

The idea of “magic underscore” is cool too but it will break compatibility with vega-lite documentation.

I guess type based notation is more universal alt.Color(alt.Scale( alt.Scheme('blues'),alt.Domain(-1,1), clip = False)

Do you remember the reasons why it is not supported in v2?

iliatimofeev avatar Jul 29 '19 15:07 iliatimofeev

It's always worked for arguments of chart.encode(), but only worked for encoding channel arguments in Altair v1.

The main reason we dropped it is because it became difficult to implement in a consistent way when we moved from a traitlet-based prescriptive type system to a jsonschema-based proscriptive type system.

jakevdp avatar Jul 29 '19 15:07 jakevdp

I know two cases where "types" style wouldn't work: tooltips channel and conditions. For tooltips, we can add an extra virtual channel: atl.Tooltips("data1:N","data2:T", alt.Tooltip("data_t",timeUnit="day")) And for conditions add extra parameter else to channels, like: alt.Color("data1:N", alt.Else(selection,"red"))

iliatimofeev avatar Jul 29 '19 15:07 iliatimofeev

tl;dr - I like method chaining, but let's not break the attribute API in the process.

The original idea was for the chained method API to provide a thin optional layer on top of the attribute based approach, in particular for "single expression" interactive usage. However, for deeper API usage where you are closing, reusing, mutating, etc. Chart objects across a library, the attribute based approach remains powerful and expressive.

Because of this, the main concern I have here is breaking the attribute based API. I think that will cause a very high level of pain for users and major API breakage. The reason the existing encode method doesn't interfere with the attribute based API is that the noun (encoding) is the attribute and the very (encode) is the method. While libraries regularly break this separate of noun-based attributes and verb-based attributes, it was one that we tried to follow closely in the Altair APIs. The reason this proposed API breaks the attribute based access is precisely because it overrides the noun-based attributes to be methods. Because of that I am not in favor of this API change as is, but would welcome more method chaining that preserves the attribute based access.

ellisonbg avatar Aug 26 '19 04:08 ellisonbg

Thanks for the input, @ellisonbg. Philosophically, I like the "verb/noun" construct we started with, but note that it breaks down almost immediately when applied to the Vega-Lite schema (transform/transform and bin/bin come to mind). And what is the verb form of a setting like scale? I suppose set_scale() makes sense, but I find explicit getters & setters like this to be a bit cluttery.

What this PR comes down to is this: I think I'm willing to forego philosophical purity of a rarely-used API (nested attribute access in charts) in favor of simplicity for an often-used API (chart construction via uncluttered method chaining).

I guess my concrete question for you would be: given the constraints you're concerned about, what API would you propose for this chart? Or are you OK keeping the status quo?

alt.Chart(data).mark_point().encode(
    alt.X('column:Q', axis=alt.Axis(None), scale=alt.Scale(zero=False), title='my title'),
    alt.Color('color:N', legend=alt.Legend(orient='bottom'), scale=alt.Scale(scheme='viridis'))
)

jakevdp avatar Aug 26 '19 14:08 jakevdp

Will think on this a bit - it is a good question as it is definitely verbose. A few other questions to explore as well:

  • If we go with the method chaining, what is the API for mutating an existing spec. Let's say I want to take the spec above, and change the x title.
  • If we go with the method chaining, how do I ask a spec the value of an attribute?

I don't view my objection as one of philosophical purity - only that the purity of encode/encoding let us avoid the conflict between the attribute and method APIs. I too am fine with pushing the boundary, bluring the edges, etc. for practicality's sake (I am aligned with you and the zen of python on this front).

On Mon, Aug 26, 2019 at 7:57 AM Jake Vanderplas [email protected] wrote:

Thanks for the input, @ellisonbg https://github.com/ellisonbg. Philosophically, I like the "verb/noun" construct we started with, but note that it breaks down almost immediately when applied to the Vega-Lite schema (transform/transform and bin/bin come to mind). And what is the verb form of a setting like scale? I suppose set_scale() makes sense, but I find explicit getters & setters like this to be a bit cluttery.

What this PR comes down to is this: I think I'm willing to forego philosophical purity of a rarely-used API (nested attribute access in charts) in favor of simplicity for an often-used API (chart construction via uncluttered method chaining).

I guess my concrete question for you would be: given the constraints you're concerned about, what API would you propose for this chart? Or are you OK keeping the status quo?

alt.Chart(data).mark_point().encode( alt.X('column:Q', axis=alt.Axis(None), scale=alt.Scale(zero=False), title='my title'), alt.Color('color:N', legend=alt.Legend(orient='bottom'), scale=alt.Scale(scheme='viridis')) )

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/altair-viz/altair/pull/1629?email_source=notifications&email_token=AAAGXUEHO3WNFH7AA4NLPWDQGPVNRA5CNFSM4IHAKFG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5EUBSI#issuecomment-524894409, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAGXUDY5CZXORQNBCUKTTTQGPVNRANCNFSM4IHAKFGQ .

-- Brian E. Granger

Principal Technical Program Manager, AWS AI Platform ([email protected]) On Leave - Professor of Physics and Data Science, Cal Poly @ellisonbg on GitHub

ellisonbg avatar Aug 26 '19 16:08 ellisonbg

If we go with the method chaining, what is the API for mutating an existing spec. Let's say I want to take the spec above, and change the x title.

Because getattr and setattr are different code paths, chart.encoding.x.title = new_title will do what you expect.

If we go with the method chaining, how do I ask a spec the value of an attribute?

That's what changes here... title = chart.encoding.x.title is now a function rather than the title value. You'd have to do title = chart.encoding.x['title'] to get the value out. This is the core of the change here and the core of the tradeoff – I don't think I've ever seen anyone use code that depends on accessing deeply-nested attributes like this, but I very often see code constructing charts with large lists of prop = alt.Prop(...) arguments.

jakevdp avatar Aug 26 '19 17:08 jakevdp

One option, which is admittedly a bit weird, would be to initialize each value with a setter function, and have that function replace itself with the value once it's called.

Then this would work:

x = alt.X('foo').title('foo')
x.title == 'foo'  # True

jakevdp avatar Aug 26 '19 17:08 jakevdp

But this wouldn't, right?

x = alt.X('foo').title('foo')
xx = x.title('bar')

domoritz avatar Aug 26 '19 17:08 domoritz

Right, that wouldn't work.

jakevdp avatar Aug 26 '19 17:08 jakevdp

I think that broadly, the problem is that Altair was originally built on top of traitlets, and we've been slowly moving away from a traitlets-like model because it's fundamentally incompatible with a proscriptive schema-based object validation.

Now we're trying to evolve the chart construction API, and we're running into conflicts with vestiges of the original traitlets-based approach that still exist, but for no good reason other than a modicum of backward compatibility between Altair v1 and v2. If it comes down to a choice between the two, I'd personally lean toward jettisoning the traitlets-inspired model altogether in favor of a better plotting API.

jakevdp avatar Aug 26 '19 17:08 jakevdp

Our choice of traitlets and the attribute based model are correlated, but the direction of causation was the opposite. Namely, we wanted the API to be pythonic (don't hide things behind getters/setters, and dict based access is overly verbose) so we choose Traitlets as a way to provide a validation layer for an attribute based API. Even though we are not using traitlets, I think the underlying logic still holds.

It is a good point that we do have dict ([key]) based access as a separate API available so we can cleanly separate out methods from values, but I am still not convinced it is worth breaking the entire API for this. To be clear though, my argument is not a backwards compatibility one (I am generally in favor of moving APIs forward to serve users). I am not convinced this change will be a net improvement. I would propose we get creative to figure out a way to improve the method based API, while preserving the attribute based one. What about having a special attribute on the objects that serves as a namespace for the method based API?

On Mon, Aug 26, 2019 at 10:47 AM Jake Vanderplas [email protected] wrote:

I think that broadly, the problem is that Altair was originally built on top of traitlets, and we've been slowly moving away from a traitlets-like model because it's fundamentally incompatible with a proscriptive schema-based object validation.

Now we're trying to evolve the chart construction API, and we're running into conflicts with vestiges of the original traitlets-based approach that still exist, but for no good reason other than a modicum of backward compatibility between Altair v1 and v2. If it comes down to a choice between the two, I'd personally lean toward jettisoning the traitlets-inspired model altogether in favor of a better plotting API.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/altair-viz/altair/pull/1629?email_source=notifications&email_token=AAAGXUDMJMSO47UQBP6DOETQGQJJ7A5CNFSM4IHAKFG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5FDQHQ#issuecomment-524957726, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAGXUHFUEDN2UPW6LUADEDQGQJJ7ANCNFSM4IHAKFGQ .

-- Brian E. Granger

Principal Technical Program Manager, AWS AI Platform ([email protected]) On Leave - Professor of Physics and Data Science, Cal Poly @ellisonbg on GitHub

ellisonbg avatar Oct 30 '19 21:10 ellisonbg

Thanks Brian - one possiblity I thought of: we could make any undefined attributes evaluate as a callable setter function. But then I could imagine users getting pretty confused by that if they try to override an existing value...

jakevdp avatar Oct 31 '19 00:10 jakevdp

Another possibility is to have method-based setters only for channel classes, and to replace these setters with the attributes once they're used in the chart. So this would work correctly:

chart = alt.Chart(data).encode(
  x=alt.X('x').bin(maxbins=20)
)
assert chart.x.bin.maxbins == 20

jakevdp avatar Oct 31 '19 01:10 jakevdp

In the last example, would this work?

x = alt.X('x').bin(maxbins=20)

chart1 = alt.Chart(data).encode(
  x=x
)
assert chart1.x.bin.maxbins == 20

xx = x.bin(maxbins=30) # does this work???

chart2 = alt.Chart(data).encode(
  x=xx
)
assert chart2.x.bin.maxbins == 30

domoritz avatar Oct 31 '19 05:10 domoritz

@domoritz - yes, it could be made so that would work.

jakevdp avatar Oct 31 '19 06:10 jakevdp

It seems like we're getting close to a new major vega-lite release again. I'm wondering if it is worth revisiting this change to the API. I for one, would love to see this change. I think that it would be a usability improvement and make it easier to move between Altair and vega-lite-api. I'd be happy to help translating docs, if you decide to go forward with it.

jcmkk3 avatar Jan 16 '21 18:01 jcmkk3

This would be a super cool enhancement that I would love to see as well! (but I understand probably quite time consuming to implement) I would be happy to help out with testing, writing docs, or something of similar difficulty/skill level if needed.

joelostblom avatar Jan 16 '21 20:01 joelostblom

We're still stalled on wrapping Vega-Lite 4.9... I think it's going to be a while before we can get to the newer Vega-Lite features, and a change like this is even more major.

jakevdp avatar Jan 16 '21 20:01 jakevdp

Would the next release of Altair be a natural time to add this? I would be very happy to experiment with this PR if there is interest!

ChristopherDavisUCI avatar Apr 16 '22 21:04 ChristopherDavisUCI

I would love to see this too and have been thinking that a 5.0 release would be an opportune moment to introduce a big addition like this! I started working on this a little bit myself a few weeks ago and I just uploaded my branch to https://github.com/altair-viz/altair/pull/2592, feel free to build on it or just copy the files and start over, whatever is easier.

As far as I understand, the biggest outstanding item from the discussion above is whether the new method-based syntax warrants losing the ability of getting and setting values via the syntax chart.encoding.x.title and instead having to do chart.encoding.x['title']. If not, we would need a workaround, maybe one of the one's Jake suggested above? I also asked about this on SO recently and it is actually possible to have chart.encoding.x.title work for setting/getting and have chart.encoding.x.title() update the value, but that could be confusing so it is probably best to avoid it.

For me personally, the added value of using the syntax proposed here outweighs the inconvenience of having to use the dictionary-like syntax to get/set values, but I think it is up to @jakevdp and @ellisonbg to decide on which direction they prefer for Altair and what would be the most helpful way to contribute here.

joelostblom avatar Apr 16 '22 23:04 joelostblom

I agree this api would be super nice and that breaking the access as you suggested makes sense. I guess there may be some benefit to maintaining backwards compatibility and we could do so with a warning.

domoritz avatar Apr 17 '22 00:04 domoritz

Implemented in #2795

joelostblom avatar Jan 04 '23 11:01 joelostblom