altair
altair copied to clipboard
WIP: add method-based attribute setting to encoding classes
Do not merge until Altair 4.0.
This is a new prototype of how to set encoding attributes via methods rather than arguments. So rather than
alt.Chart(data).mark_point().encode(
alt.X('column:Q', axis=alt.Axis(None), scale=alt.Scale(zero=False), title='my title'),
alt.Color('color:N', legend=alt.Legend(orient='bottom'), scale=alt.Scale(scheme='viridis'))
)
you could instead do
alt.Chart(data).mark_point().encode(
alt.X('column:Q').axis(None).scale(zero=False).title('my title'),
alt.Color('color:N').legend(orient='bottom').scale(scheme='viridis')
)
note that the old way still works as well.
The tradeoff is that attributes of encoding classes can no longer be accessed directly via getattr, but this was not a very common pattern anyway.
Still TODO:
- [ ] add useful documentation and keyword argument completion for these functions.
- [ ] tests ensuring equivalence of old and new syntax.
If you'd like to try this in its current state, you can run
pip install git+https://github.com/jakevdp/altair.git@attr-access
I think this is awesome! I wanted to raise a few questions/concerns to make sure I fully understand the proposal.
Do we support the method-style API for nested properties or just encodings?
One thing I am worried about is how people access nested properties. For example, if I want to customize the scheme.
This is how I set the Scale's scheme property.
alt.Chart(data).mark_point().encode(
alt.Color('color:Q').scale(scheme='viridis')
)
If I now want to customize the scheme, I need to switch from the method style to the argument style. I hope it's not going to be too surprising for people.
alt.Chart(data).mark_point().encode(
alt.Color('color:Q').scale(scheme=alt.Scheme(name='viridis', extent=[0, 0.9]))
)
I assume that Altair supports this method syntax for everything, not just encodings, right?
alt.Chart(data).mark_point().encode(
alt.Color('color:Q').scale(alt.Scheme().name('viridis').extent([0, 0.9]))
)
Otherwise, I worry about making a special syntax for encodings.
Formatting long specs
The other thing I think about is formatting. Is this valid Python?
alt.Chart(data).mark_point().encode(
alt.X('column:Q')
.axis(None)
.scale(zero=False)
.title('my title'),
alt.Color('color:N')
.legend(orient='bottom')
.scale(scheme='viridis')
)
In the current syntax, one can write this.
alt.Chart(data).mark_point().encode(
alt.X(
'column:Q',
axis=alt.Axis(None),
scale=alt.Scale(zero=False),
title='my title'),
alt.Color(
'color:N',
legend=alt.Legend(orient='bottom'),
scale=alt.Scale(scheme='viridis')
)
)
Help people use the new way
Another concern is that Altair now has two ways to express the same thing. I definitely think that we want to be backward compatible but I wonder how we can help people transition to the new API without too much confusion. How difficult would it be to write a converter? Kind of like 2to3 for Python 3?
Thanks @domoritz – a few thoughts:
Do we support the method-style API for nested properties or just encodings?
Up until now we've supported limited method-style APIs for top-level objects (e.g. alt.Chart().mark_point().encode().configure_scale()...etc.) The proposal here is to extend that kind of syntax only to encoding channel objects. It's not clear to me that extending that to further nested objects would be useful. In your example of the scheme, how would you propose a syntax for nesting method-based setting? scale() returns the parent encoding object to allow chaining. So maybe you could do
alt.Color('column',
scale=alt.Scale().scheme('viridis', extent=[0, 0.9])
)
but it's not clear to me that's better than
alt.Color('column').scale(
scheme=alt.Scheme('viridis', extent=[0, 0.9])
)
particularly since nested settings like this are so much less common than their unnested counterparts. A question: how does the JS API deal with nested settings like this?
Formatting long specs
Roughly, in Python whitespace doesn't matter if you're inside a parentheses block. So for example this is not valid Python
alt.Chart()
.mark_point()
.encode(x='foo')
but it's really nice, which is why sometimes you come across code written like this, which is valid Python:
(alt.Chart()
.mark_point()
.encode(x='foo'))
Because these encoding properties generally appear within parentheses, they can take advantage of this, and this is indeed valid Python:
alt.Chart(data).mark_point().encode(
alt.X('column:Q')
.axis(None)
.scale(zero=False)
.title('my title'),
alt.Color('color:N')
.legend(orient='bottom')
.scale(scheme='viridis')
)
Helping people use the new way
The old way would of course still be valid, so most code will not break (the exception is, e.g., if someone tries to access chart.encoding.x.scale and expects it to be an alt.Scale object). But I'd change all examples and demos to use the new syntax, in order to encourage its adoption.
Would it make sense to consider using “magic underscore” style arguments for nested properties beyond the method’s depth?
The Plotly.py library recently implemented that syntax based on successful usage in the Julia version of Plotly. Read more about the Plotly.py implementation in this issue.
That would turn:
alt.Chart(data).mark_point().encode(
alt.Color('color:Q').scale(scheme=alt.Scheme(name='viridis', extent=[0, 0.9]))
)
into:
alt.Chart(data).mark_point().encode(
alt.Color('color:Q').scale(scheme_name='viridis', scheme_extent=[0, 0.9])
)
Maybe:
alt.Chart(data).mark_point().encode(
alt.Color('color:Q', alt.Scale(alt.Scheme(name='viridis', extent=[0, 0.9]))
)
@iliatimofeev that sort of specification was available in Altair v1, but we abandoned it in later versions.
@jakevdp for encoding channels it still works in your prototype :).
As I understand the goal is to avoid doubling semantics like scale = alt.Scale()
Chaining methods look very cool on the second layer but breaks on the third. But scale = alt.Scale( scheme=alt.Scheme()) is quite regular case.
The idea of “magic underscore” is cool too but it will break compatibility with vega-lite documentation.
I guess type based notation is more universal
alt.Color(alt.Scale( alt.Scheme('blues'),alt.Domain(-1,1), clip = False)
Do you remember the reasons why it is not supported in v2?
It's always worked for arguments of chart.encode(), but only worked for encoding channel arguments in Altair v1.
The main reason we dropped it is because it became difficult to implement in a consistent way when we moved from a traitlet-based prescriptive type system to a jsonschema-based proscriptive type system.
I know two cases where "types" style wouldn't work: tooltips channel and conditions.
For tooltips, we can add an extra virtual channel: atl.Tooltips("data1:N","data2:T", alt.Tooltip("data_t",timeUnit="day"))
And for conditions add extra parameter else to channels, like: alt.Color("data1:N", alt.Else(selection,"red"))
tl;dr - I like method chaining, but let's not break the attribute API in the process.
The original idea was for the chained method API to provide a thin optional layer on top of the attribute based approach, in particular for "single expression" interactive usage. However, for deeper API usage where you are closing, reusing, mutating, etc. Chart objects across a library, the attribute based approach remains powerful and expressive.
Because of this, the main concern I have here is breaking the attribute based API. I think that will cause a very high level of pain for users and major API breakage. The reason the existing encode method doesn't interfere with the attribute based API is that the noun (encoding) is the attribute and the very (encode) is the method. While libraries regularly break this separate of noun-based attributes and verb-based attributes, it was one that we tried to follow closely in the Altair APIs. The reason this proposed API breaks the attribute based access is precisely because it overrides the noun-based attributes to be methods. Because of that I am not in favor of this API change as is, but would welcome more method chaining that preserves the attribute based access.
Thanks for the input, @ellisonbg. Philosophically, I like the "verb/noun" construct we started with, but note that it breaks down almost immediately when applied to the Vega-Lite schema (transform/transform and bin/bin come to mind). And what is the verb form of a setting like scale? I suppose set_scale() makes sense, but I find explicit getters & setters like this to be a bit cluttery.
What this PR comes down to is this: I think I'm willing to forego philosophical purity of a rarely-used API (nested attribute access in charts) in favor of simplicity for an often-used API (chart construction via uncluttered method chaining).
I guess my concrete question for you would be: given the constraints you're concerned about, what API would you propose for this chart? Or are you OK keeping the status quo?
alt.Chart(data).mark_point().encode(
alt.X('column:Q', axis=alt.Axis(None), scale=alt.Scale(zero=False), title='my title'),
alt.Color('color:N', legend=alt.Legend(orient='bottom'), scale=alt.Scale(scheme='viridis'))
)
Will think on this a bit - it is a good question as it is definitely verbose. A few other questions to explore as well:
- If we go with the method chaining, what is the API for mutating an existing spec. Let's say I want to take the spec above, and change the x title.
- If we go with the method chaining, how do I ask a spec the value of an attribute?
I don't view my objection as one of philosophical purity - only that the purity of encode/encoding let us avoid the conflict between the attribute and method APIs. I too am fine with pushing the boundary, bluring the edges, etc. for practicality's sake (I am aligned with you and the zen of python on this front).
On Mon, Aug 26, 2019 at 7:57 AM Jake Vanderplas [email protected] wrote:
Thanks for the input, @ellisonbg https://github.com/ellisonbg. Philosophically, I like the "verb/noun" construct we started with, but note that it breaks down almost immediately when applied to the Vega-Lite schema (transform/transform and bin/bin come to mind). And what is the verb form of a setting like scale? I suppose set_scale() makes sense, but I find explicit getters & setters like this to be a bit cluttery.
What this PR comes down to is this: I think I'm willing to forego philosophical purity of a rarely-used API (nested attribute access in charts) in favor of simplicity for an often-used API (chart construction via uncluttered method chaining).
I guess my concrete question for you would be: given the constraints you're concerned about, what API would you propose for this chart? Or are you OK keeping the status quo?
alt.Chart(data).mark_point().encode( alt.X('column:Q', axis=alt.Axis(None), scale=alt.Scale(zero=False), title='my title'), alt.Color('color:N', legend=alt.Legend(orient='bottom'), scale=alt.Scale(scheme='viridis')) )
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/altair-viz/altair/pull/1629?email_source=notifications&email_token=AAAGXUEHO3WNFH7AA4NLPWDQGPVNRA5CNFSM4IHAKFG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5EUBSI#issuecomment-524894409, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAGXUDY5CZXORQNBCUKTTTQGPVNRANCNFSM4IHAKFGQ .
-- Brian E. Granger
Principal Technical Program Manager, AWS AI Platform ([email protected]) On Leave - Professor of Physics and Data Science, Cal Poly @ellisonbg on GitHub
If we go with the method chaining, what is the API for mutating an existing spec. Let's say I want to take the spec above, and change the x title.
Because getattr and setattr are different code paths, chart.encoding.x.title = new_title will do what you expect.
If we go with the method chaining, how do I ask a spec the value of an attribute?
That's what changes here... title = chart.encoding.x.title is now a function rather than the title value. You'd have to do title = chart.encoding.x['title'] to get the value out. This is the core of the change here and the core of the tradeoff – I don't think I've ever seen anyone use code that depends on accessing deeply-nested attributes like this, but I very often see code constructing charts with large lists of prop = alt.Prop(...) arguments.
One option, which is admittedly a bit weird, would be to initialize each value with a setter function, and have that function replace itself with the value once it's called.
Then this would work:
x = alt.X('foo').title('foo')
x.title == 'foo' # True
But this wouldn't, right?
x = alt.X('foo').title('foo')
xx = x.title('bar')
Right, that wouldn't work.
I think that broadly, the problem is that Altair was originally built on top of traitlets, and we've been slowly moving away from a traitlets-like model because it's fundamentally incompatible with a proscriptive schema-based object validation.
Now we're trying to evolve the chart construction API, and we're running into conflicts with vestiges of the original traitlets-based approach that still exist, but for no good reason other than a modicum of backward compatibility between Altair v1 and v2. If it comes down to a choice between the two, I'd personally lean toward jettisoning the traitlets-inspired model altogether in favor of a better plotting API.
Our choice of traitlets and the attribute based model are correlated, but the direction of causation was the opposite. Namely, we wanted the API to be pythonic (don't hide things behind getters/setters, and dict based access is overly verbose) so we choose Traitlets as a way to provide a validation layer for an attribute based API. Even though we are not using traitlets, I think the underlying logic still holds.
It is a good point that we do have dict ([key]) based access as a separate API available so we can cleanly separate out methods from values, but I am still not convinced it is worth breaking the entire API for this. To be clear though, my argument is not a backwards compatibility one (I am generally in favor of moving APIs forward to serve users). I am not convinced this change will be a net improvement. I would propose we get creative to figure out a way to improve the method based API, while preserving the attribute based one. What about having a special attribute on the objects that serves as a namespace for the method based API?
On Mon, Aug 26, 2019 at 10:47 AM Jake Vanderplas [email protected] wrote:
I think that broadly, the problem is that Altair was originally built on top of traitlets, and we've been slowly moving away from a traitlets-like model because it's fundamentally incompatible with a proscriptive schema-based object validation.
Now we're trying to evolve the chart construction API, and we're running into conflicts with vestiges of the original traitlets-based approach that still exist, but for no good reason other than a modicum of backward compatibility between Altair v1 and v2. If it comes down to a choice between the two, I'd personally lean toward jettisoning the traitlets-inspired model altogether in favor of a better plotting API.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/altair-viz/altair/pull/1629?email_source=notifications&email_token=AAAGXUDMJMSO47UQBP6DOETQGQJJ7A5CNFSM4IHAKFG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5FDQHQ#issuecomment-524957726, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAGXUHFUEDN2UPW6LUADEDQGQJJ7ANCNFSM4IHAKFGQ .
-- Brian E. Granger
Principal Technical Program Manager, AWS AI Platform ([email protected]) On Leave - Professor of Physics and Data Science, Cal Poly @ellisonbg on GitHub
Thanks Brian - one possiblity I thought of: we could make any undefined attributes evaluate as a callable setter function. But then I could imagine users getting pretty confused by that if they try to override an existing value...
Another possibility is to have method-based setters only for channel classes, and to replace these setters with the attributes once they're used in the chart. So this would work correctly:
chart = alt.Chart(data).encode(
x=alt.X('x').bin(maxbins=20)
)
assert chart.x.bin.maxbins == 20
In the last example, would this work?
x = alt.X('x').bin(maxbins=20)
chart1 = alt.Chart(data).encode(
x=x
)
assert chart1.x.bin.maxbins == 20
xx = x.bin(maxbins=30) # does this work???
chart2 = alt.Chart(data).encode(
x=xx
)
assert chart2.x.bin.maxbins == 30
@domoritz - yes, it could be made so that would work.
It seems like we're getting close to a new major vega-lite release again. I'm wondering if it is worth revisiting this change to the API. I for one, would love to see this change. I think that it would be a usability improvement and make it easier to move between Altair and vega-lite-api. I'd be happy to help translating docs, if you decide to go forward with it.
This would be a super cool enhancement that I would love to see as well! (but I understand probably quite time consuming to implement) I would be happy to help out with testing, writing docs, or something of similar difficulty/skill level if needed.
We're still stalled on wrapping Vega-Lite 4.9... I think it's going to be a while before we can get to the newer Vega-Lite features, and a change like this is even more major.
Would the next release of Altair be a natural time to add this? I would be very happy to experiment with this PR if there is interest!
I would love to see this too and have been thinking that a 5.0 release would be an opportune moment to introduce a big addition like this! I started working on this a little bit myself a few weeks ago and I just uploaded my branch to https://github.com/altair-viz/altair/pull/2592, feel free to build on it or just copy the files and start over, whatever is easier.
As far as I understand, the biggest outstanding item from the discussion above is whether the new method-based syntax warrants losing the ability of getting and setting values via the syntax chart.encoding.x.title and instead having to do chart.encoding.x['title']. If not, we would need a workaround, maybe one of the one's Jake suggested above? I also asked about this on SO recently and it is actually possible to have chart.encoding.x.title work for setting/getting and have chart.encoding.x.title() update the value, but that could be confusing so it is probably best to avoid it.
For me personally, the added value of using the syntax proposed here outweighs the inconvenience of having to use the dictionary-like syntax to get/set values, but I think it is up to @jakevdp and @ellisonbg to decide on which direction they prefer for Altair and what would be the most helpful way to contribute here.
I agree this api would be super nice and that breaking the access as you suggested makes sense. I guess there may be some benefit to maintaining backwards compatibility and we could do so with a warning.
Implemented in #2795