altair
altair copied to clipboard
Add method based attribute setting to encoding classes for Altair 5
I have been busy lately and not made progress beyond moving Jake's changes from #1629 into the right places of the current development branch. They seem to still be working fine from my local testing, but I will not have much time to work on this until June/July so anyone who wants to take over before then please do! I would love to see this syntax in Altair!
I think the biggest outstanding item is the one mention in https://github.com/altair-viz/altair/pull/1629#issuecomment-1100770312. So either figuring out a workaround for that or making progress on these smaller outstanding changes would be helpful?
- [ ] Add keyword argument completion for these methods.
- [ ] Add tests ensuring equivalence of old and new syntax.
- Maybe we can can create all relevant the gallery plots with both the old and new syntax?
- [ ] Add useful documentation
- For this I think it would be great with a dedicated section mentioning the method syntax explicitly and then use Spinx Tabs to display both the old and new syntax throughout the docs and for the example gallery.
Working example:
import altair as alt
from vega_datasets import data
source = data.cars.url
alt.Chart(source).mark_point().encode(
alt.X('Horsepower:Q').scale(zero=False),
alt.Y('Miles_per_Gallon:Q').scale(zero=False).title('Miles / Gallon'),
alt.Color('Origin:N').title('').legend(orient='bottom').scale(scheme='Set2')
)
Maybe it's nicer to format it like this:
alt.Chart(source).mark_point().encode(
alt.X('Horsepower:Q')
.scale(zero=False),
alt.Y('Miles_per_Gallon:Q')
.scale(zero=False)
.title('Miles / Gallon'),
alt.Color('Origin:N')
.title('')
.legend(orient='bottom')
.scale(scheme='Set2')
)
I merely think people will combine the methods as such
!python -m pip install git+https://github.com/joelostblom/altair.git@method-based-attr-setting
import altair as alt
from vega_datasets import data
source = data.cars.url
s1 = alt.Scale(zero=False)
s2 = alt.Scale(scheme='viridis')
chart = alt.Chart(source).mark_point().encode(
x=alt.X('Horsepower:Q').axis(labels=False).scale(s1).title('foo'),
color=alt.Color('Origin:N', scale=s2).legend(orient='bottom')
)
chart
Observe on the x
encoding, I use the .scale(s1)
and on the color
encoding I use the scale=s2
. It still works though.
Personally I think this deep nested setting and getting of attributes is neither a show-stopper nor a big issue. I do not think it is common usage by normal users. What I did do in the past is when a newer version of VL can render something which cannot yet pass validation in altair that I compile the chart into a VL-dict and then change the value of the corresponding key. Something as such:
ctd = chart.to_dict()
ctd['encoding']['x']['title'] = {'signal': 'bah'}
But it felt like a hack and not following the zen of altair, but again, this will still works.
I think the interesting outstanding issue is the part how we can solve this keyword argument completion for these methods.
I made some progress on the completions in my latest commit! The help now shows the correct signature and docstring:
(there are some unnecessary fields here that should be removed somehow,
type
, file
, etc)
The tab completion works for the class (the suboptimal ordering is an issue with jlab/ipython, but the maxbins
param shows up):
but not for instances of the class:
I asked about this on SO, but if anyone here has ideas, please chime in. Update: The SO question is solved but I think there is a separate issue in Altair that still prevents params from showing.
One last issue I am running into is that completion for the method name also only works for the class, but not for instances of the class. So typing alt.X.b
and then pressing Tab completes to bin
but there is no completion on alt.X().b
(this works in a toy example so I think it is an altair specific issue)
Hi @joelostblom, is there anything I can do to help with this PR? I think it would be great if this syntax can be used after the next major release.
Thank you! It would be awesome if you want to look into getting the tab completion to work properly! Both the method name and the parameters inside the method are currently not showing up in the tab completion. I gave a detailed example on exactly what works and what doesn't in my comment just above and I have not worked on it more since then. I think that is the only technical problem left to fix (then there is this potential conceptual challenge https://github.com/altair-viz/altair/pull/1629#issuecomment-1100770312 and we need to add some tests and docs, but that's later). Let me know if it is unclear and I will try to clarify.
Sorry for my git ignorance @joelostblom... should I be able to switch between my own branches of Altair and yours here: https://github.com/joelostblom/altair/tree/method-based-attr-setting
I haven't been able to figure out how to reach your method-based-attr-setting
on my own computer. Should I just create a whole new folder on my computer for your branch?
All but the basics of git tend to leave me confused...
No worries Chris, you should be able to add to my branch if you follow the steps in this article https://tighten.com/blog/adding-commits-to-a-pull-request/. You could also do what you suggested and create a new folder and clone my fork of the repo there. Since you don't have the "maintainer" permissions on Altair, I went ahead and invited you to my fork separately, so you should have no issues pushing directly to this branch.
If you run into issues with this, I am also more than happy for you to copy over these changes manually and open a new PR, I don't want you to be held up by git issues
Thanks a lot Joel, that worked! Essentially everything related to git takes two more steps than I expect. In this case, I think I was trying to go directly to the git checkout
step, and was missing the git remote add
step and was also missing the git fetch
step.
Hi @joelostblom, I'm trying to get a sense for the issues you mentioned https://github.com/altair-viz/altair/pull/2592#issuecomment-1125544599
Does the following count as tab-completion working for an instance? (I didn't make any changes to the code.) I don't really see how this is substantively different from the non-working example alt.X().bin(ma...)
.

Interesting example, I didn't know that would work. Maybe it has something to do with the function only being called when the cell is executed and before that tab completion can't inspect further? But it must be be possible to bypass that because tab completion is working for eg alt.Chart().<tab>
and I think also for alt.Chart().mark_point(col<tab>
(on my phone right now and can't double check).
We would ideally want it working for alt.X(). bin(ma<tab>
and alt.X().b<tab>
.
Hi @joelostblom, I've browsed around a little but haven't made any real progress. I tried using the Python console, and also there alt.X.bi
will tab-complete to alt.X.bin
but alt.X().bi
will not tab-complete. So it's at least not some special functionality related to Jupyter Lab or VS Code.
On the other hand, in Jupyter Lab, alt.Chart().mark_point(si
will tab-complete to alt.Chart().mark_point(size=
, but alt.Chart().mark_point(si
won't tab-complete in the Python console.
One non-Altair example, in Jupyter Lab, pd.DataFrame().max(ax
will tab-complete to pd.DataFrame().max(axis=
, but pd.DataFrame().max(ax
won't tab-complete in the Python console.
I'm not sure what the upshot is to that, just thought I would write it down!
Thanks again for working on this Chris! I believe that the special tab completion functionality you are referring to is inherent to IPython, so if you try in the IPython console rather than the regular one you should see similar completion to that of vs code and juputerlab (although the order of the suggestions might differ).
Hi @joelostblom, thanks for the reply! Just to clarify, the tab-completions in the console I was talking about above are just in the regular Python console (like what I get by typing "Python" into terminal). In that Python console, alt.X.bi
will tab-complete, but alt.X().bi
won't tab-complete.
I will keep looking around, but please keep me updated on any ideas! I haven't thought about this kind of issue before.
Hi @joelostblom, do you think the tab-completion issues are related to __getattr__
and __dir__
being defined explicitly in https://github.com/altair-viz/altair/blob/bff48d5a235363a74df881c0d167206a9bbb45e1/altair/utils/schemapi.py#L255-L260 and in https://github.com/altair-viz/altair/blob/bff48d5a235363a74df881c0d167206a9bbb45e1/altair/utils/schemapi.py#L474-L475
I tried making a toy example with (as far as I can tell) similar logic and the behavior seemed similar. I put the following in a separate file named sample.py
:
class A:
def __init__(self):
self.x = 2
def __getattr__(self, attr):
if attr in ["y", "z"]:
return 5
def __dir__(self):
return ["y", "z"]
and then tried tab-completion in Jupyter Lab:
and
To me this feels similar to the behavior we saw above.
In this toy example, if we wanted A().
to have y
and z
offered as completions, is it clear how we should proceed?
Hmm, one thing that seems different is that in your example A.<tab>
does not complete whereas in our original example, we can see that alt.X.bi<tab>
does complete. So I am not sure if it is exactly the same, and I am not sure what would be the best way to proceed on the top of my head unfortunately.
Hi @joelostblom,
I haven't had any success getting the auto-completion to work so far.
Do you see any potential in taking a slightly different approach like the following?
I tried adding a few sample methods to the FieldChannelMixin
class:
def set_bin(self, *args, **kwargs):
copy = self.copy()
copy.bin = core.BinParams(*args, **kwargs)
return copy
def set_scale(self, *args, **kwargs):
copy = self.copy()
copy.scale = core.Scale(*args, **kwargs)
return copy
def set_title(self, *args, **kwargs):
copy = self.copy()
copy.title = core.Text(*args, **kwargs)
return copy
def set_legend(self, *args, **kwargs):
copy = self.copy()
copy.legend = core.Legend(*args, **kwargs)
return copy
In my simple tests that seemed to work well. For example, the following code works (based on your code above):
And auto-completion seems to behave better:
and
It seems to me like our lives would be a lot easier if we used a syntax like bin
for the attribute and set_bin
for the attribute setter method.
If it doesn't make sense to use the same definitions for everything that is a FieldChannelMixin
, the definitions could be made more locally.
I also wasn't sure if specifying something like title = core.Text
was too restrictive for this use.
Do you have any immediate comments on that type of approach? I wanted to check in before trying to work on it. @jakevdp and @mattijn I'd also be very happy to hear your thoughts.
Hmmm I don't have a good sense of what a solution here might be, but in general I would prefer working with a .bin
method rather than a .set_bin
method. Jake's older comment here seems to suggest that he is/was of similar opinion:
I suppose
set_scale()
makes sense, but I find explicit getters & setters like this to be a bit cluttery.
Could a way forward be to break this down to a minimal reproducible example outside of altair and try to troubleshoot it that way? We could even post it to stackoverflow and get some input. I tried this earlier but ran into a separate issue that was specific to JupyterLab and didn't have time to rework the example. I don't know if we can include the specifics of this problem while still making it small enough for SO, but it might still be a useful way forward for us to troubleshoot?
I will also say that personally I believe that even if we don't have tab completion, the fact that the docstring help popup is now working properly makes this PR worth merging, but it would still be very nice to get the tab completion working as well.
@joelostblom can you synchronise this PR with the main repo?
I feel the same, the current setter methods using decorators make more sense over explicit set_
functions. Also agreed that we can work on tab completion in a follow-up PR if we feel this PR is worth merging already. Tests and documentation are still missing I think?
Thanks for the comments! I am happy with whatever approach you think is best. I did come across this comment by Jake from almost exactly one year ago that made me wonder about alternatives to redefining bin
, scale
, etc as functions.
I'm happy to work on producing a minimal example of the failure of tab-completion, although my initial attempts make me think that whatever we produce will be too complicated and will get downvoted on Stack Overflow ;)
By the way, I don't think the title
portion of the very top example above is working, or at least it didn't work on my end when I tried:
import altair as alt
from vega_datasets import data
source = data.cars.url
alt.Chart(source).mark_point().encode(
alt.X('Horsepower:Q').scale(zero=False),
alt.Y('Miles_per_Gallon:Q').scale(zero=False).title('Miles / Gallon'),
alt.Color('Origin:N').title('').legend(orient='bottom').scale(scheme='Set2')
)
can you synchronise this PR with the main repo?
Done! Tests seem to be failing because of an issue installing chromedriver on the test image?
Tests and documentation are still missing I think?
Yup, do you think it would be enough to create a version of each of the gallery examples with this new syntax and use those as tests? Or should we create some new types of tests?
I did come across this https://github.com/altair-viz/altair/pull/2517#issuecomment-978455614 from almost exactly one year ago that made me wonder about alternatives to redefining bin, scale, etc as functions.
I was thinking about that too, but it does indeed seem like a bigger change. Maybe it is better though because it will make it really clear when we are trying to access and attributes value versus trying to call a function?
By the way, I don't think the title portion of the very top example above is working, or at least it didn't work on my end when I tried
I can confirm that this didn't work for me either. There are other parameters such as stack
that don't work either so maybe this is expected? But if I recall correctly it was working before to do .title()
, maybe I'm misremembering.
I re-run the tests and these are fine now based on current defined tests.
Basically currently only variables that can be declared as ={}
in the encoding channel will function:
foo = alt.Chart().mark_point().encode(
alt.X(scale={})
).to_dict()
bah = alt.Chart().mark_point().encode(
alt.X().scale()
).to_dict()
foo == bah # True
But variables that only can be declared as str
will not work:
alt.Chart().mark_point().encode(
alt.X().shorthand('field:Q')
).to_dict()
Gives:
TypeError: 'UndefinedType' object is not callable
I would suggest to have a test suite that shows which variables work, but also which variables are (as expected) not working.
with pytest.raises(TypeError):
alt.Chart().mark_point().encode(
alt.X().shorthand('field:Q')
).to_dict()
Maybe it can be done in combination with @pytest.mark.parametrize()
I can understand why this results in a TypeError
(not clear error though) but it can be confusing too. Maybe its sensible for shorthand
, but for variables such as title
and stacked
its not strange thinking to define it as the examples that in earlier stages of this PR did seem to work (wrongly?).
Hi @mattijn, I didn't totally understand the str
vs ={}
distinction you mentioned. Does that explain which of these functions are working and which don't? Do you see what the issue is? Here are the ones that seem to be working/not working for alt.X()
:
import altair as alt
schema = alt.X().resolve_references()
for k in schema['properties'].keys():
print(k)
print(getattr(alt.X(),k))
print()
produces the following, so I guess bandPosition
, impute
, etc are not working at the moment.
aggregate
<altair.utils.schemapi._PropertySetter object at 0x163541000>
axis
<altair.utils.schemapi._PropertySetter object at 0x16358cf70>
bandPosition
Undefined
bin
<altair.utils.schemapi._PropertySetter object at 0x16358cc40>
field
<altair.utils.schemapi._PropertySetter object at 0x16358ee90>
impute
Undefined
scale
<altair.utils.schemapi._PropertySetter object at 0x16358ea10>
sort
<altair.utils.schemapi._PropertySetter object at 0x16358f2b0>
stack
Undefined
timeUnit
Undefined
title
Undefined
type
<altair.utils.schemapi._PropertySetter object at 0x1635a87c0>
It seems that currently the attribute setter is enabled only for properties in the encoding channel that can contain an object. Strange though, that impute
is Undefined
since it clearly is possible to define an object as well for impute in VL: https://vega.github.io/vega-lite/docs/impute.html#encoding.
I'm still a little apprehensive about having something like alt.X().axis
evaluate to something different from the method alt.X().axis()
. Overall that just seems kind of confusing to me. What would you think about eventually removing this second alt.X().axis
value, so that alt.X().axis
returns the function that's called when we use alt.X().axis()
?
But I'm not at all confident in my opinion and am happy to hear that having both is good and not unusual.
@mattijn Thanks for the information about which encoding channels seem to be using the property setters. I haven't wrapped my head around what's happening, but I should have some time soon to look at it.
Yeah, I agree that it can be confusing. I am not against having something like what is suggested in https://github.com/altair-viz/altair/pull/2517#issuecomment-978455614. Not sure how much breakage this would cause for people's workflows though. I mostly use chart.to_dict()
currently myself because of https://github.com/altair-viz/altair/issues/2497
Thanks @joelostblom! It's interesting reading https://github.com/altair-viz/altair/pull/2517#issuecomment-978455614 again, because that comment seems to almost be the opposite of what I thought was going on in this PR. I had thought that in the end, this PR retained the x.field
returns "x"
functionality. I guess I should keep staring at it.
I personally prefer that x.field
returns the method, because that seems to be least confusing to me. (Or the set_field
approach we discussed above.)
Hi @joelostblom, I think I tracked down what the issue is with impute
and some of the other methods we're trying to define. I think the problem is this line: https://github.com/altair-viz/altair/blob/9fa0c387fdcf31900015ac87d52ede68e30832b6/altair/utils/schemapi.py#L605
There's no class named exactly Impute
(rather there is ImputeParams
and a few others).
How about this for a temporary fix. We replace _PropertySetter.__get__
with the following:
def __get__(self, obj, cls):
self.obj = obj
self.cls = cls
try:
altair_prop = getattr(vegalite, f"{self.prop}".capitalize())
self.__doc__ = altair_prop.__doc__
self.__signature__ = inspect.signature(altair_prop)
self.__wrapped__ = inspect.getfullargspec(altair_prop)
self.__name__ = altair_prop.__name__
except AttributeError:
pass
return self
What do you think?
Ah ok, that makes sense, great find! What do you think about making it a conditional statement instead of try/except? The we can be more explicit about what we're checking for, and we could have a three scenarios:
- Check if the name exists then use as is.
- Check if the name exists with "params" appended, then use that.
- Pass through in case there are additional cases. Or maybe raise a warning for now so that we can detect them easier?
Sounds great to me! I feel like this code runs when the class is instantiated, not when the method is called, so raising a warning doesn't seem like the right thing to do to me. But maybe I'm misremembering the tests I did. I would just pass through in case 3.
There is this code which makes the bin
method already work:
https://github.com/joelostblom/altair/blob/9fa0c387fdcf31900015ac87d52ede68e30832b6/altair/vegalite/v5/api.py#L108-L109
I don't know if that makes you rethink the strategy at all, but to me your strategy sounds good.
Oh cool that alias is a nice find. I To be honest, it always bothers me that I have to write out TitleParams instead of just Title... So I would be in favor of adding aliases for all or at least the most common Params-classes, there instead of the fix I mentioned earlier.
Since this seems relatively simple, it makes me wonder if there is a reason this isn't already there though. Maybe it was just added later to VegaLite?
Maybe we could stick to your original if
-elif
for now since that seems like a change that's less likely to inadvertently break anything else, and make a separate PR with the extra aliases? Just an idea! If you think the alias idea is equally safe, I'm fine with that.
Just let me know if you want me to do anything.