altair
Support Vega-Lite's optional encoding types
Since Vega-Lite 4.14, the encoding type is optional and is inferred according to some simple heuristics if not given explicitly. Altair raises an error if no type is provided, but maybe we can remove this check now and just let Vega-Lite handle missing types? This could also make errors such as a misspelled data frame column name clearer in Altair (which currently raises the "field specified without type" error).
Example:
import altair as alt
data = alt.Data(values=[{'x': 'A', 'y': 5},
                        {'x': 'B', 'y': 3},
                        {'x': 'C', 'y': 6},
                        {'x': 'D', 'y': 7},
                        {'x': 'E', 'y': 2}])
alt.Chart(data).mark_bar().encode(
    x='x',
    y='y:Q',
)
ValueError: x encoding field is specified without a type; the type cannot be automatically inferred because the data is not specified as a pandas.DataFrame.
However, the Vega-Lite spec that would result is valid and produces a sensible figure in this case:
{
  "config": {"view": {"continuousWidth": 400, "continuousHeight": 300}},
  "data": {
    "values": [
      {"x": "A", "y": 5},
      {"x": "B", "y": 3},
      {"x": "C", "y": 6},
      {"x": "D", "y": 7},
      {"x": "E", "y": 2}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "x"},
    "y": {"field": "y", "type": "quantitative"}
  },
  "$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json"
}
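To make it concrete which encodings would be left for Vega-Lite to infer, here is a minimal sketch that scans a spec dictionary for channels that name a field but carry no type. The helper name `encodings_without_type` is hypothetical and is not part of Altair.

```python
def encodings_without_type(spec):
    """Return the channels that specify a field but omit the type."""
    return [
        channel
        for channel, enc in spec.get("encoding", {}).items()
        if isinstance(enc, dict) and "field" in enc and "type" not in enc
    ]


# The encoding block from the spec above.
spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "x"},
        "y": {"field": "y", "type": "quantitative"},
    },
}

print(encodings_without_type(spec))  # ['x']
```

Only the x channel relies on Vega-Lite's defaults here; y is typed explicitly through the 'y:Q' shorthand.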
Thanks for raising this! That would be great.
One choice we have to make is whether to continue inferring the dtype from pandas dataframes, or just leave all type inference to Vega-Lite. I lean toward the latter, so that the behavior will be the same regardless of how the data is specified. What do you think?
I can see benefits of both approaches, but overall I am leaning towards keeping (and extending) the support for pandas data types. If Vega-Lite would be able to infer quantitative and temporal data, then I would be more in favor of relying on its type inference (https://github.com/vega/vega-lite/issues/8081). Here are my thoughts in more detail:
- As you said, it would be nice to have a consistent syntax regardless of the data source. On the other hand, I think the Vega-Lite type inference is still not on par with what Altair does via pandas, particularly since it uses nominal as the default for all non-aggregated fields, which means that there would be a lot of :Q typing.
- I think it is easier to explain that Altair "understands the data type used in pandas" than to explain the default rules in Vega-Lite; novices in particular might be somewhat intimidated by those rules.
- With the Vega-Lite type inference, it might be confusing to know when one needs to be explicit about the data type. Now it is easy: "never" if using pandas. Here I could see an argument for requiring "always, regardless of data source", since being explicit about the data types might cause people to think more about what they are trying to visualize, but that would also be slightly less convenient to type out.
- I think it would be nice to extend support for Altair data types to also include categorical ordering (my attempt in https://github.com/altair-viz/altair/pull/2522), since this would make it even more seamless to use pandas with Altair.
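For reference, the Vega-Lite defaults discussed in this list can be sketched roughly as follows. This is a paraphrase of the documented rules as I understand them, not Vega-Lite's actual implementation; the function name, the scale-type sets, and the order in which the rules are checked are all assumptions.

```python
# Hypothetical helper: an approximation of the documented defaults.
QUANTITATIVE_SCALES = {"linear", "log", "pow", "sqrt", "symlog"}  # non-exhaustive
TEMPORAL_SCALES = {"time", "utc"}
ORDINAL_SCALES = {"ordinal", "point", "band"}


def default_vegalite_type(channel, enc):
    """Approximate the type assumed when an encoding omits "type"."""
    scale_type = (enc.get("scale") or {}).get("type")
    aggregate = enc.get("aggregate")
    # argmin/argmax are excluded from the quantitative rule.
    if isinstance(aggregate, dict):
        counts_as_aggregate = not any(k in aggregate for k in ("argmin", "argmax"))
    else:
        counts_as_aggregate = aggregate not in (None, "argmin", "argmax")

    if (enc.get("bin") or counts_as_aggregate
            or channel in ("latitude", "longitude")
            or scale_type in QUANTITATIVE_SCALES):
        return "quantitative"
    if enc.get("timeUnit") or scale_type in TEMPORAL_SCALES:
        return "temporal"
    if (isinstance(enc.get("sort"), list)  # custom sort order
            or scale_type in ORDINAL_SCALES
            or channel == "order"):
        return "ordinal"
    # Everything else, including plain numeric fields, falls back to nominal.
    return "nominal"


print(default_vegalite_type("y", {"field": "y"}))  # nominal
```

The last line shows the concern raised above: a plain numeric field with no aggregate, bin, or scale hint would be treated as nominal, so users would still need to write :Q by hand.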
To be clear, I still think it would be a big benefit to support the default Vega-Lite type inference outside of pandas, and I think it would enable us to have a clearer error message for typos in column names when using Altair.
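As an illustration of what a clearer error for a misspelled column could look like, here is a minimal sketch using the standard library's difflib. The function name `check_field` and the message wording are hypothetical; this is not how Altair currently reports the problem.

```python
import difflib


def check_field(field, columns):
    """Return the field if it exists, otherwise raise with a suggestion."""
    if field in columns:
        return field
    # Look for column names that are close to the misspelled one.
    matches = difflib.get_close_matches(field, columns, n=1, cutoff=0.6)
    hint = f" Did you mean {matches[0]!r}?" if matches else ""
    raise ValueError(
        f"Unable to find column {field!r} in the data.{hint} "
        f"Available columns: {sorted(columns)}"
    )


columns = ["Horsepower", "Miles_per_Gallon", "Origin"]
try:
    check_field("Horspower", columns)
except ValueError as err:
    print(err)
```

A check like this would point at the typo directly instead of surfacing it as a missing-type error.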
I was thinking about this a little and unfortunately I don't see a great option. I tried deleting the part of the Altair code that raises an error if there's no type, and, for example, using data.cars.url instead of data.cars() then drastically changes the chart.
That's a good point. In your example it would be difficult to tell what went wrong in the first chart, and it would not be intuitive that a change of type is needed when using the URL, since none is needed when using the dataframe. If we go ahead with making a change here, we might need to handle URLs and dataframes differently and still always require types for URLs. That could still be worthwhile if it would clear up the error messages.
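One way to picture the "handle URLs and dataframes differently" option is the sketch below. It is an assumption-laden illustration, not Altair code: in-memory records (a list of dicts standing in for a dataframe) get their type inferred from the values, while a URL string still requires an explicit type. The names `infer_type_from_values` and `resolve_type` are hypothetical, and the example URL is a placeholder.

```python
from datetime import date, datetime
from numbers import Number


def infer_type_from_values(values):
    """Guess a Vega-Lite type from in-memory values; None if nothing to inspect."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    # bool is a Number subclass in Python, so exclude it explicitly.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in present):
        return "quantitative"
    if all(isinstance(v, (date, datetime)) for v in present):
        return "temporal"
    return "nominal"


def resolve_type(field, data, explicit_type=None):
    """Use the explicit type if given; infer only when the data is in memory."""
    if explicit_type is not None:
        return explicit_type
    if isinstance(data, str):  # treated as a URL: the values are not available
        raise ValueError(
            f"{field!r} needs an explicit type because the data is given as a URL."
        )
    inferred = infer_type_from_values([row.get(field) for row in data])
    if inferred is None:
        raise ValueError(f"Field {field!r} was not found in the data.")
    return inferred


records = [{"x": "A", "y": 5}, {"x": "B", "y": 3}]
print(resolve_type("x", records), resolve_type("y", records))  # nominal quantitative
```

With this split, the same chart code cannot silently change meaning between the URL and the in-memory form, at the cost of keeping the "always type your URL fields" rule.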
I just want to make note of two comments by @mattijn that are possibly relevant to this discussion: https://github.com/altair-viz/altair/issues/2868#issuecomment-1418209553 and https://github.com/altair-viz/altair/issues/2868#issuecomment-1418243196