Altair produces syntactically incorrect JSON output that contains a NaN token
See line #111 in this Gist, where Altair produced output containing a NaN token, which is not syntactically valid JSON. The library will happily produce a JSON file like that, which is then prone to causing really opaque errors, for example in this altair_saver issue.
The way I got into this mess: I set a domain on an axis using a value that I computed from the data it visualizes, but it turns out there's an edge case where my calculation can produce NaN. Oops.
It occurs to me that the library should have a validation step to catch this sooner and produce a more useful error message.
Can you provide a short reproducible example of code that produces invalid output?
import altair as alt
import pandas as pd

source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})

bad = alt.Chart(source).mark_bar().encode(
    x='a',
    y=alt.Y('b', scale=alt.Scale(domain=[0, float('Nan')]))
)
bad.save('bad.json')
When I look at bad.json (pasted below) I see JSON output with "scale": {"domain": [0, NaN]}. I pasted it into the online Vega Editor and it gives [Error] Unexpected token N in JSON at position 267 and line 1, and when I click on the "Share" option (to get a friendly link to the example inside the editor), the editor crashes.
I tried to pretty-print this JSON output to make it friendlier, but jq actually parses the NaN and produces null in its output, so here is the raw file instead:
{"config": {"view": {"continuousWidth": 400, "continuousHeight": 300}}, "data": {"name": "data-c2a3e89ba9d5d1687d5e8c28d630a033"}, "mark": "bar", "encoding": {"x": {"type": "nominal", "field": "a"}, "y": {"type": "quantitative", "field": "b", "scale": {"domain": [0, NaN]}}}, "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json", "datasets": {"data-c2a3e89ba9d5d1687d5e8c28d630a033": [{"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43}, {"a": "D", "b": 91}, {"a": "E", "b": 81}, {"a": "F", "b": 53}, {"a": "G", "b": 19}, {"a": "H", "b": 87}, {"a": "I", "b": 52}]}}
Altair relies on Python's built-in json module to serialize user input to JSON. It seems the bug is in the Python standard library:
>>> import json
>>> json.dumps({'x': float('nan')})
'{"x": NaN}'
Ah, maybe we should be using something like allow_nan=False:
>>> json.dumps({'x': float('nan')}, allow_nan=False)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-62e31c5f8250> in <module>()
      1 import json
----> 2 json.dumps({'x': float('nan')}, allow_nan=False)

2 frames
/usr/lib/python3.6/json/encoder.py in iterencode(self, o, _one_shot)
    255                 self.key_separator, self.item_separator, self.sort_keys,
    256                 self.skipkeys, _one_shot)
--> 257         return _iterencode(o, 0)
    258
    259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

ValueError: Out of range float values are not JSON compliant
Using an allow_nan=False option sounds like an essential failsafe. The only (generic) question I'd ask is whether higher-level validation that gives the library user friendlier error messages (e.g., indicating where the NaN was found) is worth the effort for what's likely an infrequent failure mode.
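To make the "indicate where the NaN was found" idea concrete, here is a rough sketch that walks a chart specification held as a plain dict (for example, the result of chart.to_dict()) and reports the location of every non-finite float before anything is serialized. The helper name find_non_finite and the path notation are made up for illustration; nothing like this is part of Altair as far as this thread shows.

```python
import math


def find_non_finite(obj, path="$"):
    """Yield a JSONPath-like location for every NaN or infinite float in obj."""
    if isinstance(obj, float) and not math.isfinite(obj):
        yield path
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_non_finite(value, f"{path}.{key}")
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            yield from find_non_finite(value, f"{path}[{index}]")


# Hand-written fragment mirroring the y encoding from the example above.
spec = {"encoding": {"y": {"field": "b", "scale": {"domain": [0, float("nan")]}}}}
bad_paths = list(find_non_finite(spec))
print(bad_paths)  # ['$.encoding.y.scale.domain[1]']
```

An error message built from these paths would point straight at the offending scale domain instead of surfacing later as a parse error in another tool.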
FYI, this is still an issue; I just got bitten by it today. One workaround is to preprocess the input DataFrame with .replace({np.nan: None}) to turn NaNs into None, which then get translated into valid JSON null values.
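The same NaN-to-None idea can be sketched without pandas, on plain Python records, which also shows why it works: None serializes to JSON null. The helper nan_to_none is a hypothetical name used only for this illustration.

```python
import json
import math


def nan_to_none(obj):
    """Return a copy of obj with every NaN float replaced by None."""
    if isinstance(obj, float) and math.isnan(obj):
        return None
    if isinstance(obj, dict):
        return {key: nan_to_none(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [nan_to_none(value) for value in obj]
    return obj


records = [{"a": "A", "b": 28.0}, {"a": "B", "b": float("nan")}]
clean = nan_to_none(records)

# allow_nan=False makes json.dumps raise if any NaN slipped through.
text = json.dumps(clean, allow_nan=False)
print(text)  # [{"a": "A", "b": 28.0}, {"a": "B", "b": null}]
```

Note that this sketch only handles NaN; infinite values would still make json.dumps raise under allow_nan=False.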
https://docs.python.org/3.13/library/json.html#standard-compliance-and-interoperability
This module does not comply with the RFC in a strict fashion, implementing some extensions that are valid JavaScript but not valid JSON. In particular:
Infinite and NaN number values are accepted and output;
Repeated names within an object are accepted, and only the value of the last name-value pair is used.
Since the RFC permits RFC-compliant parsers to accept input texts that are not RFC-compliant, this module’s deserializer is technically RFC-compliant under default settings.
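A small illustration of the documented behavior quoted above, using only the standard library: by default json.loads accepts NaN and keeps the last of any repeated names, and the parse_constant hook can be used to reject the non-standard constants on input. The function name reject_constant is just an illustrative choice.

```python
import json
import math


def reject_constant(name):
    # json calls this hook with 'NaN', 'Infinity' or '-Infinity'.
    raise ValueError(f"non-standard JSON constant: {name}")


# Default settings: NaN is accepted, and the last repeated name wins.
lenient = json.loads('{"domain": [0, NaN]}')
last_wins = json.loads('{"a": 1, "a": 2}')
print(math.isnan(lenient["domain"][1]))  # True
print(last_wins)  # {'a': 2}

# Stricter parsing: refuse the constants that are not valid JSON.
try:
    json.loads('{"domain": [0, NaN]}', parse_constant=reject_constant)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
print(strict_error)  # non-standard JSON constant: NaN
```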
I'm closing this as not planned. Changing the default will inevitably break others' code, and there are at least 3 ways to work around this:
- Clean and validate your data before visualizing it
- Use something like alt.FieldValidPredicate or alt.expr.isNaN while visualizing
- Change the call bad.save('bad.json') -> bad.save("bad.json", json_kwds={"allow_nan": False})
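For the "clean and validate your data" route, one possible shape is a small guard on any computed axis bound before it is handed to the scale. This is only a sketch: checked_domain is a hypothetical helper, and the NaN upper bound stands in for a calculation that hit an edge case.

```python
import math


def checked_domain(lower, upper):
    """Return [lower, upper], raising early if either bound is NaN or infinite."""
    for name, value in (("lower", lower), ("upper", upper)):
        if not math.isfinite(value):
            raise ValueError(f"{name} domain bound is not finite: {value!r}")
    return [lower, upper]


upper = float("nan")  # stand-in for a computed value that went wrong

try:
    domain = checked_domain(0, upper)
    message = None
except ValueError as exc:
    domain = None
    message = str(exc)

print(message)  # upper domain bound is not finite: nan
```

Failing here, at the point where the domain is computed, keeps the error close to its cause rather than in whichever tool later reads the saved file.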