plotly.py icon indicating copy to clipboard operation
plotly.py copied to clipboard

Inconsistent error message and documentation on px.histogram parameter nbins

Open gunnhildsp opened this issue 2 years ago • 4 comments

The plotly express documentation states that the parameter nbins of px.histogram should be an int, however, the error message when inputting something else states that the nbinsx property of histograms should be an int or a float that will be cast to an int. The validation seems to check if the parameter is an instance of an int. The documentation of go.Histogram does not specify types for nbinsx.

It seems to me the documentation of go.Histogram could also include that nbinsx (and nbinsy?) should be int and that the type cannot be a float that will be cast to an int. Would be happy to update documentation and code to reflect this.

Reproducible example with float input:

import plotly.express as px

df = px.data.iris()
px.histogram(
    df,
    x="sepal_length",
    nbins=47.0,
)

This produces a ValueError:

ValueError                                Traceback (most recent call last)
Cell In[11], line 1
----> 1 px.histogram(
      2     df,
      3     x="sepal_length",
      4     nbins=47.0,
      5 )

File ~/.cache/pypoetry/virtualenvs/dsp-rNgR53Wo-py3.10/lib/python3.10/site-packages/plotly/express/_chart_types.py:480, in histogram(data_frame, x, y, color, pattern_shape, facet_row, facet_col, facet_col_wrap, facet_row_spacing, facet_col_spacing, hover_name, hover_data, animation_frame, animation_group, category_orders, labels, color_discrete_sequence, color_discrete_map, pattern_shape_sequence, pattern_shape_map, marginal, opacity, orientation, barmode, barnorm, histnorm, log_x, log_y, range_x, range_y, histfunc, cumulative, nbins, text_auto, title, template, width, height)
    434 def histogram(
    435     data_frame=None,
    436     x=None,
   (...)
    472     height=None,
    473 ) -> go.Figure:
    474     """
    475     In a histogram, rows of `data_frame` are grouped together into a
    476     rectangular mark to visualize the 1D distribution of an aggregate
    477     function `histfunc` (e.g. the count or sum) of the value `y` (or `x` if
    478     `orientation` is `'h'`).
    479     """
--> 480     return make_figure(
    481         args=locals(),
    482         constructor=go.Histogram,
    483         trace_patch=dict(
    484             histnorm=histnorm,
    485             histfunc=histfunc,
    486             cumulative=dict(enabled=cumulative),
    487         ),
    488         layout_patch=dict(barmode=barmode, barnorm=barnorm),
    489     )

File ~/.cache/pypoetry/virtualenvs/dsp-rNgR53Wo-py3.10/lib/python3.10/site-packages/plotly/express/_core.py:2256, in make_figure(args, constructor, trace_patch, layout_patch)
   2251         group[var] = 100.0 * group[var] / group_sum
   2253 patch, fit_results = make_trace_kwargs(
   2254     args, trace_spec, group, mapping_labels.copy(), sizeref
   2255 )
-> 2256 trace.update(patch)
   2257 if fit_results is not None:
   2258     trendline_rows.append(mapping_labels.copy())

File ~/.cache/pypoetry/virtualenvs/dsp-rNgR53Wo-py3.10/lib/python3.10/site-packages/plotly/basedatatypes.py:5139, in BasePlotlyType.update(self, dict1, overwrite, **kwargs)
   5137         BaseFigure._perform_update(self, kwargs, overwrite=overwrite)
   5138 else:
-> 5139     BaseFigure._perform_update(self, dict1, overwrite=overwrite)
   5140     BaseFigure._perform_update(self, kwargs, overwrite=overwrite)
   5142 return self

File ~/.cache/pypoetry/virtualenvs/dsp-rNgR53Wo-py3.10/lib/python3.10/site-packages/plotly/basedatatypes.py:3940, in BaseFigure._perform_update(plotly_obj, update_obj, overwrite)
   3937                 plotly_obj[key] = val
   3938         else:
   3939             # Assign non-compound value
-> 3940             plotly_obj[key] = val
   3942 elif isinstance(plotly_obj, tuple):
   3944     if len(update_obj) == 0:
   3945         # Nothing to do

File ~/.cache/pypoetry/virtualenvs/dsp-rNgR53Wo-py3.10/lib/python3.10/site-packages/plotly/basedatatypes.py:4874, in BasePlotlyType.__setitem__(self, prop, value)
   4870         self._set_array_prop(prop, value)
   4872     # ### Handle simple property ###
   4873     else:
-> 4874         self._set_prop(prop, value)
   4875 else:
   4876     # Make sure properties dict is initialized
   4877     self._init_props()

File ~/.cache/pypoetry/virtualenvs/dsp-rNgR53Wo-py3.10/lib/python3.10/site-packages/plotly/basedatatypes.py:5218, in BasePlotlyType._set_prop(self, prop, val)
   5216         return
   5217     else:
-> 5218         raise err
   5220 # val is None
   5221 # -----------
   5222 if val is None:
   5223     # Check if we should send null update

File ~/.cache/pypoetry/virtualenvs/dsp-rNgR53Wo-py3.10/lib/python3.10/site-packages/plotly/basedatatypes.py:5213, in BasePlotlyType._set_prop(self, prop, val)
   5210 validator = self._get_validator(prop)
   5212 try:
-> 5213     val = validator.validate_coerce(val)
   5214 except ValueError as err:
   5215     if self._skip_invalid:

File ~/.cache/pypoetry/virtualenvs/dsp-rNgR53Wo-py3.10/lib/python3.10/site-packages/_plotly_utils/basevalidators.py:937, in IntegerValidator.validate_coerce(self, v)
    933 else:
    934     # Check int
    935     if not isinstance(v, int):
    936         # don't let int() cast strings to ints
--> 937         self.raise_invalid_val(v)
    939     # Check min/max
    940     if self.has_min_max:

File ~/.cache/pypoetry/virtualenvs/dsp-rNgR53Wo-py3.10/lib/python3.10/site-packages/_plotly_utils/basevalidators.py:296, in BaseValidator.raise_invalid_val(self, v, inds)
    293             for i in inds:
    294                 name += "[" + str(i) + "]"
--> 296         raise ValueError(
    297             """
    298     Invalid value of type {typ} received for the '{name}' property of {pname}
    299         Received value: {v}
    300 
    301 {valid_clr_desc}""".format(
    302                 name=name,
    303                 pname=self.parent_name,
    304                 typ=type_str(v),
    305                 v=repr(v),
    306                 valid_clr_desc=self.description(),
    307             )
    308         )

ValueError: 
    Invalid value of type 'builtins.float' received for the 'nbinsx' property of histogram
        Received value: 47.0

    The 'nbinsx' property is a integer and may be specified as:
      - An int (or float that will be cast to an int)
        in the interval [0, 9223372036854775807]

The same error can also be produced for a go.Histogram using a float input to nbinsx:

go.Histogram(
    x=df["sepal_length"],
    nbinsx=47.0,
)

gunnhildsp avatar Nov 28 '23 13:11 gunnhildsp

Good catch @gunnhildsp! My guess is this is a holdover from the JavaScript behavior, since there’s only one number type in JS. That said once plotly.js gets a hold of this number, it does check if nbins % 1 == 0 and if not it will discard the value, so either we should keep the current only-accept-int and fix the docs, or we should also accept float but only if it matches an int as far as % is concerned (and assuming that behavior is the same between Python and JS). I guess only-accept-int seems cleaner, so let’s keep that, and we’d welcome a PR to fix the messages/docs.

alexcjohnson avatar Nov 29 '23 00:11 alexcjohnson

@alexcjohnson thank you for such a quick reply!

Should I create a separate issue in the docs repo for the doc change or just reference this issue?

I see that the ValueError message is made in the IntegerValidator. My proposal is to introduce a new parameter to the class, float_can_be_cast_to_int which defaults to True, and set it to False from the validators NbinsxValidator and NbinsyValidator. What do you think about that approach?

gunnhildsp avatar Nov 29 '23 13:11 gunnhildsp

Should I create a separate issue in the docs repo for the doc change or just reference this issue?

This issue is fine for both. But let's nail down the behavior here first before updating the docs.

I see that the ValueError message is made in the IntegerValidator. My proposal is to introduce a new parameter to the class, float_can_be_cast_to_int which defaults to True, and set it to False from the validators NbinsxValidator and NbinsyValidator. What do you think about that approach?

Are there any integer attributes that DON'T behave the way you're seeing with nbinsx/nbinsy? All the usage of these validators is autogenerated from the plotly.js schema, so if there is any distinction in how different cases of IntegerValidator behave it would need to be tied to something in the schema, which I'm pretty sure there isn't... so I think all integer attributes are handled the same way inside plotly.js and therefore they should all inherit this behavior.

alexcjohnson avatar Dec 01 '23 21:12 alexcjohnson

Thank you for the input @alexcjohnson!

gunnhildsp avatar Dec 12 '23 09:12 gunnhildsp