
Expand `pn.widget` to accept Pandas columns

Open jbednar opened this issue 1 year ago • 6 comments

I recently made a presentation trying to convey as simple a flow as possible for employing reactive expressions in a notebook. The pitch was meant to be "if you have a notebook that works, modify it only a tiny bit to get it to work with a widget controlling it". For instance, if you have df[df['mag'] > 5].head() now, how easily can we change "5" to a slider?

The notebook I have so far is at https://anaconda.cloud/share/notebooks/972482fb-5c0e-479e-b2f1-fe9a60a96d0d/preview , and it all works well, but I feel that the step of getting a widget instantiated is more complicated than it needs to be:

import panel as pn
pn.extension(throttled=False, template='material')

slider = pn.widgets.FloatSlider(name='Minimum Magnitude', start=0, end=9, value=6)

That's a lot of syntax to remember or guess, plus specific numeric values to decide on and/or look up. Instead, what if we extended the currently fairly hidden pn.widget function to accept a Pandas Series, from which it can infer the type and configuration of a widget? It seems fairly well defined to implement pn.widget(df.mag) to return the equivalent of:

slider = pn.widgets.FloatSlider(name='Minimum Magnitude', start=df.mag.min(), end=df.mag.max(), value=(df.mag.min()+df.mag.max())/2)

Similarly, pn.widget(df.index) would create the equivalent of:

date = pn.widgets.DatetimeRangeSlider(name='Date', start=df.index[0], end=df.index[-1])

while a column (pd.Series) with categorical data would create a Selector, and so on. I think this approach would make working with DataFrames (and presumably similarly for Xarray dimensions or coordinates) much cleaner, clearer, easier to learn, easier to use, and less error prone.

Which widget to pick isn't always clearly defined, of course. E.g. integers are sometimes true ordinal values for which an IntSlider is appropriate, but sometimes they just represent categories that happen to be denoted with numeric values. We might need a rule of thumb for integer columns: use a Selector when the number of unique values is < 50, and a slider otherwise.
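As a rough illustration of that rule of thumb, here is a minimal sketch in plain Python. The function name `guess_integer_widget` and the `max_categories` parameter are hypothetical, not part of Panel, and the function only returns the name of a widget class (`Select` being Panel's selector widget) rather than building one, so it runs without Panel installed.

```python
from collections.abc import Iterable


def guess_integer_widget(values: Iterable[int], max_categories: int = 50) -> str:
    """Return the name of the widget type the rule of thumb would pick.

    Hypothetical helper for illustration only; not a Panel API.
    """
    unique = set(values)
    # Few distinct values: treat the integers as category labels.
    if len(unique) < max_categories:
        return "Select"
    # Many distinct values: treat them as ordinal and use a slider.
    return "IntSlider"


print(guess_integer_widget([1, 2, 3, 1, 2]))   # small set of codes
print(guess_integer_widget(range(1000)))       # wide ordinal range
```

The threshold is a guess either way, so a real implementation would presumably still let the caller override the choice explicitly.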

If we use this approach, the example I was working on would reduce to the following code:

import numpy as np, pandas as pd, panel as pn
pn.extension(throttled=False, template='material')

df = pd.read_parquet('../data/earthquakes-projected.parq')
df.index = df.index.tz_localize(None)
df = df[['mag', 'depth', 'place', 'type']][df['northing'] < 20037508]
df = pn.rx(df)

table = df[df['mag'] > pn.widget(df.mag)].head()

where literally the only thing we did besides importing Panel was to run its extension and replace 5 with pn.widget(df.mag), which seems very clean to me.

jbednar avatar Jul 19 '24 20:07 jbednar

For filtering, I guess a range slider (as in the datetime example above) is more generally useful than a bare slider (as in the float example above). So maybe pn.widget could have an argument range=True that determines whether the widget selects a range rather than a single value? And maybe range=True by default when given a Pandas column (Series)?
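To make the difference concrete, here is a small sketch of how the slider parameters might be derived in each mode. The helper `slider_params` is hypothetical and returns a plain dict instead of a Panel widget; the assumption is that a range widget would default to the full extent of the data, while a scalar widget would default to the midpoint.

```python
def slider_params(values, range=True):
    """Derive start, end, and a default value for a slider from the data.

    Hypothetical helper for illustration only; not a Panel API.
    The argument is named `range` to match the proposal above, which
    shadows the builtin inside this function.
    """
    values = list(values)
    start, end = min(values), max(values)
    if range:
        # Range widget: select the full extent by default.
        value = (start, end)
    else:
        # Scalar widget: default to the midpoint of the data.
        value = start + (end - start) / 2
    return {"start": start, "end": end, "value": value}


print(slider_params([2.0, 9.0, 5.0]))
print(slider_params([2.0, 9.0, 5.0], range=False))
```

With range=True as the default, a filter built on the widget initially keeps every row, which seems like the least surprising starting point.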

BTW, see https://github.com/holoviz/panel/issues/1972 for a table listing the widgets to return and how such a factory function could specify them all.

jbednar avatar Jul 23 '24 05:07 jbednar

Strongly related to https://github.com/holoviz/panel/issues/6071.

philippjfr avatar Jul 25 '24 15:07 philippjfr

My main comment around this idea is that I don't think this is specific to Pandas or Series...any container of values should be supported (requiring a name when a label isn't immediately accessible). Probably anything supporting __iter__ actually...
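A minimal sketch of that naming rule, assuming the label lives in a `name` attribute as it does on a Pandas Series. The helper `resolve_name` is hypothetical and not part of Panel.

```python
def resolve_name(values, name=None):
    """Pick a widget label for an arbitrary container of values.

    Hypothetical helper for illustration only; not a Panel API.
    """
    # An explicit name always wins.
    if name is not None:
        return name
    # Labelled containers (e.g. a Pandas Series) expose a `name` attribute.
    label = getattr(values, "name", None)
    if label is None:
        raise ValueError("No label found on the container; pass name= explicitly.")
    return str(label)


print(resolve_name([1, 2, 3], name="bob"))
```

A bare list or generator has no label, so the explicit name= is required there.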

jlstevens avatar Jul 25 '24 15:07 jlstevens

+(many) on this, even if it is only for Pandas to start with! Really clean.

Should there be / is there a way to 'print' what the widget has chosen / inferred? So like a code_repr that explicitly shows this?

Are the values passed to pn.extension required? If they are, could they be inferred, or have defaults instead?

Also, the explanation that starts with the text below should be added to the .rx docs, if it isn't already.

The only thing better would be a visual overview / diagram of how .rx works. Panel has way too few of those.

How does pure Python support all this? rdf is a reactive expression (rx), in this case a proxy for df. Each reactive expression r:

...

Coderambling avatar Aug 04 '24 18:08 Coderambling

pn.widget returns the actual object created, so you can already simply look at its repr:

>>> pn.widget(5, name="bob")
IntSlider(end=15, name='bob', start=-5, value=5)

The values listed in the extension are not required. throttled=True by default, which I think is correct, because that's the safest choice when you have no information about how expensive it is to update the output. In my case I know it is cheap to update, so I prefer having it be fully responsive with throttled=False, but by default I think it should stay True.

I don't know what the default should be for template, but whatever it is, people's preferences will change over time, so people will often want to specify that. But they shouldn't need to for notebook work. So just pn.extension() should be enough. Maybe we can print a message if people haven't called that and we're in a notebook context, to give people one less thing to remember?

jbednar avatar Aug 11 '24 16:08 jbednar

If #7033 is finished and merged, what would be left to close this issue is to add type guessing. E.g. add a lookup table like this totally untested and sketchy code:

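# Lookup tables from dtype name to widget class (remaining entries elided)
scalars = {'int64': pn.widgets.IntSlider, ...}
ranges = {'int64': pn.widgets.IntRangeSlider, ...}

def widget(value, ..., range=True, type_=None, **params):
    # Use the explicit type if given, else the type or dtype of the collection
    t = type_ if type_ is not None else (type or dtype of the collection)
    # Pick a range widget or a scalar widget for that type
    wt = ranges[t] if range else scalars[t]
    return wt.from_values(value, **params)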

jbednar avatar Aug 14 '24 21:08 jbednar