hvplot
.interactive dataframe slice selection by row index or position not possible
Is your feature request related to a problem? Please describe.
I was under the assumption that the .interactive interface for dataframes should mimic the pandas interface. Yet not everything seems to work.
Given the code:
import hvplot.pandas
import panel.widgets as pnw
from bokeh.sampledata import airports
df = airports.data
dfi = df.interactive()
p = pnw.IntSlider(start=10, end=40)
dfi[5:p]
I would like to see a slider and a dynamic dataframe representation showing rows 5 to p-1 (the dataframe has an integer index). Instead I get a static output without a slider. I can display the slider in another cell, but when I move it, the dataframe output does not update; I have to re-execute the cell manually to get the update.
This doesn't work at all and raises an InvalidIndexError:
dfi.loc[5:p,:]
# InvalidIndexError: IntSlider(end=40, start=10, value=10)
These give no slider and don't update when I display the slider in another cell:
dfi.loc[:p]
dfi.iloc[5:p]
dfi.iloc[slice(5,p)]
dfi.values[5:p]
This raises a TypeError:
dfi.loc[dfi.index[5:p]]
# TypeError: slice indices must be integers or None or have an __index__ method
A current workaround for dfi.loc[5:p,:] is:
dfi[(dfi.index>=5) & (dfi.index<p)]
For dfi.iloc[5:p] I have no reasonable workaround (except resetting the index, selecting with the method above, and setting the index again).
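In plain pandas, a positional range can be expressed as a boolean mask over row positions, in the same spirit as the label-based mask above. This is a minimal sketch: `positional_mask` is a hypothetical helper, the toy frame stands in for the airports data, and whether such a mask stays reactive when `stop` comes from a widget inside `.interactive` has not been verified.

```python
import numpy as np
import pandas as pd

# Toy frame with a string index, so label-based and positional selection differ.
df = pd.DataFrame({"x": range(10)}, index=list("abcdefghij"))


def positional_mask(n_rows, start, stop):
    """Hypothetical helper: True for row positions start..stop-1, like iloc[start:stop]."""
    positions = np.arange(n_rows)
    return (positions >= start) & (positions < stop)


stop = 7  # stands in for the slider value p
subset = df[positional_mask(len(df), 5, stop)]
```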
Neither dfi.values[5:p] nor dfi.loc[dfi.index[5:p]] works as a workaround, and I don't know what else to try.
The very first example in the interactive tutorial (https://hvplot.holoviz.org/user_guide/Interactive.html) indexes an xarray by position with .isel, which I think I have tried and which works. So it is surprising to me that there is no mechanism to select a positional range in dataframes.
Here are my package versions:
hvplot : 0.8.0
pandas : 1.4.3
holoviews : 1.14.9
bokeh : 2.4.3
Python version : 3.9.13
IPython version : 8.4.0
jupyter notebook : 6.4.12
jupyterlab : 3.4.3
OS : Darwin
Release : 20.6.0
Browser : Safari
For .interactive to work with widgets embedded inside other objects passed into the dataframe methods, we have to do special work to create proxy objects that fetch current values from the widget before invoking the underlying object. See e.g. https://github.com/holoviz/holoviews/pull/5184#issuecomment-1019592627 , where we were discussing lists and dicts that get passed in. Here, we might need special support for a slice object?
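The proxy idea can be sketched in plain Python: before calling the underlying method, walk the argument and replace every widget with its current value, including widgets hidden inside a slice. `FakeWidget` and `resolve` are hypothetical names for illustration, not hvplot or HoloViews internals; the real mechanism is the one discussed in the linked PR and may differ.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeWidget:
    """Hypothetical stand-in for a Panel widget; only the .value attribute matters."""
    value: Any


def resolve(obj):
    """Recursively replace widget-like objects with their current values.

    Covers the containers discussed in this thread (slice, list, tuple, dict);
    any other object is returned unchanged.
    """
    if isinstance(obj, FakeWidget):
        return obj.value
    if isinstance(obj, slice):
        return slice(resolve(obj.start), resolve(obj.stop), resolve(obj.step))
    if isinstance(obj, list):
        return [resolve(item) for item in obj]
    if isinstance(obj, tuple):
        return tuple(resolve(item) for item in obj)
    if isinstance(obj, dict):
        return {key: resolve(val) for key, val in obj.items()}
    return obj


p = FakeWidget(value=12)
key = (slice(5, p), slice(None))  # the key Python builds for dfi.loc[5:p, :]
resolved = resolve(key)           # re-run on every evaluation, so it tracks p.value
```

Because `dfi.loc[5:p, :]` passes a tuple of slices, the slice case has to be handled inside the tuple case as well, which the recursion covers.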
Yes, I suppose the mechanism by which this works in Pandas is that a slice object is passed to the dataframe method. Makes sense. Is there currently a workaround for positional indexing?
Here is a partial workaround. Unfortunately it only works when you start out with a regular dataframe, not with an interactive one. So if you know that you need positional indexing in your interactive dataframe, you have to apply it when creating the interactive object from the normal dataframe. Passing an interactive dataframe into this mechanism (e.g. dfi for the df parameter) doesn't work.
end_ind = pnw.IntSlider(start=10, end=40)
def make_slice(df=df, end_ind=20):
    return df.iloc[5:end_ind]
dfi = hvplot.bind(make_slice, df=df, end_ind=end_ind).interactive(width=600)
dfi # works as expected with dynamic slider
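The bound version updates because the function is called again with the widget's current value whenever the widget changes. The simplified sketch below imitates only the "read the value at call time" part; `bind_like` and `FakeWidget` are hypothetical, and the real `hvplot.bind` additionally watches the widget and triggers re-rendering.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FakeWidget:
    """Hypothetical stand-in for a Panel widget."""
    value: Any


def bind_like(func: Callable, **kwargs) -> Callable[[], Any]:
    """Hypothetical, simplified analogue of bind: defer the call and
    substitute each widget's value at the moment of evaluation."""
    def evaluate():
        current = {
            name: arg.value if isinstance(arg, FakeWidget) else arg
            for name, arg in kwargs.items()
        }
        return func(**current)
    return evaluate


def make_slice(data, end_ind=20):
    # Same shape as the workaround above, but on a plain list.
    return data[5:end_ind]


end_ind = FakeWidget(value=10)
bound = bind_like(make_slice, data=list(range(40)), end_ind=end_ind)
first = bound()        # uses end_ind.value == 10
end_ind.value = 15
second = bound()       # re-evaluated with the new value
```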
Yep, that's a good workaround for now. I think having it "just work" is a good feature request. @Hoxbro, something you could add?
If anyone knows of any other special objects like this, it would be good to address those as well...
I got it to work with slices, and will submit PRs in holoviews and hvplot today.
https://user-images.githubusercontent.com/19758978/177975269-75bdf24f-7fee-4f08-a17d-906b0a5cf9d6.mp4
Wow, you guys are amazing. A solution in less than 24 hours?!
Nice! @JanHomann, a really valuable contribution you (or other users) could make is to study the Pandas API and see if there are any other collections or special objects that would need similar treatment. I think we now handle slices, dicts, and lists; I'm not sure whether there are special iterators or other objects that we should be looking out for...
Thank you! Another thing I have noticed is that dataframe methods don't give back the correct type. For example:
p = pnw.IntSlider(start=0, end=len(df.columns) - 1)
dfi.columns[p]
doesn't return a string (the column name). Instead it returns another interactive object, which cannot be used, for example, for indexing. So:
dfi[dfi.columns[p]]
doesn't work because the key has the wrong type.
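On a plain DataFrame the two-step lookup works because `df.columns[i]` returns an ordinary label. One way to sidestep the type problem, assuming only the integer position needs to be reactive, is to keep both steps inside one function and bind that function, as in the `hvplot.bind` workaround above. `select_column` is a hypothetical helper, and the bound usage in the last comment is untested.

```python
import pandas as pd


def select_column(df: pd.DataFrame, i: int) -> pd.Series:
    """Hypothetical helper: look up the i-th column label, then select that column.
    Both steps run on a plain DataFrame, so the label is an ordinary string."""
    label = df.columns[i]
    return df[label]


df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
col = select_column(df, 1)
# Untested: hvplot.bind(select_column, df=df, i=p).interactive()
```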
Here is another problem: the .query() method doesn't work. Its result can be replicated with boolean indexing, but .query() is more concise and can be faster.
import hvplot.pandas
import panel.widgets as pnw
from bokeh.sampledata import antibiotics
df = antibiotics.data
dfi = df.interactive()
p = pnw.IntSlider(value=5, start=1, end=10)
dfi.query('penicillin < @p') # TypeError (should return all the rows where the column `penicillin` is < 5)
In this case, I suppose the widget would have to be resolved from a reference inside the query string.
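In plain pandas, `DataFrame.query` forwards extra keyword arguments to `pandas.eval`, so the value behind an `@` reference can be supplied explicitly through `local_dict` instead of being looked up among the caller's variables. A widget-aware `.query` could presumably substitute the current widget value the same way; that is an assumption, and `query_below` is a hypothetical helper.

```python
import pandas as pd

df = pd.DataFrame({"penicillin": [0.5, 3.0, 7.0, 12.0]})


def query_below(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    # "@limit" is resolved from local_dict, which DataFrame.query passes on
    # to pandas.eval, rather than from a variable named "limit" in scope.
    return df.query("penicillin < @limit", local_dict={"limit": threshold})


result = query_below(df, 5)
```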
f-strings are another thing.
This example doesn't work:
dfi.query(f'penicillin < {p}') # TypeError
Support for f-string embedding would be great, because people often do this for column indexing:
dfi[f'col_{p}'] # doesn't work
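This is hard to support because an f-string is formatted immediately, when the line runs: the string receives the widget object's text representation and keeps no link back to the widget afterwards. The sketch uses a hypothetical `FakeWidget` class in place of a Panel slider; deferring the formatting to a function (the hypothetical `column_name`) is one way to re-read the value on each evaluation.

```python
class FakeWidget:
    """Hypothetical stand-in for a Panel widget with a repr like IntSlider's."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"FakeWidget(value={self.value})"


p = FakeWidget(3)

# Formatted right away: the widget's repr (not its value) is baked into the
# string, and later changes to p.value are never seen.
frozen = f"col_{p}"


def column_name(widget):
    # Formatting happens at call time, so the current value is used.
    return f"col_{widget.value}"


p.value = 4
```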
And then there is item assignment, which also fails.
Overall I totally love the holoviz stack. I think it's currently the most advanced and most user friendly stack for interactive plotting in python.
Today I found some more cases where .interactive fails. The first case is lambda functions (and probably also normal functions) that some Pandas methods accept as arguments.
import numpy as np
import pandas as pd
import hvplot.pandas
import panel.widgets as pnw
df = pd.DataFrame(data=np.random.randn(10,3))
p = pnw.IntSlider(start=1, end=10, value=5)
dfi = df.interactive()
df.apply(lambda x: x*5) # this works
dfi.apply(lambda x: x*p) # this doesn't
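With the lambda, the widget is captured inside the function's closure, where `.interactive` presumably cannot see it; pandas only receives an opaque callable. In plain pandas, `DataFrame.apply` forwards extra keyword arguments to the function, so the multiplier can be passed as an explicit argument instead. Whether `dfi.apply(scale, factor=p)` would then resolve the widget like other method arguments has not been verified; `scale` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2))


def scale(column, factor):
    # Called once per column; factor arrives as an ordinary keyword argument.
    return column * factor


# Extra keyword arguments to DataFrame.apply are passed through to `scale`.
result = df.apply(scale, factor=5)
```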
The second case is range objects, which can also be passed to some Pandas methods.
import numpy as np
import pandas as pd
import hvplot.pandas
import panel.widgets as pnw
df = pd.DataFrame(data=np.random.randn(10,3))
p = pnw.IntSlider(start=0, end=9, value=5)
dfi = df.interactive()
df.isin(range(1,3)) # this works (returns a boolean dataframe)
dfi.isin(range(1,p)) # this doesn't
# TypeError: 'IntSlider' object cannot be interpreted as an integer
I think many Pandas methods that natively support lists probably also support range objects.
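The `TypeError` here is raised by Python itself: `range()` requires real integers when it is constructed, so the call fails before pandas ever sees the argument. A sketch of a workaround, assuming the bounds arrive as plain integers (for example via `hvplot.bind`), is to build the range inside a function; `isin_range` is a hypothetical helper.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 2, 3], "b": [4, 2, 1, 0]})


def isin_range(df: pd.DataFrame, start: int, stop: int) -> pd.DataFrame:
    # range() is built here from plain integers, not from a widget object.
    return df.isin(range(start, stop))


mask = isin_range(df, 1, 3)
```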
Generator objects can also be passed to some Pandas methods (probably, again, many of the Pandas methods that accept a list).
import numpy as np
import pandas as pd
import hvplot.pandas
import panel.widgets as pnw
df = pd.DataFrame(data=np.random.randn(20,3))
dfi = df.interactive()
p = pnw.IntSlider(start=1, end=4, value=2)
g = (n**2 for n in range(5))
g1 = (n**2 + p for n in range(5)) # panel widget IntSlider is embedded in a generator
df.loc[g,:] # this works
dfi.loc[g1,:] # this one doesn't:
# TypeError: unsupported operand type(s) for +: 'int' and 'IntSlider'
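Generators add an extra complication: they can be iterated only once. Even if `.interactive` resolved widgets inside a generator, a re-evaluation after a slider change would presumably receive an already exhausted generator. The sketch shows the single-use behaviour and a re-iterable alternative; `squares` is a hypothetical helper.

```python
g = (n**2 for n in range(5))

first_pass = list(g)   # consumes the generator
second_pass = list(g)  # already exhausted: nothing left to yield


def squares(offset=0):
    # Rebuilt on every call, so it can be evaluated again with a new offset.
    return [n**2 + offset for n in range(5)]
```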
So the unsupported cases found so far are:
- f-strings
- normal strings with @parameter for .query and .eval
- range objects
- (lambda) functions
- generators
And then there is the problem of passing the output of an interactive method back into an interactive dataframe. That is a reasonable thing to do with dataframes, for example in the case of df[df.columns[p]].
Currently, it seems that the output of an interactive dataframe method is always another interactive object, but many dataframe methods return not a dataframe but a string, a tuple, or a Series, any of which can then be used as a parameter for another dataframe method, for example when filtering or grouping.
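The composition problem can be illustrated with a tiny deferred-evaluation wrapper: when a deferred result is used as a key, it has to be evaluated first so that the underlying object receives a concrete value. `Lazy` is a hypothetical class for illustration only, not hvplot's `Interactive` implementation, and the `state` dict stands in for a slider.

```python
import pandas as pd


class Lazy:
    """Hypothetical minimal deferred expression wrapping a zero-argument function.
    Illustration only; this is not how hvplot's Interactive class is written."""

    def __init__(self, thunk):
        self._thunk = thunk

    def evaluate(self):
        return self._thunk()

    def __getattr__(self, name):
        # Attribute access is deferred too, e.g. dfl.columns.
        return Lazy(lambda: getattr(self.evaluate(), name))

    def __getitem__(self, key):
        def thunk():
            # If the key is itself deferred (e.g. dfl.columns[p]), evaluate it
            # first so the underlying object receives a concrete label.
            concrete_key = key.evaluate() if isinstance(key, Lazy) else key
            return self.evaluate()[concrete_key]
        return Lazy(thunk)


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
state = {"i": 1}                 # stands in for a slider's current value
p = Lazy(lambda: state["i"])
dfl = Lazy(lambda: df)
expr = dfl[dfl.columns[p]]       # nothing is computed yet
```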
Another thing I just checked that doesn't work is embedding a slider in a pd.Timestamp object, which Pandas also accepts as input for computations.
import pandas as pd
import hvplot.pandas
import panel.widgets as pnw
from bokeh.sampledata import daylight
df = daylight.daylight_warsaw_2013
dfi = df.interactive()
p = pnw.IntSlider(start=1, end=28, value=10)
df[df.Date < pd.Timestamp(year=2013, month=6, day=5)] # This works
dfi[dfi.Date < pd.Timestamp(year=2013, month=6, day=p)] # this one doesn't
#TypeError: an integer is required (got type IntSlider)
A slider in a pd.Timedelta object probably wouldn't work either, but I haven't checked that.
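As with `range`, `pd.Timestamp` validates its arguments at construction time, so the widget object is rejected immediately. A sketch of the function-based workaround, assuming the day arrives as a plain integer; the synthetic `Date` column stands in for the daylight sample data, and `before_day` is a hypothetical helper.

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.date_range("2013-06-01", periods=10, freq="D")})


def before_day(df: pd.DataFrame, day: int) -> pd.DataFrame:
    # The timestamp is constructed from a plain integer inside the function.
    cutoff = pd.Timestamp(year=2013, month=6, day=day)
    return df[df.Date < cutoff]


subset = before_day(df, 5)
```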
A few Pandas methods can also take a pd.Interval object as input. For example, a pd.IntervalIndex can be filtered with a pd.Interval object. Pandas' support for intervals is rather mediocre, but pd.cut() and pd.qcut() (used for binning continuous data), for example, return an interval for each binned value, and the result can then be passed to .groupby() to get statistics on those bins, since .groupby() can operate on intervals.
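For reference, the binning-then-grouping pattern described above looks like this on a plain Series; the values and bin edges are made up for illustration.

```python
import pandas as pd

values = pd.Series([0.1, 0.4, 0.6, 1.2, 1.4])
bins = pd.cut(values, bins=[0, 0.5, 1.0, 1.5])  # one Interval per value, e.g. (0.0, 0.5]

# Group each value by the interval it falls into and summarise per bin.
counts = values.groupby(bins, observed=False).count()
means = values.groupby(bins, observed=False).mean()
```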
import numpy as np
import pandas as pd
import hvplot.pandas
import panel.widgets as pnw
df = pd.DataFrame(data=np.random.randn(6,3),
                  index=pd.IntervalIndex.from_tuples([(-1.5,-1), (-1,-0.5), (-0.5,0), (0,0.5), (0.5,1), (1,1.5)]))
dfi = df.interactive()
p = pnw.IntSlider(start=1, end=3, value=2)
df.index.overlaps(pd.Interval(0.2,1)) # this one works (returns a 1d boolean array)
dfi.index.overlaps(pd.Interval(0.2,p)) # this one doesn't
# ValueError: Only numeric, Timestamp and Timedelta endpoints are allowed when constructing an Interval.
If the index of df is a pd.IntervalIndex, then the following works for both plain and interactive dataframes:
df.index.isin([5]) # returns a boolean array with elements true where 5 overlaps with the intervals in the index
dfi.index.isin([p]) # this actually works: it returns a working slider and a boolean array, just like the static version
dfi.loc[p,:] # this works too, nice: it returns the row whose interval contains the value of the slider p
This, however, fails:
df.index.contains(1) # returns a 1d boolean array
dfi.index.contains(p) # does not work
# AttributeError: 'Interactive' object has no attribute 'contains'
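For comparison, the underlying methods do exist on a plain `IntervalIndex`; the error message indicates that the `Interactive` wrapper does not expose `contains`. The toy index below is made up for illustration.

```python
import pandas as pd

idx = pd.IntervalIndex.from_tuples([(0, 1), (1, 2), (2, 3)])  # closed="right" by default

inside = idx.contains(1.5)                      # elementwise: which intervals contain 1.5
touching = idx.overlaps(pd.Interval(0.5, 1.5))  # which intervals overlap (0.5, 1.5]
```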
Plotting with an interval index also fails.
df.plot() # works with an interval index
df.hvplot() # type error (this is a normal dataframe and not an interactive one)
# TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
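A possible workaround for plotting, assuming the error comes from the interval-valued index (inferred from the traceback, not verified), is to replace each interval by its midpoint via `IntervalIndex.mid`, which yields a plain float index. Using the left or right edges (`.left`, `.right`) would be an equally valid choice, depending on how the bins should be labelled.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.randn(3, 2),
    index=pd.IntervalIndex.from_tuples([(0, 1), (1, 2), (2, 3)]),
)

# Swap each interval for its midpoint so the index is ordinary float64.
df_mid = df.copy()
df_mid.index = df.index.mid
# Untested assumption: df_mid.hvplot() avoids the isfinite TypeError.
```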
In summary, the following are currently unsupported:
- f-strings
- embedding variables with @ for .query and .eval
- range objects
- (lambda) functions
- generators
- pd.Timestamp
- pd.Timedelta
- pd.Interval
- item assignment
- feeding the output of an interactive dataframe method into another interactive dataframe method
Great work @JanHomann! I will look into what is possible to add to interactive.
@Hoxbro The original problem seems solved now. So should this stay open? Or should the title be renamed, since we found other DataFrame parameter types that interactive dataframes currently don't support?
Let's keep this open with a title rename.
We have begun rewriting .interactive to be more generic. When that is in place, we will revisit the suggestions made in this thread.