holoviews
Passing DataFrame with int columns to Dataset
Problem
When a DataFrame with integer column names is passed to hv.Dataset, Dataset.data returns an empty OrderedDict. This was unexpected and took me some time to figure out. Everything works as expected if I change the column names to strings.
import holoviews as hv
import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.random.randn(10, 3))
# df.columns = list("ABC")  # <--- column names need to be strings
ds = hv.Dataset(df)
print(ds.data)
pd.testing.assert_frame_equal(df, ds.data)
Output with int columns
OrderedDict()
Traceback (most recent call last):
File "/home/shh/Development/holoviz/empty_dataset.py", line 11, in <module>
pd.testing.assert_frame_equal(df, ds.data)
File "/home/shh/miniconda3/envs/holoviz/lib/python3.9/site-packages/pandas/_testing/asserters.py", line 1264, in assert_frame_equal
_check_isinstance(left, right, DataFrame)
File "/home/shh/miniconda3/envs/holoviz/lib/python3.9/site-packages/pandas/_testing/asserters.py", line 241, in _check_isinstance
raise AssertionError(
AssertionError: DataFrame Expected type <class 'pandas.core.frame.DataFrame'>, found <class 'collections.OrderedDict'> instead
Output with str columns
A B C
0 -1.446167 0.755111 -1.630022
1 2.072806 0.762481 1.000302
2 -0.125631 1.314712 -0.091937
3 -0.752042 2.065006 -0.094255
4 0.864158 -2.002643 -0.244065
5 0.692049 0.517888 -0.398295
6 -0.036477 0.442650 -0.798847
7 -0.719853 -0.473627 2.235159
8 0.971089 -1.273741 0.750291
9 0.729594 -0.656605 -0.057366
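As a workaround until this is handled inside HoloViews, the column labels can be converted to strings before the DataFrame is handed to hv.Dataset. The snippet below is a minimal sketch that uses only pandas; the variable name `df_str` and the small example frame are illustrative and not part of the original report.

```python
import pandas as pd

# A DataFrame created without explicit column names gets integer labels 0, 1, 2.
df = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# rename() accepts a callable; str is applied to every column label.
# The original frame is left untouched and a renamed copy is returned.
df_str = df.rename(columns=str)

print(list(df.columns))      # integer labels
print(list(df_str.columns))  # string labels

# df_str (rather than df) would then be passed on, e.g. hv.Dataset(df_str).
```

Converting with `str` keeps the labels recognizable ("0", "1", "2"), whereas assigning arbitrary names such as `list("ABC")` works equally well when the original labels carry no meaning.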
First observed
The behavior was first observed with hvplot.interactive. An example is given here (ignore the warning; the screenshots are only meant to show the problem):
| With int columns | With str columns |
|---|---|
| ![]() | ![]() |
Culprit
I looked at the code and found that the problem happens because dimension_name raises a ValueError, so the code never reaches the DataError in the following lines.
https://github.com/holoviz/holoviews/blob/2d2b72809eb2e2d75f1e26f3fad97892b2d6a720/holoviews/core/data/pandas.py#L64-L74
Because a ValueError is raised and not a DataError, the for loop continues and chooses an OrderedDict:
https://github.com/holoviz/holoviews/blob/2d2b72809eb2e2d75f1e26f3fad97892b2d6a720/holoviews/core/data/interface.py#L249-L260
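The control flow described above can be illustrated with a stand-alone sketch. It does not import HoloViews: `DataError`, `dimension_name`, `PandasLike`, `DictLike`, and `initialize` are all hypothetical stand-ins. The sketch assumes, as described above, that the loop re-raises only DataError and treats any other exception as "try the next interface".

```python
class DataError(Exception):
    """Hypothetical stand-in for the HoloViews DataError."""


def dimension_name(col):
    # Stand-in: rejects non-string names with a plain ValueError.
    if not isinstance(col, str):
        raise ValueError(f"{col!r} is not a valid dimension name")
    return col


class PandasLike:
    @staticmethod
    def init(columns):
        # The ValueError is raised here for integer columns ...
        names = [dimension_name(c) for c in columns]
        # ... so this informative check is never reached.
        if any(not isinstance(c, str) for c in columns):
            raise DataError("column names used as dimensions must be strings")
        return "dataframe", names


class DictLike:
    @staticmethod
    def init(columns):
        # Accepts anything and ends up with empty data in this sketch.
        return "dict", []


def initialize(columns, interfaces=(PandasLike, DictLike)):
    for interface in interfaces:
        try:
            return interface.init(columns)
        except DataError:
            raise      # informative errors are propagated to the user
        except Exception:
            continue   # any other error: silently try the next interface
    raise DataError("no interface could handle the data")


print(initialize(["A", "B", "C"]))  # handled by PandasLike
print(initialize([0, 1, 2]))        # silently falls through to DictLike
```

In this sketch the integer-column case never surfaces an error, which mirrors how the original example ends up with an empty OrderedDict instead of a message explaining what went wrong.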
Possible solution
My initial solution was to move the DataError check above the for-loop. With this change, the original example raises the DataError, which gives an informative message about what is needed to get the code to work.
Traceback (most recent call last):
File "/home/shh/Development/holoviz/empty_dataset.py", line 7, in <module>
ds = hv.Dataset(df)
File "/home/shh/Development/holoviz/holoviews/holoviews/core/data/__init__.py", line 338, in __init__
initialized = Interface.initialize(type(self), data, kdims, vdims,
File "/home/shh/Development/holoviz/holoviews/holoviews/core/data/interface.py", line 254, in initialize
(data, dims, extra_kws) = interface.init(eltype, data, kdims, vdims)
File "/home/shh/Development/holoviz/holoviews/holoviews/core/data/pandas.py", line 65, in init
raise DataError("pandas DataFrame column names used as dimensions "
holoviews.core.data.interface.DataError: pandas DataFrame column names used as dimensions must be strings not integers.
PandasInterface expects tabular data, for more information on supported datatypes see http://holoviews.org/user_guide/Tabular_Datasets.html
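The idea behind the change can be sketched in isolation: validate the column names before anything else can raise a different exception type. The names below (`DataError`, `dimension_name`, `init_checked`) are hypothetical stand-ins, not the actual HoloViews code, and the error text is abbreviated from the message shown in the traceback.

```python
class DataError(Exception):
    """Hypothetical stand-in for the HoloViews DataError."""


def dimension_name(col):
    # Stand-in: rejects non-string names with a plain ValueError.
    if not isinstance(col, str):
        raise ValueError(f"{col!r} is not a valid dimension name")
    return col


def init_checked(columns):
    # Check the column names first, so the informative DataError is raised
    # before dimension_name() gets a chance to raise a ValueError.
    if any(not isinstance(c, str) for c in columns):
        raise DataError(
            "pandas DataFrame column names used as dimensions "
            "must be strings not integers."
        )
    return [dimension_name(c) for c in columns]


try:
    init_checked([0, 1, 2])
except DataError as exc:
    print(f"DataError: {exc}")
```

Because the caller sees a DataError rather than a ValueError, a dispatch loop that re-raises DataError would stop immediately instead of falling back to another data interface.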
Problem with the solution
I ran the tests for HoloViews locally, and all tests passed.
However, when I ran the hvplot tests, many of them failed because of this change. This indicates that hvplot relies on the current behavior, so implementing the solution could cause backward-compatibility problems.
FAILED hvplot/tests/testcharts.py::TestChart1D::test_by_datetime_accessor - holoviews.core.data.interface.DataError: pandas DataFrame colu...
FAILED hvplot/tests/testoperations.py::TestDatashader::test_xlim_affects_x_range - holoviews.core.data.interface.DataError: pandas DataFra...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_dataframe_plot_returns_holoviews_object_0_area - holoviews.c...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_dataframe_plot_returns_holoviews_object_1_bar - holoviews.co...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_dataframe_plot_returns_holoviews_object_2_barh - holoviews.c...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_dataframe_plot_returns_holoviews_object_3_box - holoviews.co...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_dataframe_plot_returns_holoviews_object_4_density - holoview...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_dataframe_plot_returns_holoviews_object_5_hist - holoviews.c...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_dataframe_plot_returns_holoviews_object_6_kde - holoviews.co...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_dataframe_plot_returns_holoviews_object_7_line - holoviews.c...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_series_plot_returns_holoviews_object_0_area - holoviews.core...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_series_plot_returns_holoviews_object_1_bar - holoviews.core....
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_series_plot_returns_holoviews_object_2_barh - holoviews.core...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_series_plot_returns_holoviews_object_3_box - holoviews.core....
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_series_plot_returns_holoviews_object_4_density - holoviews.c...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_series_plot_returns_holoviews_object_5_hist - holoviews.core...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_series_plot_returns_holoviews_object_6_kde - holoviews.core....
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_series_plot_returns_holoviews_object_7_line - holoviews.core...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_dataframe_plot_returns_holoviews_object_0_area - holoviews.core...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_dataframe_plot_returns_holoviews_object_1_bar - holoviews.core....
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_dataframe_plot_returns_holoviews_object_2_barh - holoviews.core...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_dataframe_plot_returns_holoviews_object_3_box - holoviews.core....
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_dataframe_plot_returns_holoviews_object_4_density - holoviews.c...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_dataframe_plot_returns_holoviews_object_5_hist - holoviews.core...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_dataframe_plot_returns_holoviews_object_6_kde - holoviews.core....
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_dataframe_plot_returns_holoviews_object_7_line - holoviews.core...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_series_plot_returns_holoviews_object_0_area - holoviews.core.da...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_series_plot_returns_holoviews_object_1_bar - holoviews.core.dat...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_series_plot_returns_holoviews_object_2_barh - holoviews.core.da...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_series_plot_returns_holoviews_object_3_box - holoviews.core.dat...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_series_plot_returns_holoviews_object_4_density - holoviews.core...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_series_plot_returns_holoviews_object_5_hist - holoviews.core.da...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_series_plot_returns_holoviews_object_6_kde - holoviews.core.dat...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_series_plot_returns_holoviews_object_7_line - holoviews.core.da...
After discussing with @Hoxbro, I agree that the suggested fix is the correct thing to do at the HoloViews level, which means those test failures need to be addressed separately at the hvplot level.
@Hoxbro A PR with your suggested fix would be very welcome!
I will reopen this issue, as the merged PR is being reverted in #5457.
The plan is still to implement this.
This was added back in https://github.com/holoviz/holoviews/pull/5654