Passing DataFrame with int columns to Dataset

Open hoxbro opened this issue 1 year ago • 2 comments

Problem

When a DataFrame with integer column names is passed to hv.Dataset, Dataset.data is an empty OrderedDict instead of the DataFrame. This was unexpected and took me some time to figure out. Everything works as expected if I change the column names to strings.

import holoviews as hv
import pandas as pd
import numpy as np

df = pd.DataFrame(data=np.random.randn(10, 3))
# df.columns = list("ABC")  # <--- Column names need to be strings
ds = hv.Dataset(df)

print(ds.data)
pd.testing.assert_frame_equal(df, ds.data)

Output with int columns

OrderedDict()
Traceback (most recent call last):
  File "/home/shh/Development/holoviz/empty_dataset.py", line 11, in <module>
    pd.testing.assert_frame_equal(df, ds.data)
  File "/home/shh/miniconda3/envs/holoviz/lib/python3.9/site-packages/pandas/_testing/asserters.py", line 1264, in assert_frame_equal
    _check_isinstance(left, right, DataFrame)
  File "/home/shh/miniconda3/envs/holoviz/lib/python3.9/site-packages/pandas/_testing/asserters.py", line 241, in _check_isinstance
    raise AssertionError(
AssertionError: DataFrame Expected type <class 'pandas.core.frame.DataFrame'>, found <class 'collections.OrderedDict'> instead

Output with str columns

          A         B         C
0 -1.446167  0.755111 -1.630022
1  2.072806  0.762481  1.000302
2 -0.125631  1.314712 -0.091937
3 -0.752042  2.065006 -0.094255
4  0.864158 -2.002643 -0.244065
5  0.692049  0.517888 -0.398295
6 -0.036477  0.442650 -0.798847
7 -0.719853 -0.473627  2.235159
8  0.971089 -1.273741  0.750291
9  0.729594 -0.656605 -0.057366
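
Since string column names work, one possible workaround is to convert the labels to strings before building the Dataset. This is a minimal sketch using only pandas; the frame contents are made up for illustration, and it assumes that stringified labels such as "0" are acceptable as dimension names in your plots.

```python
import pandas as pd

# A DataFrame built without explicit column names gets integer labels 0, 1, 2.
df = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# Map every column label through str(); the data itself is unchanged.
df_str = df.rename(columns=str)

print(list(df.columns))
print(list(df_str.columns))
```

The renamed frame (df_str here) is what would then be passed to hv.Dataset.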

First observed

The behavior was first observed with hvplot.interactive. An example is shown here (ignore the warning; the screenshots are only meant to show the problem):

[Two screenshots: one with int columns, one with str columns]

Culprit

I looked at the code and found that the problem happens because dimension_name raises a ValueError first, so the code never reaches the DataError that would be raised in the lines that follow.

https://github.com/holoviz/holoviews/blob/2d2b72809eb2e2d75f1e26f3fad97892b2d6a720/holoviews/core/data/pandas.py#L64-L74

Because a ValueError is raised and not a DataError, the for loop continues and chooses an OrderedDict:

https://github.com/holoviz/holoviews/blob/2d2b72809eb2e2d75f1e26f3fad97892b2d6a720/holoviews/core/data/interface.py#L249-L260
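
The effect can be modeled in a few lines. This is a simplified sketch, not the HoloViews source: DataError, pandas_like_init, dict_like_init and initialize are hypothetical stand-ins that only mimic the pattern described above, where a DataError propagates to the user but any other exception makes the loop try the next candidate.

```python
from collections import OrderedDict


class DataError(Exception):
    """Stand-in for an error that should be shown to the user."""


def pandas_like_init(columns):
    # Hypothetical stand-in: fails with a generic ValueError on non-string names.
    for name in columns:
        if not isinstance(name, str):
            raise ValueError(f"cannot use {name!r} as a dimension name")
    return "frame", list(columns)


def dict_like_init(columns):
    # Hypothetical stand-in for a later, more permissive candidate.
    return "dict", OrderedDict()


def initialize(columns, interfaces=(pandas_like_init, dict_like_init)):
    for init in interfaces:
        try:
            return init(columns)
        except DataError:
            raise  # informative errors reach the user
        except Exception:
            continue  # anything else is swallowed and the next candidate is tried
    raise DataError("no interface could handle the data")


print(initialize(["A", "B"]))  # handled by the first candidate
print(initialize([0, 1]))      # ValueError is swallowed, falls through to the second
```

With integer names the caller sees no error at all, only an unexpected empty container.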

Possible solution

My initial solution was to move the DataError check above the for loop. With this change, the original example raises the DataError, whose message explains what is needed to get the code to work.

Traceback (most recent call last):
  File "/home/shh/Development/holoviz/empty_dataset.py", line 7, in <module>
    ds = hv.Dataset(df)
  File "/home/shh/Development/holoviz/holoviews/holoviews/core/data/__init__.py", line 338, in __init__
    initialized = Interface.initialize(type(self), data, kdims, vdims,
  File "/home/shh/Development/holoviz/holoviews/holoviews/core/data/interface.py", line 254, in initialize
    (data, dims, extra_kws) = interface.init(eltype, data, kdims, vdims)
  File "/home/shh/Development/holoviz/holoviews/holoviews/core/data/pandas.py", line 65, in init
    raise DataError("pandas DataFrame column names used as dimensions "
holoviews.core.data.interface.DataError: pandas DataFrame column names used as dimensions must be strings not integers.

PandasInterface expects tabular data, for more information on supported datatypes see http://holoviews.org/user_guide/Tabular_Datasets.html
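
Why the position of the check matters can be shown with a small model. This is a sketch under assumptions, not the actual PandasInterface.init: dimension_name here is a hypothetical stand-in that rejects non-string names, and the two init functions differ only in whether the integer check runs before or after the loop.

```python
class DataError(Exception):
    """Stand-in for the informative, user-facing error."""


def dimension_name(name):
    # Hypothetical stand-in: only string names are accepted.
    if not isinstance(name, str):
        raise ValueError(f"cannot use {name!r} as a dimension name")
    return name


def init_check_last(columns):
    for name in columns:
        dimension_name(name)  # raises ValueError on the first integer name
    if any(isinstance(c, int) for c in columns):
        raise DataError("column names must be strings not integers")  # never reached
    return list(columns)


def init_check_first(columns):
    if any(isinstance(c, int) for c in columns):
        raise DataError("column names must be strings not integers")
    for name in columns:
        dimension_name(name)
    return list(columns)


def error_type(init, columns):
    """Return the name of the exception raised by init, or None on success."""
    try:
        init(columns)
    except Exception as exc:
        return type(exc).__name__
    return None


print(error_type(init_check_last, [0, 1]))
print(error_type(init_check_first, [0, 1]))
```

Only the second ordering surfaces the informative error for integer names; string names pass through both versions unchanged.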

Problem with the solution

I ran the tests for HoloViews locally, and all tests passed.

However, when I ran the tests for hvplot, many of them failed because of the change. This indicates that the current behavior is embedded in the logic of hvplot, so implementing the solution could cause backward-compatibility problems.

FAILED hvplot/tests/testcharts.py::TestChart1D::test_by_datetime_accessor - holoviews.core.data.interface.DataError: pandas DataFrame colu...
FAILED hvplot/tests/testoperations.py::TestDatashader::test_xlim_affects_x_range - holoviews.core.data.interface.DataError: pandas DataFra...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_dataframe_plot_returns_holoviews_object_0_area - holoviews.c...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_dataframe_plot_returns_holoviews_object_1_bar - holoviews.co...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_dataframe_plot_returns_holoviews_object_2_barh - holoviews.c...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_dataframe_plot_returns_holoviews_object_3_box - holoviews.co...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_dataframe_plot_returns_holoviews_object_4_density - holoview...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_dataframe_plot_returns_holoviews_object_5_hist - holoviews.c...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_dataframe_plot_returns_holoviews_object_6_kde - holoviews.co...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_dataframe_plot_returns_holoviews_object_7_line - holoviews.c...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_series_plot_returns_holoviews_object_0_area - holoviews.core...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_series_plot_returns_holoviews_object_1_bar - holoviews.core....
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_series_plot_returns_holoviews_object_2_barh - holoviews.core...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_series_plot_returns_holoviews_object_3_box - holoviews.core....
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_series_plot_returns_holoviews_object_4_density - holoviews.c...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_series_plot_returns_holoviews_object_5_hist - holoviews.core...
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_series_plot_returns_holoviews_object_6_kde - holoviews.core....
FAILED hvplot/tests/testplotting.py::TestPandasHoloviewsPlotting::test_pandas_series_plot_returns_holoviews_object_7_line - holoviews.core...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_dataframe_plot_returns_holoviews_object_0_area - holoviews.core...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_dataframe_plot_returns_holoviews_object_1_bar - holoviews.core....
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_dataframe_plot_returns_holoviews_object_2_barh - holoviews.core...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_dataframe_plot_returns_holoviews_object_3_box - holoviews.core....
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_dataframe_plot_returns_holoviews_object_4_density - holoviews.c...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_dataframe_plot_returns_holoviews_object_5_hist - holoviews.core...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_dataframe_plot_returns_holoviews_object_6_kde - holoviews.core....
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_dataframe_plot_returns_holoviews_object_7_line - holoviews.core...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_series_plot_returns_holoviews_object_0_area - holoviews.core.da...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_series_plot_returns_holoviews_object_1_bar - holoviews.core.dat...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_series_plot_returns_holoviews_object_2_barh - holoviews.core.da...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_series_plot_returns_holoviews_object_3_box - holoviews.core.dat...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_series_plot_returns_holoviews_object_4_density - holoviews.core...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_series_plot_returns_holoviews_object_5_hist - holoviews.core.da...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_series_plot_returns_holoviews_object_6_kde - holoviews.core.dat...
FAILED hvplot/tests/testplotting.py::TestPandasHvplotPlotting::test_pandas_series_plot_returns_holoviews_object_7_line - holoviews.core.da...

hoxbro, Jul 12 '22 08:07

After discussing with @Hoxbro, I agree that the suggested fix is the correct thing to do at the HoloViews level, which means those test failures would need to be addressed separately at the hvplot level.

@Hoxbro A PR with your suggested fix would be very welcome!

jlstevens, Jul 13 '22 10:07

I will reopen this issue as the merged PR is being reverted in #5457.

It is still the plan to implement this.

hoxbro, Sep 28 '22 10:09

This was added back in https://github.com/holoviz/holoviews/pull/5654

hoxbro, Apr 07 '23 06:04