frames_from_hyper not pulling when there's no data in the table.

Open VDFaller opened this issue 3 years ago • 10 comments

Describe the bug: I'm simply trying to pull all the frames from a hyper file (which I unfortunately can't give you). I think the problem is that one of the tables doesn't have any rows, which then errors out here with

ValueError: Length mismatch: Expected axis has 0 elements, new values have 60 elements

df in the line above is an empty dataframe.

This is a problem because it's screwing up my other reads.

Expected behavior: It gives me a blank dataframe with the proper columns but no data.

Could skip it too, but that seems worse.

Desktop (please complete the following information):

  • OS: Windows 10

VDFaller avatar Jun 08 '22 17:06 VDFaller

I can fix it by adding

if df.empty:
    df = pd.DataFrame(columns=dtypes.keys())

right before it.
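A minimal sketch of the failure and the workaround, assuming the reader ends up with a column-less empty frame and then assigns column names taken from a `dtypes` mapping. The mapping below is made up for illustration and is not pantab's actual internal state:

```python
import pandas as pd

# Hypothetical stand-in for the column-name -> dtype mapping built by the reader
dtypes = {"a": "int64", "b": "object"}

# A table with zero rows yields a frame with zero columns (assumption)
df = pd.DataFrame([])

error_message = None
try:
    # Assigning two names to a frame with zero columns raises ValueError
    df.columns = list(dtypes.keys())
except ValueError as exc:
    error_message = str(exc)

# Workaround from the comment above: rebuild the empty frame with the columns
if df.empty:
    df = pd.DataFrame(columns=list(dtypes.keys()))
```

After the guard, `df` has the expected column names and still no rows.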

VDFaller avatar Jun 08 '22 17:06 VDFaller

Thanks for the note. If you’d like to create a pull request with a test it would be very welcome

WillAyd avatar Jun 08 '22 17:06 WillAyd

Already making the MR. I don't know how to use Tableau, so I don't know how to make a hyper file, but I'll see if someone can help me over here.

VDFaller avatar Jun 08 '22 18:06 VDFaller

I think you can still use pantab to write an empty dataframe? Or is that not working either?

WillAyd avatar Jun 08 '22 18:06 WillAyd

Good point. I'll try that.

VDFaller avatar Jun 08 '22 18:06 VDFaller

Seems that frame_to_hyper also doesn't work.

import pathlib

import pandas as pd
import pantab
from tableauhyperapi import TableName

datapath = pathlib.Path(__file__).parent / "data"
db_path = datapath / "zero_row.hyper"
# One column, zero rows
df_expected = pd.DataFrame(columns = ['A'])

pantab.frame_to_hyper(df_expected, db_path, table = TableName('not_the_public_schema', 'zero_row'))

It fails with the MemoryError below. This seems to be in libpantab, and I'm not comfortable enough to touch the C.

MemoryError                               Traceback (most recent call last)
f:\Work\Lumentum\Lumentum\pantab\pantab\tests\test_reader.py in <cell line: 5>()
      129 db_path = datapath / "zero_row.hyper"
      130 df_expected = pd.DataFrame(columns = ['A'])
----> 132 pantab.frame_to_hyper(df_expected, db_path, table = TableName('not_the_public_schema', 'zero_row'))

File c:\tools\Anaconda3\envs\test\lib\site-packages\pantab\_writer.py:175, in frame_to_hyper(df, database, table, table_mode, hyper_process)
    166 def frame_to_hyper(
    167     df: pd.DataFrame,
    168     database: Union[str, pathlib.Path],
   (...)
    172     hyper_process: Optional[tab_api.HyperProcess] = None,
    173 ) -> None:
    174     """See api.rst for documentation"""
--> 175     frames_to_hyper({table: df}, database, table_mode, hyper_process=hyper_process)

File c:\tools\Anaconda3\envs\test\lib\site-packages\pantab\_writer.py:198, in frames_to_hyper(dict_of_frames, database, table_mode, hyper_process)
    194 with tab_api.Connection(
    195     hpe.endpoint, tmp_db, tab_api.CreateMode.CREATE_IF_NOT_EXISTS
    196 ) as connection:
    197     for table, df in dict_of_frames.items():
--> 198         _insert_frame(
    199             df, connection=connection, table=table, table_mode=table_mode
    200         )
    202 # In Python 3.9+ we can just pass the path object, but due to bpo 32689
    203 # and subsequent typeshed changes it is easier to just pass as str for now
    204 shutil.move(str(tmp_db), database)

File c:\tools\Anaconda3\envs\test\lib\site-packages\pantab\_writer.py:154, in _insert_frame(df, connection, table, table_mode)
    152 with tab_api.Inserter(connection, table_def) as inserter:
    153     if compat.PANDAS_130:
--> 154         libpantab.write_to_hyper(df, null_mask, inserter._buffer, dtypes)
    155     else:
    156         libpantab.write_to_hyper_legacy(
    157             df.itertuples(index=False, name=None),
    158             null_mask,
   (...)
    161             dtypes,
    162         )

MemoryError:

VDFaller avatar Jun 08 '22 19:06 VDFaller

That is unfortunate. Well, if you are feeling ambitious, here's a guide I wrote in pandas for how to debug their C extensions. The same rules should apply here in pantab.

https://pandas.pydata.org/docs/development/debugging_extensions.html

WillAyd avatar Jun 08 '22 19:06 WillAyd

Alternatively, we could just return early during write if the data frame is empty, without even invoking the Tableau inserter.
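A rough sketch of that early-return idea, not pantab's actual writer code. The `insert_frame` function and the two callables are hypothetical stand-ins for table creation and row insertion, and creating the table before returning is an assumption made so that an empty frame still round-trips with its columns:

```python
from typing import Callable, List

import pandas as pd


def insert_frame(
    df: pd.DataFrame,
    create_table: Callable[[List[str]], None],
    insert_rows: Callable[[pd.DataFrame], None],
) -> None:
    """Hypothetical writer: always create the table, insert only if rows exist."""
    # Create the table first so a zero-row frame still produces a table
    create_table(list(df.columns))
    if df.empty:
        # Return early: the row inserter is never invoked for zero rows
        return
    insert_rows(df)


# Record which steps run for an empty frame
calls: List[str] = []
insert_frame(
    pd.DataFrame(columns=["A"]),
    create_table=lambda cols: calls.append(f"create:{cols}"),
    insert_rows=lambda frame: calls.append("insert"),
)
```

With an empty frame only the table-creation step runs, so the C extension path that raised the `MemoryError` would never be reached.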

Reading/writing empty frames is definitely an untested case here. It would make for a good scenario to put into test_roundtrip.py.

WillAyd avatar Jun 08 '22 19:06 WillAyd

I don't know if I have that kind of ambition at the moment, but would it be helpful if I made a test_roundtrip.py test for it? Maybe with a @pytest.mark.skip(reason="currently failing #163")?

VDFaller avatar Jun 08 '22 19:06 VDFaller

I think the test there should work for reading/writing. I know the OP was just about reading, but it looks like neither works with an empty frame. It would make sense to fix them together.

WillAyd avatar Jun 08 '22 20:06 WillAyd