frames_from_hyper not pulling when there's no data in the table.
Describe the bug
I'm simply trying to pull all the frames from a hyper file (which I unfortunately can't give you). I think the problem is that one of the tables doesn't have any rows, so the read errors out with:
ValueError: Length mismatch: Expected axis has 0 elements, new values have 60 elements
df in the line above is an empty dataframe.
This is a problem because it's screwing up my other reads.
Expected behavior
It should give me an empty dataframe with the proper columns but no data.
Could skip it too, but that seems worse.
Desktop (please complete the following information):
- OS: windows10
I can fix it by adding
if df.empty:
    df = pd.DataFrame(columns=dtypes.keys())
right before it.
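A minimal, pandas-only sketch of why the guard helps. The helper name `build_frame` and its arguments are hypothetical and only mimic the pattern described above (build a frame from rows, then assign column names); this is not pantab's actual reader code.

```python
import pandas as pd


def build_frame(rows, dtypes):
    """Hypothetical helper: build a frame from rows and label its columns."""
    df = pd.DataFrame(rows)
    if df.empty:
        # With zero rows the frame also has zero columns, so the column
        # assignment below would raise "ValueError: Length mismatch".
        df = pd.DataFrame(columns=list(dtypes.keys()))
    df.columns = list(dtypes.keys())
    return df


# A table with no rows now yields an empty frame with the right columns.
empty = build_frame([], {"a": "int64", "b": "object"})
filled = build_frame([(1, "x")], {"a": "int64", "b": "object"})
```

Without the `if df.empty:` branch, the first call would fail on the `df.columns = ...` line, which matches the error reported above.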
Thanks for the note. If you’d like to create a pull request with a test it would be very welcome
Already making the MR. I don't know how to use Tableau, so I don't know how to make a hyper file, but I'll see if someone can help me over here.
I think you can still use pantab to write an empty dataframe? Or is that not working either?
Good point. I'll try that.
Seems that frame_to_hyper also doesn't work.
import pathlib
import pandas as pd
import pantab
from tableauhyperapi import TableName
datapath = pathlib.Path(__file__).parent / "data"
db_path = datapath / "zero_row.hyper"
# Zero-row frame with a single column
df_expected = pd.DataFrame(columns=['A'])
pantab.frame_to_hyper(df_expected, db_path, table=TableName('not_the_public_schema', 'zero_row'))
The failure seems to be in libpantab, and I'm not comfortable enough to touch the C.
It fails with
MemoryError Traceback (most recent call last)
f:\Work\Lumentum\Lumentum\pantab\pantab\tests\test_reader.py in <cell line: 5>()
    129 db_path = datapath / "zero_row.hyper"
    130 df_expected = pd.DataFrame(columns = ['A'])
--> 132 pantab.frame_to_hyper(df_expected, db_path, table = TableName('not_the_public_schema', 'zero_row'))
File c:\tools\Anaconda3\envs\test\lib\site-packages\pantab\_writer.py:175, in frame_to_hyper(df, database, table, table_mode, hyper_process)
    166 def frame_to_hyper(
    167     df: pd.DataFrame,
    168     database: Union[str, pathlib.Path],
    (...)
    172     hyper_process: Optional[tab_api.HyperProcess] = None,
    173 ) -> None:
    174     """See api.rst for documentation"""
--> 175     frames_to_hyper({table: df}, database, table_mode, hyper_process=hyper_process)
File c:\tools\Anaconda3\envs\test\lib\site-packages\pantab\_writer.py:198, in frames_to_hyper(dict_of_frames, database, table_mode, hyper_process)
    194 with tab_api.Connection(
    195     hpe.endpoint, tmp_db, tab_api.CreateMode.CREATE_IF_NOT_EXISTS
    196 ) as connection:
    197     for table, df in dict_of_frames.items():
--> 198         _insert_frame(
    199             df, connection=connection, table=table, table_mode=table_mode
    200         )
    202 # In Python 3.9+ we can just pass the path object, but due to bpo 32689
    203 # and subsequent typeshed changes it is easier to just pass as str for now
    204 shutil.move(str(tmp_db), database)
File c:\tools\Anaconda3\envs\test\lib\site-packages\pantab\_writer.py:154, in _insert_frame(df, connection, table, table_mode)
    152 with tab_api.Inserter(connection, table_def) as inserter:
    153     if compat.PANDAS_130:
--> 154         libpantab.write_to_hyper(df, null_mask, inserter._buffer, dtypes)
    155     else:
    156         libpantab.write_to_hyper_legacy(
    157             df.itertuples(index=False, name=None),
    158             null_mask,
    (...)
    161             dtypes,
    162         )
MemoryError:
That is unfortunate. Well, if you are feeling ambitious, here's a guide I wrote for pandas on how to debug their C extensions. The same rules should apply here in pantab:
https://pandas.pydata.org/docs/development/debugging_extensions.html
Alternately we could also just return early if the data frame is empty during write, not even invoking the tableau inserter.
Definitely an untested case here with reading/writing empty frames. Would make for a good scenario to put into test_roundtrip.py
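A rough sketch of the "return early on an empty frame" idea for the write path. Both `insert_frame` and the `insert_rows` callable are hypothetical stand-ins used for illustration; they are not pantab's internals, and the real change would live wherever the inserter is invoked.

```python
import pandas as pd


def insert_frame(df, insert_rows):
    """Hypothetical writer step: skip the inserter entirely for empty frames."""
    if df.empty:
        # Nothing to write, so never hand the frame to the inserter.
        return 0
    insert_rows(df)
    return len(df)


# `calls.append` stands in for the real row inserter.
calls = []
written_empty = insert_frame(pd.DataFrame(columns=["A"]), calls.append)
written_full = insert_frame(pd.DataFrame({"A": [1, 2]}), calls.append)
```

One design question this leaves open is whether the table definition should still be created before returning, since the expected behavior on read is an empty frame with the proper columns.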
Don't know if I have that kind of ambition at the moment. But would it be helpful if I made a test_roundtrip.py test for it? Maybe with a @pytest.mark.skip(reason="currently failing #163")?
I think the test there should work for reading/writing. I know the OP was just about reading, but it looks like neither works with an empty frame. It would make sense to fix them together.
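A possible shape for the skipped roundtrip test discussed above. The test name, the `int64` column dtype, and the table name `"zero_row"` are assumptions chosen for illustration; `tmp_path` is pytest's built-in temporary directory fixture.

```python
import pandas as pd
import pytest


@pytest.mark.skip(reason="currently failing #163")
def test_roundtrip_empty_frame(tmp_path):
    # Imported inside the test so the module can be collected without pantab.
    import pantab

    db_path = tmp_path / "zero_row.hyper"
    # Zero rows, one typed column (the dtype is an assumption).
    df_expected = pd.DataFrame({"A": pd.Series(dtype="int64")})

    pantab.frame_to_hyper(df_expected, db_path, table="zero_row")
    result = pantab.frame_from_hyper(db_path, table="zero_row")

    # Only check shape and column names; dtype handling is left open.
    assert list(result.columns) == ["A"]
    assert len(result) == 0
```

Once both the read and write paths handle empty frames, removing the skip marker would turn this into a regular regression test.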