ENH: Add way to write DATE types to Hyper

Open mhadi813 opened this issue 5 years ago • 8 comments

I'm trying to write a dataframe that contains datetime.date objects to Hyper using the pantab.frame_to_hyper method, and it raises a TypeError.

Steps to reproduce the problem:

    import pandas as pd
    import datetime
    date = datetime.date(2020,5,8)
    df = pd.DataFrame({'Date': [date,date,date], 'Col' : list('ABC') })
    df.head()
    df.info()
    import pantab
    from tableauhyperapi import TableName
    table = TableName('Extract','Extract')
    pantab.frame_to_hyper(df, 'random_db.hyper', table=table)

=> TypeError: Invalid value "datetime.date(2020, 5, 8)" found (row 0 column 0)

Converting the datetime.date values to a pandas datetime with pd.to_datetime solves the problem:

    df.iloc[0,0]
    df['Date'] = pd.to_datetime(df['Date'])
    pantab.frame_to_hyper(df, 'random_db.hyper', table=table)

Other info:
OS: macOS Catalina 10.15.3
pandas version 1.0.0
pantab version 1.1.0

Thanks

Hadi

mhadi813 avatar May 07 '20 19:05 mhadi813

Thanks for the report. This is “by design” in today’s world because there isn’t a first-class dtype in pandas for dates.

Your workaround is the suggested approach, though if you really want date and not datetime in the extract it falls short. I think we could use a keyword argument that allows you to explicitly store datetime dtypes as dates - interested in trying a PR for that?

WillAyd avatar May 07 '20 20:05 WillAyd

Thanks Will, I'll make a PR.

mhadi813 avatar May 10 '20 04:05 mhadi813

I'm trying to make a PR for kwargs for casting datetime.date to pd.datetime. Can you grant me permission? Thanks

    def frame_to_hyper(
        df: pd.DataFrame,
        database: Union[str, pathlib.Path],
        *,
        table: pantab_types.TableType,
        table_mode: str = "w",
        **kwargs: Union[str, list]
    ) -> None:
        """See api.rst for documentation"""
        if 'date_column' in kwargs:
            date_column = kwargs.get('date_column')
            if isinstance(date_column, list):
                for col in date_column:
                    df[col] = pd.to_datetime(df[col])
            elif isinstance(date_column, str):
                df[date_column] = pd.to_datetime(df[date_column])
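The same idea can be sketched as a standalone helper that runs before the call to frame_to_hyper. The name `with_datetime_columns` and its `date_columns` parameter are hypothetical and not part of pantab. Unlike the snippet above, this version works on a copy so the caller's frame is left untouched. As with the workaround, the columns become datetime64, so the extract would presumably still hold timestamps rather than plain DATE values.

```python
import datetime
from typing import Iterable, Union

import pandas as pd


def with_datetime_columns(
    df: pd.DataFrame,
    date_columns: Union[str, Iterable[str], None] = None,
) -> pd.DataFrame:
    """Return a copy of df with the named columns converted to datetime64.

    Hypothetical helper for illustration; not a pantab function.
    """
    if date_columns is None:
        return df
    # Accept a single column name as well as a collection of names.
    if isinstance(date_columns, str):
        date_columns = [date_columns]
    out = df.copy()
    for col in date_columns:
        out[col] = pd.to_datetime(out[col])
    return out


frame = pd.DataFrame(
    {"Date": [datetime.date(2020, 5, 8)] * 3, "Col": list("ABC")}
)
converted = with_datetime_columns(frame, "Date")
print(converted.dtypes)
```

The converted frame can then be passed to pantab.frame_to_hyper in place of the original.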

mhadi813 avatar May 10 '20 05:05 mhadi813

You shouldn’t need any extra access. Make sure you fork the repo then push the branch to your fork, then make a pull request from there.

The instructions in the contributing guide should help so make sure to give that a look. Ping if you get stuck again.

Thanks!

WillAyd avatar May 10 '20 20:05 WillAyd

So there is a discussion of adding this as a type upstream in pandas:

https://github.com/pandas-dev/pandas/issues/32473

I think any work we do here would have to wait on that, so let's see if that gets traction

WillAyd avatar Jun 18 '20 17:06 WillAyd

The date field seems to have stalled in pandas. Can this be considered again?

We have a fair few dates in our project, and would love to use pantab for this.

joshuataylor avatar Apr 11 '22 03:04 joshuataylor

@joshuataylor have you looked at hyperarrow? It is a similar tool but with arrow as a back end you get first class DATE support

https://hyperarrow.readthedocs.io/en/latest/

WillAyd avatar Apr 11 '22 03:04 WillAyd

I didn't know that library existed, awesome work :heart_eyes: . Will give it a go.

joshuataylor avatar Apr 11 '22 03:04 joshuataylor

Is this still open? Running into this issue right now using pandas.

TypeError: Invalid value "datetime.date(2023, 10, 5)" found (row 0 column 5)

jstrauss18 avatar Oct 23 '23 19:10 jstrauss18

@jstrauss18 your column dtype is likely object. If you want to write time stamps make sure you use a datetime dtype column. Pandas does not natively support plain DATE types (pyarrow does, but pantab currently does not leverage pyarrow types)
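One way to check which columns are affected is sketched below, assuming only pandas is available. It lists the object-dtype columns whose values pandas infers to be plain datetime.date objects. The sample frame and the name `date_like` are made up for illustration.

```python
import datetime

import pandas as pd
from pandas.api.types import infer_dtype, is_object_dtype

# Illustrative data: one column of datetime.date objects, one real
# datetime64 column, and one string column.
df = pd.DataFrame(
    {
        "Date": [datetime.date(2023, 10, 5), datetime.date(2023, 10, 6)],
        "Stamp": pd.to_datetime(["2023-10-05", "2023-10-06"]),
        "Col": ["A", "B"],
    }
)

# infer_dtype inspects the values themselves and reports "date" when
# they are datetime.date objects stored in an object column.
date_like = [
    name
    for name in df.columns
    if is_object_dtype(df[name]) and infer_dtype(df[name], skipna=True) == "date"
]
print(date_like)
```

Columns reported this way are candidates for pd.to_datetime before writing.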

WillAyd avatar Oct 23 '23 19:10 WillAyd

Not sure what to do. I'm using Databricks Delta Sharing to load the data frame, and I don't name the columns.

df

jstrauss18 avatar Oct 23 '23 19:10 jstrauss18

Sorry, I'm not familiar with Databricks so I can't give specific advice. You might want to try StackOverflow for something more tailored. Most I/O methods in pandas provide a parse_dates= argument that you can use when inference is not correct, although there may be something more foundational to be fixed with your code.

As a hack you could try df.iloc[:, 5] = pd.to_datetime(df.iloc[:, 5]), since the traceback says it's column 5 (counting from zero) where you are having an issue. Beyond that, I would try StackOverflow or a Databricks support forum.
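As an illustration of parse_dates=, here is a small sketch using pd.read_csv on in-memory text. The CSV content is invented for the example, and other readers may expose the argument differently or not at all.

```python
import io

import pandas as pd

csv_text = "Date,Col\n2020-05-08,A\n2020-05-09,B\n"

# Without parse_dates the Date column is read as text.
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the same column is parsed into a datetime64 dtype.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(raw.dtypes)
print(parsed.dtypes)
```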
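A variant of that hack is sketched below. It looks up the column label by position and assigns through the label. This is a precaution: in recent pandas versions, assigning through df.iloc[:, n] may write into the existing object column in place and leave its dtype as object. The frame here is invented, and the sketch assumes the column labels are unique.

```python
import datetime

import pandas as pd

df = pd.DataFrame(
    {"Col": list("AB"), "Date": [datetime.date(2023, 10, 5)] * 2}
)

position = 1  # zero-based position, as reported in the error message
label = df.columns[position]

# Assigning by label replaces the whole column, so the new dtype sticks.
df[label] = pd.to_datetime(df[label])

print(df.dtypes)
```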

WillAyd avatar Oct 23 '23 19:10 WillAyd

Hello, I am facing the same problem. I moved my application from the native Tableau Server API to pantab for converting my CSV files to Hyper, in order to reduce conversion time. However, my dataset contains DATE columns, and the dates are converted to datetime. The DATE format is needed. Any news about this issue? @WillAyd

mohamedhamnache avatar Jan 09 '24 10:01 mohamedhamnache

Your best bet will be to keep track of the pantab 4.0 development, which will be a significant overhaul of the code base.

WillAyd avatar Jan 09 '24 12:01 WillAyd

> Your best bet will be to keep track of the pantab 4.0 development, which will be a significant overhaul of the code base.

Any idea about the release date and who is handling this?

mohamedhamnache avatar Jan 11 '24 13:01 mohamedhamnache

I am maintaining a checklist of things in https://github.com/innobi/pantab/issues/219 - feel free to comment there or ask questions.

As for a release date, I do not know. I am looking at using some new technology, so there are many variables at play. This being an open source project, things get developed as I or anyone in the community has time and interest, which adds another layer of uncertainty. The best I can say is "maybe a couple of months", but without any guarantee :-)

WillAyd avatar Jan 11 '24 19:01 WillAyd