ENH: Add way to write DATE types to Hyper
I'm trying to write a DataFrame that contains datetime.date objects to Hyper using the pantab.frame_to_hyper method, and it raises a TypeError.
Steps to reproduce the problem:
```python
import datetime

import pandas as pd
import pantab
from tableauhyperapi import TableName

date = datetime.date(2020, 5, 8)
df = pd.DataFrame({'Date': [date, date, date], 'Col': list('ABC')})
df.head()
df.info()

table = TableName('Extract', 'Extract')
pantab.frame_to_hyper(df, 'random_db.hyper', table=table)
```
=> TypeError: Invalid value "datetime.date(2020, 5, 8)" found (row 0 column 0)
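Converting the datetime.date values to a pandas datetime column solves the problem:

```python
df.iloc[0, 0]  # inspect the first value: a datetime.date object
df['Date'] = pd.to_datetime(df['Date'])  # object column -> datetime64 column
pantab.frame_to_hyper(df, 'random_db.hyper', table=table)
```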
Other info: OS macOS Catalina 10.15.3; pandas 1.0.0; pantab 1.1.0.
Thanks
Hadi
Thanks for the report. This is “by design” in today’s world because there isn't a first-class dtype in pandas for dates.
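A minimal sketch of what "no first-class dtype for dates" means in practice, reusing the data from the report: pandas stores datetime.date values in a generic object column, and converting them gives a datetime64 column whose values carry a midnight time component. It only uses pandas, not pantab.

```python
import datetime

import pandas as pd

date = datetime.date(2020, 5, 8)
df = pd.DataFrame({"Date": [date, date, date], "Col": list("ABC")})

# pandas has no dedicated date dtype, so the datetime.date values are
# stored as generic Python objects in an object column.
print(df["Date"].dtype)  # object
print(type(df["Date"].iloc[0]))  # <class 'datetime.date'>

# Converting gives a datetime64 column; each date gains a midnight time part.
converted = pd.to_datetime(df["Date"])
print(pd.api.types.is_datetime64_any_dtype(converted))  # True
print(converted.iloc[0])  # 2020-05-08 00:00:00
```

That midnight time part is why the workaround falls short when a true DATE column is wanted in the extract.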
Your workaround is the suggested approach, though if you really want a date and not a datetime in the extract it falls short. I think we could use a keyword argument that allows you to explicitly store datetime dtypes as dates. Interested in trying a PR for that?
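One check such a keyword might rely on is whether a datetime64 column actually carries any time-of-day information, since only then would storing it as DATE lose data. The helper below, is_date_only, is hypothetical and not part of pantab; it is a sketch of that check using pandas alone.

```python
import pandas as pd


def is_date_only(series: pd.Series) -> bool:
    """True if series is datetime64 and every non-null value falls exactly at midnight."""
    if not pd.api.types.is_datetime64_any_dtype(series):
        return False
    non_null = series.dropna()
    # normalize() truncates each timestamp to midnight; equality means no time part is lost.
    return bool((non_null == non_null.dt.normalize()).all())


dates = pd.Series([pd.Timestamp("2020-05-08"), pd.Timestamp("2020-05-09")])
stamps = pd.Series([pd.Timestamp("2020-05-08 10:30"), pd.Timestamp("2020-05-09")])

print(is_date_only(dates))  # True
print(is_date_only(stamps))  # False
# Recover plain datetime.date objects when the column is date-only.
print(dates.dt.date.tolist())  # [datetime.date(2020, 5, 8), datetime.date(2020, 5, 9)]
```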
Thanks Will, I'll make a PR.
I'm trying to make a PR adding a keyword argument that casts datetime.date columns to pandas datetimes. Can you grant me permission? Thanks
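```python
import pathlib
from typing import Union

import pandas as pd


# Excerpt of pantab's frame_to_hyper with the proposed date_column keyword;
# pantab_types refers to pantab's internal types module.
def frame_to_hyper(
    df: pd.DataFrame,
    database: Union[str, pathlib.Path],
    *,
    table: pantab_types.TableType,
    table_mode: str = "w",
    **kwargs: Union[str, list],
) -> None:
    """See api.rst for documentation"""
    if 'date_column' in kwargs:
        date_column = kwargs['date_column']
        # Accept a single column name or a list of names.
        columns = [date_column] if isinstance(date_column, str) else list(date_column)
        df = df.copy()  # avoid mutating the caller's DataFrame
        for col in columns:
            df[col] = pd.to_datetime(df[col])
    ...  # existing write logic continues here
```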
You shouldn’t need any extra access. Make sure you fork the repo, push the branch to your fork, and then make a pull request from there.
The instructions in the contributing guide should help, so make sure to give that a look. Ping if you get stuck again.
Thanks!
From: Hadi; Sent: Saturday, May 9, 2020 10:57:39 PM; Subject: Re: [innobi/pantab] ENH: Add way to write DATE types to Hyper (#100)
So there is a discussion of adding this as a type upstream in pandas:
https://github.com/pandas-dev/pandas/issues/32473
I think any work we do here would have to wait on that, so let's see if it gets traction.
The date dtype proposal seems to have stalled in pandas. Can this be considered again?
We have a fair few dates in our project, and would love to use pantab for this.
@joshuataylor have you looked at hyperarrow? It is a similar tool, but with Arrow as the backend you get first-class DATE support.
https://hyperarrow.readthedocs.io/en/latest/
I didn't know that library existed, awesome work :heart_eyes: . Will give it a go.
Is this still open? Running into this issue right now using pandas.
TypeError: Invalid value "datetime.date(2023, 10, 5)" found (row 0 column 5)
@jstrauss18 your column dtype is likely object. If you want to write timestamps, make sure you use a datetime dtype column. pandas does not natively support plain DATE types (pyarrow does, but pantab currently does not leverage pyarrow types).
Not sure what to do. I'm using Databricks Delta Sharing to load the DataFrame, and I don't name the columns.
Sorry, I'm not familiar with Databricks, so I can't give specific advice. You might want to try StackOverflow for something more tailored. Many I/O methods in pandas (read_csv, for example) provide a parse_dates= argument that you can use when type inference is not correct, although there may be something more foundational to fix in your code.
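A small sketch of the parse_dates= argument, using pd.read_csv on an inline CSV as a stand-in; whether the Delta Sharing loader exposes a similar option is not covered here. Without the argument the Date column comes back as plain strings; with it, pandas parses the named column into datetime64.

```python
import io

import pandas as pd

csv_text = "Date,Col\n2020-05-08,A\n2020-05-09,B\n"

# Without parse_dates the Date column is read as plain strings.
without = pd.read_csv(io.StringIO(csv_text))
# parse_dates asks read_csv to parse the named column into datetime64.
with_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(without["Date"].dtype)
print(with_dates["Date"].dtype)
```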
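As a hack you could try df[df.columns[5]] = pd.to_datetime(df.iloc[:, 5]), since the error says the problem is in column 5 (counted from zero, so the sixth column). Assigning by label replaces the whole column, whereas df.iloc[:, 5] = ... may set the values in place on recent pandas versions and leave the column as object dtype. Beyond that, I would try StackOverflow or a Databricks support forum.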
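When the column names are not known up front, the offending columns can be located by content instead of by position. The helpers below (find_date_object_columns and convert_date_object_columns) are hypothetical names for a sketch that assumes the problem columns are object columns holding plain datetime.date values; they do not come from pantab or Databricks.

```python
import datetime

import pandas as pd


def find_date_object_columns(df: pd.DataFrame) -> list:
    """Return labels of object columns whose non-null values are all datetime.date.

    datetime.datetime subclasses datetime.date, so datetimes are excluded
    explicitly: only plain dates are reported.
    """
    found = []
    for label, series in df.items():
        if not pd.api.types.is_object_dtype(series):
            continue
        values = series.dropna()
        if len(values) > 0 and all(
            isinstance(v, datetime.date) and not isinstance(v, datetime.datetime)
            for v in values
        ):
            found.append(label)
    return found


def convert_date_object_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every plain-date object column converted to datetime64."""
    out = df.copy()
    for label in find_date_object_columns(out):
        out[label] = pd.to_datetime(out[label])
    return out


df = pd.DataFrame(
    {
        "id": [1, 2],
        "name": ["a", "b"],
        "day": [datetime.date(2023, 10, 5), None],
    }
)
print(find_date_object_columns(df))  # ['day']
converted = convert_date_object_columns(df)
print(converted.dtypes["day"])
```

Returning a copy keeps the original frame untouched; missing values in a converted column become NaT.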
Hello, I am facing the same problem. I moved my application away from the native Tableau Server API for converting my CSV files to Hyper, in order to get faster conversions. However, my dataset contains DATE values, and they are converted to datetime. I need the DATE type. Any news about this issue? @WillAyd
Your best bet will be to keep track of the pantab 4.0 development, which will be a significant overhaul of the code base.
Any idea about the release date, and who is handling this?
I am maintaining a checklist of things in https://github.com/innobi/pantab/issues/219 - feel free to comment there or ask questions.
As for a release date, I do not know. I am looking at using some new technology, so there are many variables at play. This being an open source project, things get developed as I or anyone in the community has the time and interest, which adds another layer of uncertainty. The best I can say is "maybe a couple of months", but without any guarantee :-)