aws-sdk-pandas
We can not infer the data type from an entire null object column
Describe the bug
If a column is entirely null, there should be a fallback data type (varchar).
I'm using:
```python
import awswrangler as wr

# path, con and file_name are defined elsewhere in my script
wr.redshift.copy_from_files(
    path=path,
    con=con,
    table=file_name.replace(".parquet", ""),
    schema="staging",
    parquet_infer_sampling=1,
    varchar_lengths_default=65535,
)
```
How to Reproduce
Expected behavior
No response
Your project
No response
Screenshots
No response
OS
Linux
Python version
3.10.12
AWS SDK for pandas version
3.4.2
Additional context
No response
Hi @misteliy, it looks like one column in the data contains only nulls, so we fail to recognise that column's type.
As a hotfix, you can identify the problematic column and pass the list of valid columns via the column_names parameter. I will check whether there is anything else we can do to fix this on our side.
Thanks 🙏 Yes, that's exactly what I did 😊 I just wanted to raise this because it could perhaps be handled more gracefully.
Hello. I received the same error message in a different context and found this ticket while investigating the problem.
Just wanted to share my two cents: I would prefer not to have a fallback. As I understand it, a fallback would only delay a potential error. If the fallback is a string and the actual type is an int, which only shows up in a later file, we will simply get a type mismatch at that point.
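The argument can be illustrated with a toy model of per-file type inference. This is plain Python, not the library's logic; `infer_type` and `check_file` are hypothetical helpers with deliberately simple rules (integers map to bigint, anything else to varchar). Without a fallback the all-null file fails immediately; with a varchar fallback the first load succeeds and the failure moves to the later file.

```python
from typing import Optional


def infer_type(values: list, fallback: Optional[str] = None) -> str:
    """Infer a column type from its non-null values (toy rules: ints -> bigint, else varchar)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        if fallback is None:
            # Fail fast: the behaviour the comment above argues for.
            raise ValueError("cannot infer the data type from an all-null column")
        return fallback
    if all(isinstance(v, int) for v in non_null):
        return "bigint"
    return "varchar"


def check_file(table_type: str, values: list) -> None:
    """Reject a later file whose inferred type differs from the column's existing type."""
    file_type = infer_type(values)
    if file_type != table_type:
        raise TypeError(f"type mismatch: column is {table_type}, file has {file_type}")


first_file = [None, None]  # the column is entirely null in the first file
later_file = [1, 2]        # the real values turn out to be integers

# Without a fallback, the problem surfaces on the first file.
try:
    infer_type(first_file)
except ValueError as err:
    print(err)  # cannot infer the data type from an all-null column

# With a varchar fallback, the first load succeeds but a later one fails.
table_type = infer_type(first_file, fallback="varchar")
try:
    check_file(table_type, later_file)
except TypeError as err:
    print(err)  # type mismatch: column is varchar, file has bigint
```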
What puzzles me is that the type needs to be derived at all, at least for Parquet files, which contain metadata. Why not simply take the type from the metadata? Is the current approach related to partitioning, since there is no metadata for partitions?
Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 7 days it will automatically be closed.