amazon-redshift-python-driver

pandas None/NaN mappings

Open LuchiLucs opened this issue 9 months ago • 1 comment

I'm using the write_dataframe function to write a pandas DataFrame. My context is that this DataFrame contains columns of three different types:

  1. object (string) columns in pandas use None for missing data
  2. int64 columns in pandas use np.nan for missing data
  3. float64 columns in pandas use np.nan for missing data
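A minimal sketch reproducing the three column types described above (note that a plain int64 column cannot actually hold np.nan, so pandas silently upcasts it to float64; the nullable "Int64" extension dtype would use pd.NA instead):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name":  pd.Series(["a", None], dtype=object),       # object/string column, missing = None
    "count": pd.Series([1, np.nan]),                     # upcast to float64: int64 cannot hold NaN
    "score": pd.Series([1.5, np.nan], dtype="float64"),  # float64 column, missing = NaN
})

print(df.dtypes)
print(df.isna().sum())  # each column has exactly one missing value
```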

When writing to Redshift, these values are converted as follows:

  • None as NULL using varchar(20) with bytedict encoding
  • NaN as -9223372036854775808 using BIGINT with az64 encoding
  • NaN as "NaN" using DOUBLE PRECISION with RAW encoding

When I try to query using SQL, based on the column, I have to filter with:

  1. IS NULL
  2. = -9223372036854775808
  3. ::text = 'NaN'

Is this intended? I'd like to map all pandas None/NaN values to SQL NULL. Is this possible?
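One common workaround (an assumption on my part, not a documented driver feature) is to cast the DataFrame to object dtype and replace every missing value with Python None before calling write_dataframe, so all three column types are sent as SQL NULL:

```python
import numpy as np
import pandas as pd

def nullify_missing(df: pd.DataFrame) -> pd.DataFrame:
    # astype(object) lets integer/float columns hold Python None;
    # where() keeps non-missing values and substitutes None elsewhere.
    return df.astype(object).where(df.notna(), None)

df = pd.DataFrame({
    "s": ["a", None],
    "i": [1.0, np.nan],
    "f": [2.5, np.nan],
})
clean = nullify_missing(df)
print(clean)
```

The trade-off is that every column becomes object dtype on the pandas side, so the driver has to infer the target Redshift types rather than reading them off the dtypes.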

LuchiLucs avatar Feb 04 '25 12:02 LuchiLucs

When reading back into a pandas DataFrame, for instance, the int64 column contains -9223372036854775808 where the missing data was, instead of np.nan, so the round trip is NOT reproducible.
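Until this is addressed, a read-side cleanup can map the sentinel back to NaN. This is a sketch assuming the sentinel really is -9223372036854775808 (INT64 min), as observed above:

```python
import numpy as np
import pandas as pd

INT64_MIN_SENTINEL = -9223372036854775808  # value observed for missing BIGINTs

def restore_missing(df: pd.DataFrame) -> pd.DataFrame:
    # Replace the sentinel with NaN; pandas upcasts affected
    # integer columns to float64 so they can hold the NaN.
    return df.replace(INT64_MIN_SENTINEL, np.nan)

raw = pd.DataFrame({"n": [1, INT64_MIN_SENTINEL, 3]})
fixed = restore_missing(raw)
print(fixed["n"].tolist())
```

Beware that this also clobbers any legitimate -9223372036854775808 values, which is exactly why a sentinel-based NULL encoding is lossy.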

LuchiLucs avatar Feb 04 '25 13:02 LuchiLucs