spark-redshift
DataFrame save gives exception for NullType
When I try to write a null column value to a Redshift table, DataFrame.save throws an exception: Unexpected type NullType.
http://stackoverflow.com/q/35966006/110449:
I am working with pyspark and saving a DataFrame to Redshift. I am getting the error below when trying to save it:
java.lang.UnsupportedOperationException: Unexpected type NullType. at com.databricks.spark.avro.SchemaConverters$.com$databricks$spark$avro$SchemaConverters$$convertFieldTypeToAvro(SchemaConverters.scala:283)
When I look at the source code for SchemaConverters: https://github.com/databricks/spark-avro/blob/master/src/main/scala/com/databricks/spark/avro/SchemaConverters.scala
I can see no support for NullType. I have columns in the DataFrame which are null.
What is the solution to this?
Any update on this issue?
The type NullType usually occurs as the type of null literals; if you have a column of some other type which happens to contain nulls then you'll have a nullable field of that type (e.g. a nullable IntType field). The problem with NullType is that we don't know which SQL type it should map to and, as a result, do not know which type to assign to the column in the CREATE TABLE statement.
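For illustration, here's a minimal sketch (not from the original thread) of how such a schema arises; the column name and the use of spark.range are assumptions, and it assumes a SparkSession named `spark`:

```scala
import org.apache.spark.sql.functions.lit

// A column built purely from null literals gives Spark nothing to infer a type from.
val df = spark.range(3).withColumn("maybe_value", lit(null))

df.printSchema()
// The "maybe_value" field is reported with NullType (printed as "null" or "void"
// depending on the Spark version), so there is no SQL type to put in CREATE TABLE.
```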
Given this limitation, I don't think that you'll be able to create a new table if your schema contains a field with NullType. However, I think that you probably should be able to append to an existing table.
Therefore, I think there are a few things we could fix here:
- Give a more informative error message when trying to create a new table (or completely overwrite an existing one) if the DataFrame's schema contains a field whose type is NullType.
- When appending to an existing table, use the existing table's schema to replace the NullType by the type retrieved from the existing table (see the sketch after this list).
- Open tickets against Spark in case we find any cases where NullType is being inappropriately inferred / used in schemas.
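A rough sketch of what the second item might look like; this is not spark-redshift's actual implementation, and the helper name and cast-by-column-name approach are assumptions:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{NullType, StructField, StructType}

// Hypothetical helper: for each NullType field in df, cast it to the type that the
// existing table already uses, leaving all other columns untouched.
def replaceNullTypes(df: DataFrame, existingTableSchema: StructType): DataFrame = {
  val columns = df.schema.fields.map {
    case StructField(name, NullType, _, _) =>
      df.col(name).cast(existingTableSchema(name).dataType).as(name)
    case StructField(name, _, _, _) =>
      df.col(name)
  }
  df.select(columns: _*)
}
```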
I'm trying to append to an existing table and I'm still getting this error.
@farshidz same here
@ibnipun10 If you cast the null column to a Spark SQL supported type, it solves the issue.
Example:
lit(null).cast(DoubleType) in Scala
and
lit(None).cast(DoubleType()) in Python
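To put the workaround in context, here's a hedged end-to-end sketch in Scala; the DataFrame, column name, and connection settings are placeholders, and the write options follow spark-redshift's url / dbtable / tempdir pattern:

```scala
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.DoubleType

// Cast the all-null column to a concrete type so the schema no longer contains NullType.
val fixed = df.withColumn("null_col", lit(null).cast(DoubleType))

fixed.write
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://host:5439/db?user=...&password=...") // placeholder
  .option("dbtable", "my_table")                                       // placeholder
  .option("tempdir", "s3n://my-bucket/tmp")                            // placeholder
  .mode("append")
  .save()
```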