
DataFrame save throws an exception for NullType columns

Open ibnipun10 opened this issue 8 years ago • 5 comments

When I try to write a null column value to a Redshift table, dataframe.save throws an exception: Unexpected type NullType.

http://stackoverflow.com/q/35966006/110449:

I am working with PySpark and saving a DataFrame to Redshift. I get the error below when trying to save it:

": java.lang.UnsupportedOperationException: Unexpected type NullType. at com.databricks.spark.avro.SchemaConverters$.com$databricks$spark$avro$SchemaConverters$$convertFieldTypeToAvro(SchemaConverters.scala:283) "

When I look at the source code for SchemaConverters: https://github.com/databricks/spark-avro/blob/master/src/main/scala/com/databricks/spark/avro/SchemaConverters.scala

I can see no support for NullType. I have columns in my DataFrame which are null.
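A minimal way to see the problem in the schema itself (a sketch; `spark` is an existing SparkSession or, on 1.x, an SQLContext):

```python
from pyspark.sql.functions import lit

# A bare null literal carries no type information, so Spark infers
# NullType for the column -- which spark-avro cannot convert.
df = spark.range(3).withColumn("maybe_null", lit(None))
df.printSchema()
# root
#  |-- id: long (nullable = false)
#  |-- maybe_null: null (nullable = true)
```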

What is the solution to this?

ibnipun10 avatar Mar 15 '16 00:03 ibnipun10

Any update on this issue ?

nadirvardar avatar Apr 22 '16 18:04 nadirvardar

The type NullType usually occurs as the type of null literals; if you have a column of some other type which happens to contain nulls, then you'll have a nullable field of that type (e.g. a nullable IntegerType field). The problem with NullType is that we don't know which SQL type it should map to and, as a result, do not know which type to assign to the column in the CREATE TABLE statement.
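To illustrate the distinction (a PySpark sketch; `spark` is an existing SparkSession):

```python
from pyspark.sql.functions import lit
from pyspark.sql.types import IntegerType, StructField, StructType

# A typed column that merely *contains* nulls keeps its real type:
typed = spark.createDataFrame(
    [(1,), (None,)],
    StructType([StructField("x", IntegerType(), True)]))
typed.printSchema()    # x: integer (nullable = true) -- writable

# A bare null literal gives Spark nothing to infer from, so the
# field ends up as NullType:
untyped = spark.range(1).withColumn("x", lit(None))
untyped.printSchema()  # x: null (nullable = true) -- no SQL type to map to
```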

Given this limitation, I don't think that you'll be able to create a new table if your schema contains a field with NullType. However, I think that you probably should be able to append to an existing table.
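For reference, an append through spark-redshift would look roughly like this (connection settings below are placeholders):

```python
# Substitute your own cluster URL, table name, S3 tempdir, and credentials.
df.write \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:redshift://host:5439/db?user=...&password=...") \
    .option("dbtable", "my_table") \
    .option("tempdir", "s3n://my-bucket/tmp/") \
    .mode("append") \
    .save()
```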

Therefore, I think there are a few things we could fix here:

  • Give a more informative error message when trying to create a new table (or completely overwrite an existing one) if the DataFrame's schema contains a field whose type is NullType.
  • When appending to an existing table, use the existing table's schema to replace NullType with the type retrieved from the existing table (a user-side sketch of this appears after this list).
  • Open tickets against Spark in case we find any cases where NullType is being inappropriately inferred / used in schemas.
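In the meantime, something like this sketch could serve as a user-side version of the second item; `replace_null_types` and `target_schema` are hypothetical names here, and `target_schema` would be the StructType of the existing Redshift table (e.g. obtained by reading the table back through spark-redshift):

```python
from pyspark.sql.functions import col
from pyspark.sql.types import NullType

def replace_null_types(df, target_schema):
    # Map each column of the existing table to its concrete type.
    target = {f.name: f.dataType for f in target_schema.fields}
    cols = []
    for f in df.schema.fields:
        if isinstance(f.dataType, NullType) and f.name in target:
            # Cast the untyped null column to the type the table expects.
            cols.append(col(f.name).cast(target[f.name]).alias(f.name))
        else:
            cols.append(col(f.name))
    return df.select(cols)
```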

JoshRosen avatar Apr 23 '16 01:04 JoshRosen

I'm trying to append to an existing table and I'm still getting this error.

farshidz avatar Apr 11 '18 02:04 farshidz

@farshidz same here

smats0 avatar Sep 11 '18 22:09 smats0

@ibnipun10 If you cast the null column to a Spark SQL supported type, it solves the issue. Example: lit(null).cast(DoubleType) in Scala and lit(None).cast(DoubleType()) in Python.
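In full, a PySpark sketch (the column name "score" is made up):

```python
from pyspark.sql.functions import lit
from pyspark.sql.types import DoubleType

# Give the null column an explicit type before writing to Redshift.
df = df.withColumn("score", lit(None).cast(DoubleType()))
```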

meetchandan avatar Feb 24 '19 08:02 meetchandan