
java.lang.RuntimeException: scala.Some is not a valid external type for schema of string

Open tquaresma opened this issue 3 years ago • 2 comments

Steps to Reproduce (for bugs)

Excel File to load: image

Code to load file:

%scala
import org.apache.spark.sql._
import org.apache.spark.sql.types._

val myschema = StructType(Array(
  StructField("Processo", StringType, nullable = false),
  StructField("Data", StringType, nullable = false),
  StructField("Balcao", StringType, nullable = false),
  StructField("CriadoPor", StringType, nullable = false)
))

val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "teste!A1")
  .option("header", true)
  .schema(myschema)
  .load("/mnt/raw/externalfiles/TESTE.xlsx")

display(df)

Error I am getting: image
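For readers unfamiliar with the message in the title: it reports that a value of type scala.Some reached a column declared as StringType, where a plain String was expected. The following is only a minimal plain-Scala sketch of that mismatch (no Spark involved). `OptionCellSketch`, `unwrap` and `isPlainString` are hypothetical names used for illustration; they are not part of Spark or spark-excel, and this is not a claim about where the wrapping happens inside the library.

```scala
// Plain-Scala illustration (no Spark needed) of the mismatch named in the error:
// a string column expects a plain String, but the value arrived wrapped in Some.
object OptionCellSketch {
  // Hypothetical helper, not part of Spark or spark-excel:
  // strip the Option wrapper so only the raw value (or null) remains.
  def unwrap(value: Any): Any = value match {
    case Some(inner) => inner
    case None        => null
    case other       => other
  }

  // True only for values that are already a plain String.
  def isPlainString(value: Any): Boolean = value match {
    case _: String => true
    case _         => false
  }

  def main(args: Array[String]): Unit = {
    // Sample cell values: one wrapped, one plain, one missing.
    val cells: Seq[Any] = Seq(Some("0000001799393A"), "CICiv.-LC PORTO", None)
    println(cells.map(isPlainString))              // before unwrapping
    println(cells.map(unwrap).map(isPlainString))  // after unwrapping
  }
}
```

Note that a missing cell (None) unwraps to null here, which a schema with nullable = false would not accept either.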

Your Environment

Databricks Runtime Version ==> 10.1 (includes Apache Spark 3.2.0, Scala 2.12)
Libraries ==> com.crealytics:spark-excel_2.12:0.13.1

tquaresma, Dec 21 '21 12:12

Hi @tquaresma, could you try a newer spark-excel version and experiment with using .format("excel") instead of .format("com.crealytics.spark.excel")? It doesn't make too much sense debugging issues in old versions.

nightscape, Dec 21 '21 16:12

Hi @tquaresma and @nightscape

Short: It seems that the issue is with the older spark-excel version (I am on a freshly installed machine and too lazy to test with the older version, sorry). @tquaresma, could you please try again with a newer spark-excel version, if that is possible for you?


Long, in detail:

I recreated a similar data file (with some Unicode characters in the header) and tested it with spark-excel from the main branch. Both .format("com.crealytics.spark.excel") and .format("excel") work as expected.

Here are all the inputs that I used:

  1. Excel data file: 490_tquaresma.xlsx
  2. Simple code (AppEntry.scala); you can put it anywhere inside the spark-excel project:
package com.crealytics.spark

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.types.StructField
import org.apache.spark.sql.types.StringType


object AppEntry {
  def main(args: Array[String]): Unit = {
    println("Hello, world")
    val spark = SparkSession
      .builder()
      .master("local")
      .appName("Spark SQL basic example")
      .config("spark.ui.enabled", false)
      .getOrCreate()

    // Explicit schema: every column is read as a non-nullable string
    val schema = StructType(Array(
      StructField("Processo", StringType, nullable = false),
      StructField("Data", StringType, nullable = false),
      StructField("Balcao", StringType, nullable = false),
      StructField("CriadolPor", StringType, nullable = false)
    ))

    // Data source name under test; swap in "excel" to test the other one
    val spark_excel_implement = "com.crealytics.spark.excel" // "excel"

    val df = spark.read
      .format(spark_excel_implement)
      .option("dataAddress", "Sheet1!A1")
      .option("header", true)
      .schema(schema)
      .load("/home/quanghgx/Downloads/490_tquaresma.xlsx")

    df.show()

    try {
      spark.close()
    } catch {
      case _: Exception => () // NOP
    }
  }
}

  3. Result:
...
22/01/10 22:46:54 INFO CodeGenerator: Code generated in 8.151483 ms
+--------------+------------------+---------------+----------+
|      Processo|              Data|         Balcao|CriadolPor|
+--------------+------------------+---------------+----------+
|0000001799393A|2021-12-20 9:27:54|CICiv.-LC PORTO|    lpinho|
|0000001799393B|2021-12-21 9:27:55|CICiv.-LC PORTO| cduartelc|
|0000001799393C|2021-12-22 9:27:56|CICiv.-LC PORTO|  hbarbosa|
|0000001799393D|2021-12-23 9:27:57|CICiv.-LC PORTO|  pjsantos|
+--------------+------------------+---------------+----------+

22/01/10 22:46:54 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/01/10 22:46:54 INFO MemoryStore: MemoryStore cleared
...

quanghgx, Jan 10 '22 15:01