spark-excel
java.lang.RuntimeException: scala.Some is not a valid external type for schema of string
Steps to Reproduce (for bugs)
Excel File to load:
Code to load file:
%scala
import org.apache.spark.sql._
import org.apache.spark.sql.types._

val myschema = StructType(Array(
  StructField("Processo", StringType, nullable = false),
  StructField("Data", StringType, nullable = false),
  StructField("Balcao", StringType, nullable = false),
  StructField("CriadoPor", StringType, nullable = false)
))

val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "teste!A1")
  .option("header", true)
  .schema(myschema)
  .load("/mnt/raw/externalfiles/TESTE.xlsx")

display(df)
Error getting:
Your Environment
Databricks Runtime Version ==> 10.1 (includes Apache Spark 3.2.0, Scala 2.12)
Libraries ==> com.crealytics:spark-excel_2.12:0.13.1
Hi @tquaresma, could you try a newer spark-excel version and experiment with using .format("excel") instead of .format("com.crealytics.spark.excel")?
It doesn't make much sense to debug issues in old versions.
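For reference, the suggested retry could look roughly like the sketch below. It reuses the schema, sheet address, and options from the original report and only swaps the format name. The object name ExcelRetry and the helper readSheet are made up for illustration, and the sketch assumes a recent spark-excel build matching the cluster's Spark and Scala versions is already installed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical helper object, not part of spark-excel.
object ExcelRetry {
  // Same four string columns as in the original report.
  val schema: StructType = StructType(Array(
    StructField("Processo", StringType, nullable = false),
    StructField("Data", StringType, nullable = false),
    StructField("Balcao", StringType, nullable = false),
    StructField("CriadoPor", StringType, nullable = false)
  ))

  // Reads the sheet through the short "excel" format name instead of
  // "com.crealytics.spark.excel". Assumes a newer spark-excel is on the classpath.
  def readSheet(spark: SparkSession, path: String): DataFrame =
    spark.read
      .format("excel")
      .option("dataAddress", "teste!A1")
      .option("header", true)
      .schema(schema)
      .load(path)
}
```

In a Databricks notebook this would be called as display(ExcelRetry.readSheet(spark, "/mnt/raw/externalfiles/TESTE.xlsx")). If the error persists with a newer version, switching the format string back to "com.crealytics.spark.excel" in readSheet shows whether both code paths behave the same.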
Hi @tquaresma and @nightscape,
Short: It seems that the issue is with the older spark-excel version (I am on a freshly installed machine and too lazy to test with an older version, sorry). @tquaresma, could you please try again with a newer spark-excel version, if that is possible for you?
Long and detailed:
I recreated a similar data file (with some Unicode characters in the header) and tested it with spark-excel from the main branch. Both .format("com.crealytics.spark.excel") and .format("excel") work as expected.
Here are all the inputs that I used:
- Excel data file: 490_tquaresma.xlsx
- Simple code (AppEntry.scala); you can put it anywhere inside the spark-excel project:
package com.crealytics.spark

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.types.StructField
import org.apache.spark.sql.types.StringType

object AppEntry {
  def main(args: Array[String]) = {
    println("Hello, world")

    val spark = SparkSession
      .builder()
      .master("local")
      .appName("Spark SQL basic example")
      .config("spark.ui.enabled", false)
      .getOrCreate()

    val schema = StructType(Array(
      StructField("Processo", StringType, nullable=false),
      StructField("Data", StringType, nullable=false),
      StructField("Balcao", StringType, nullable=false),
      StructField("CriadolPor", StringType, nullable=false)
    ))

    val spark_excel_implement = "com.crealytics.spark.excel" // "excel"
    val df = spark.read
      .format(spark_excel_implement)
      .option("dataAddress","Sheet1!A1")
      .option("header", true)
      .schema(schema)
      .load("/home/quanghgx/Downloads/490_tquaresma.xlsx")

    df.show()

    try {
      spark.close()
    } catch {
      case _: Exception => () // NOP
    }
  }
}
- Result:
...
22/01/10 22:46:54 INFO CodeGenerator: Code generated in 8.151483 ms
+--------------+------------------+---------------+----------+
|      Processo|              Data|         Balcao|CriadolPor|
+--------------+------------------+---------------+----------+
|0000001799393A|2021-12-20 9:27:54|CICiv.-LC PORTO|    lpinho|
|0000001799393B|2021-12-21 9:27:55|CICiv.-LC PORTO| cduartelc|
|0000001799393C|2021-12-22 9:27:56|CICiv.-LC PORTO|  hbarbosa|
|0000001799393D|2021-12-23 9:27:57|CICiv.-LC PORTO|  pjsantos|
+--------------+------------------+---------------+----------+
22/01/10 22:46:54 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/01/10 22:46:54 INFO MemoryStore: MemoryStore cleared
...