spark-xml
XML data source for Spark SQL and DataFrames
https://github.com/databricks/spark-xml/blob/c42d6bcdf98719b41248954feecf21bb2596feaa/src/main/scala/com/databricks/spark/xml/XmlOptions.scala#L24

Here is the code snippet of a process in production in Azure Databricks.

```py
df \
    .where("COL1 = 'VAL'") \
    .selectExpr(COL1, COL2) \
    .coalesce(1) \
    .write \
    .format('com.databricks.spark.xml') \
    ...
```
Example input:

```
```

Expected output: `fruits: struct`

Actual output: `fruits: struct`

Proposed fix: https://github.com/databricks/spark-xml/blob/ddd1ef573a5318748763fafc974e4f7d8876fd6f/src/main/scala/com/databricks/spark/xml/util/XSDToSchema.scala#L227

```diff
- if (element.getMaxOccurs == 1) {
+ if (element.getMaxOccurs == 1 && choice.getMaxOccurs ==...
```
Hello! If I load files with identical names but different letter case, I get an error. I would prefer to get a NULL string, or two columns with different letter case...
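As a rough illustration of the situation described above, the sketch below groups names that differ only by letter case so the clash can be spotted before loading. `find_case_collisions` is a hypothetical helper written for this example; it is not part of spark-xml, and the sample names are made up.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that are identical except for letter case.

    Hypothetical helper for illustration only; not a spark-xml API.
    """
    groups = defaultdict(list)
    for name in names:
        # Names that lowercase to the same key collide when compared
        # case-insensitively.
        groups[name.lower()].append(name)
    # Keep only the keys that have more than one distinct spelling.
    return {
        key: spellings
        for key, spellings in groups.items()
        if len(set(spellings)) > 1
    }


# Made-up sample names: "Fruit" and "fruit" collide, "price" does not.
collisions = find_case_collisions(["Fruit", "fruit", "price"])
print(collisions)  # {'fruit': ['Fruit', 'fruit']}
```

Spark SQL compares column names case-insensitively by default (`spark.sql.caseSensitive` is `false`), which may be relevant here, although the excerpt does not say whether changing that setting resolves the error.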
I am trying to create a Spark data source table, specifying options (rowTag "xxx", path "hdfs://xxx.xml"), but it tells me that the table is an external table and should not be created in /user/hive/warehouse/...
Closes #688. Fixes a problem with casting scala.collection.mutable.ArrayBuffer to org.apache.spark.sql.catalyst.util.ArrayData.
Hello! I am trying to use the Scala functions from Python, as described in the documentation, on large XML data. Everything works correctly if I use Spark with Scala 2.12. But on the production cluster, the installed Spark...