Alex Nastetsky
Simple repro case:

test.json:
```
{"field1":"foo","field2":123}
{"field1":"bar","field2":456}
```

(spark-shell):
```
scala> val df = sqlContext.read.json("test.json")
scala> df.write.format("com.databricks.spark.avro").save("test.avro")
```

java -jar avro-tools.jar tojson test.avro/part-r-00000-[uuid].avro
```
{"field1":{"string":"foo"},"field2":{"long":123}}
{"field1":{"string":"bar"},"field2":{"long":456}}
```

Every value comes back wrapped in a nullable union, even though neither input field contains nulls. I've tried this...
The issue is that the StructFields in the schema are being treated as nullable=true, even if you pass in false. I changed them to nullable=false in spark-avro 1.0.0 and was...
The reason the fields come out as nullable=true in 2.0.1 is that the DefaultSource class was changed from CreatableRelationProvider to HadoopFsRelationProvider, and in ResolvedDataSource#apply, the HadoopFsRelationProvider case uses dataSchema.asNullable to...
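Roughly, asNullable copies every field with nullable forced to true, which is why whatever nullability you set on the write path gets discarded. A paraphrase of what it effectively does (the real method is private[spark] and also recurses into nested struct/array/map types, so this is a sketch, not the actual Spark source):

```scala
import org.apache.spark.sql.types.{StructField, StructType}

// Paraphrase of StructType.asNullable: every top-level field is copied
// with nullable forced to true. The actual Spark implementation also
// recurses into nested element types.
def asNullableSketch(schema: StructType): StructType =
  StructType(schema.fields.map(f => f.copy(nullable = true)))
```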
Looks like 1.0.0 will suffice for me; got compression working with a simple `sc.hadoopConfiguration.setBoolean("mapreduce.output.fileoutputformat.compress", true)`.
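A minimal spark-shell sketch of the compressed write, assuming spark-avro 1.0.0 goes through the standard Hadoop output format; the codec property and codec class below are assumptions on my part, not confirmed in this thread:

```scala
// Enable Hadoop output compression before writing (as in the comment above).
sc.hadoopConfiguration.setBoolean("mapreduce.output.fileoutputformat.compress", true)

// Assumption: the matching Hadoop codec property is honored as well.
// DeflateCodec is a standard Hadoop codec class.
sc.hadoopConfiguration.set(
  "mapreduce.output.fileoutputformat.compress.codec",
  "org.apache.hadoop.io.compress.DeflateCodec")

df.write.format("com.databricks.spark.avro").save("test-compressed.avro")
```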
Workaround is to set nullable=false on the fields. For an example of how to change the nullability of fields on a DataFrame, see https://stackoverflow.com/questions/33193958/change-nullable-property-of-column-in-spark-dataframe; a sketch along those lines follows.
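In the spirit of the linked answer (the helper name is illustrative): rebuild the schema with the desired nullability, then re-apply it via createDataFrame:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{StructField, StructType}

// Copy every top-level field with the desired nullability, then re-apply
// the schema. Nested types would need the same treatment recursively.
def setNullability(df: DataFrame, nullable: Boolean): DataFrame = {
  val newSchema = StructType(df.schema.fields.map(_.copy(nullable = nullable)))
  df.sqlContext.createDataFrame(df.rdd, newSchema)
}

val nonNullableDf = setNullability(df, nullable = false)
nonNullableDf.write.format("com.databricks.spark.avro").save("test-nonnull.avro")
```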
jaley, thanks for the tip about avro.mapred.ignore.inputs.without.extension!
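For anyone else landing here: that flag goes on the Hadoop configuration before reading, and setting it to false makes spark-avro pick up Avro files that lack the .avro extension (the path below is illustrative):

```scala
// Read Avro files even when they lack the .avro extension.
sc.hadoopConfiguration.set("avro.mapred.ignore.inputs.without.extension", "false")
val df = sqlContext.read.format("com.databricks.spark.avro").load("path/to/files")
```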