Alex Nastetsky
Simple repro case:

test.json:
```
{"field1":"foo","field2":123}
{"field1":"bar","field2":456}
```

(spark-shell):
```
scala> val df = sqlContext.read.json("test.json")
scala> df.write.format("com.databricks.spark.avro").save("test.avro")
```

java -jar avro-tools.jar tojson test.avro/part-r-00000-[uuid].avro
```
{"field1":{"string":"foo"},"field2":{"long":123}}
{"field1":{"string":"bar"},"field2":{"long":456}}
```

Every value comes back wrapped in a nullable union, even though neither input field contains nulls. I've tried this...
The issue is that the StructFields in the schema are being treated as nullable=true, even if you pass in false. I changed them to nullable=false in spark-avro 1.0.0 and was...
The reason the fields come out as nullable=true in 2.0.1 is that the DefaultSource class was changed from CreatableRelationProvider to HadoopFsRelationProvider, and in ResolvedDataSource#apply, the HadoopFsRelationProvider case uses dataSchema.asNullable to...
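Roughly, asNullable copies every field with nullable forced to true, which is why whatever nullability you set on the write path gets discarded. A paraphrase of what it effectively does (the real method is private[spark] and also recurses into nested struct/array/map types, so this is a sketch, not the actual Spark source):

```scala
import org.apache.spark.sql.types.{StructField, StructType}

// Paraphrase of StructType.asNullable: every top-level field is copied
// with nullable forced to true. The actual Spark implementation also
// recurses into nested element types.
def asNullableSketch(schema: StructType): StructType =
  StructType(schema.fields.map(f => f.copy(nullable = true)))
```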
Looks like 1.0.0 will suffice for me; got compression working with a simple `sc.hadoopConfiguration.setBoolean("mapreduce.output.fileoutputformat.compress", true)`.
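A minimal spark-shell sketch of the compressed write, assuming spark-avro 1.0.0 goes through the standard Hadoop output format; the codec property and codec class below are assumptions on my part, not confirmed in this thread:

```scala
// Enable Hadoop output compression before writing (as in the comment above).
sc.hadoopConfiguration.setBoolean("mapreduce.output.fileoutputformat.compress", true)

// Assumption: the matching Hadoop codec property is honored as well.
// DeflateCodec is a standard Hadoop codec class.
sc.hadoopConfiguration.set(
  "mapreduce.output.fileoutputformat.compress.codec",
  "org.apache.hadoop.io.compress.DeflateCodec")

df.write.format("com.databricks.spark.avro").save("test-compressed.avro")
```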
Workaround is to set nullable=false on the fields. For an example of how to change the nullability of fields on a DataFrame, see https://stackoverflow.com/questions/33193958/change-nullable-property-of-column-in-spark-dataframe; a sketch along those lines follows.
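In the spirit of the linked answer (the helper name is illustrative): rebuild the schema with the desired nullability, then re-apply it via createDataFrame:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{StructField, StructType}

// Copy every top-level field with the desired nullability, then re-apply
// the schema. Nested types would need the same treatment recursively.
def setNullability(df: DataFrame, nullable: Boolean): DataFrame = {
  val newSchema = StructType(df.schema.fields.map(_.copy(nullable = nullable)))
  df.sqlContext.createDataFrame(df.rdd, newSchema)
}

val nonNullableDf = setNullability(df, nullable = false)
nonNullableDf.write.format("com.databricks.spark.avro").save("test-nonnull.avro")
```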
jaley, thanks for the tip about avro.mapred.ignore.inputs.without.extension!
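For anyone else landing here: that flag goes on the Hadoop configuration before reading, and setting it to false makes spark-avro pick up Avro files that lack the .avro extension (the path below is illustrative):

```scala
// Read Avro files even when they lack the .avro extension.
sc.hadoopConfiguration.set("avro.mapred.ignore.inputs.without.extension", "false")
val df = sqlContext.read.format("com.databricks.spark.avro").load("path/to/files")
```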