how-query-engines-work
Null values when reading a Parquet file
I'm trying to use the engine from Scala with a pretty simple setup, using the example Parquet file from the testdata folder. The code looks like this:
val ctx = new ExecutionContext(Map.empty[String,String].asJava)
val pqtSource = new ParquetDataSource("data/alltypes_plain.parquet")
println(pqtSource.schema().toString)
ctx.registerDataSource("pdata",pqtSource)
val df2 = ctx.sql("select id,bool_col from pdata")
val c2 = ctx.execute(df2).iterator().asScala.toList.map(r => println(r))
The first println statement works as expected and prints the schema:
Schema(fields=[Field(name=id, dataType=Int(32, true)), Field(name=bool_col, dataType=Bool), Field(name=tinyint_col, dataType=Int(32, true)), Field(name=smallint_col, dataType=Int(32, true)), Field(name=int_col, dataType=Int(32, true)), Field(name=bigint_col, dataType=Int(64, true)), Field(name=float_col, dataType=FloatingPoint(SINGLE)), Field(name=double_col, dataType=FloatingPoint(DOUBLE)), Field(name=date_string_col, dataType=Binary), Field(name=string_col, dataType=Binary), Field(name=timestamp_col, dataType=Binary)])
The second one, which prints the query results, gives only nulls:
Reading 8 rows
null,null
null,null
null,null
null,null
null,null
null,null
null,null
null,null
What am I doing wrong? Or might this be a Scala incompatibility?