spark-parquet-thrift-example
Could this support Thrift lists? Writing a struct that contains a `list` field fails with the output below.
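For reference, here is a minimal sketch of how such a record is presumably built, assuming a Thrift-generated `ExampleTable` class; the exact IDL and setter names are assumptions inferred from the `ExampleTable(...)` lines in the output below:

```scala
import scala.collection.JavaConverters._

// Assumed Thrift definition behind the generated ExampleTable class:
//   struct ExampleTable {
//     1: i32 role_id
//     2: string role_name
//     3: list<i32> friends_id
//   }
val row = new ExampleTable()
row.setRole_id(1)
row.setRole_name("test")
// list<i32> maps to java.util.List[Integer] in Thrift-generated Java code,
// so the Scala Ints are boxed and converted.
row.setFriends_id(Seq(1, 2, 3).map(Int.box).asJava)
```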
Creating sample Thrift data.
- ExampleTable(role_id:1, role_name:test, friends_id:[1, 2, 3])
- ExampleTable(role_id:1, role_name:test, friends_id:[1, 2, 3])
- ExampleTable(role_id:1, role_name:test, friends_id:[1, 2, 3])
- ExampleTable(role_id:1, role_name:test, friends_id:[1, 2, 3])
- ExampleTable(role_id:1, role_name:test, friends_id:[1, 2, 3])
- ExampleTable(role_id:1, role_name:test, friends_id:[1, 2, 3])
- ExampleTable(role_id:1, role_name:test, friends_id:[1, 2, 3])
- ExampleTable(role_id:1, role_name:test, friends_id:[1, 2, 3])
- ExampleTable(role_id:1, role_name:test, friends_id:[1, 2, 3])
Writing sample data to Parquet.
- ParquetStore: file:///home/lintong/下载/hive_table/test
14:06:40.191 ERROR org.apache.spark.executor.Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.parquet.thrift.struct.ThriftType$StructType.<init>(ThriftType.java:242)
at org.apache.parquet.thrift.ThriftSchemaConverter.toStructType(ThriftSchemaConverter.java:110)
at org.apache.parquet.thrift.ThriftSchemaConverter.toStructType(ThriftSchemaConverter.java:97)
at org.apache.parquet.hadoop.thrift.TBaseWriteSupport.getThriftStruct(TBaseWriteSupport.java:55)
at org.apache.parquet.hadoop.thrift.AbstractThriftWriteSupport.init(AbstractThriftWriteSupport.java:85)
at org.apache.parquet.hadoop.thrift.AbstractThriftWriteSupport.init(AbstractThriftWriteSupport.java:112)
at org.apache.parquet.hadoop.thrift.ThriftWriteSupport.init(ThriftWriteSupport.java:68)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:341)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:302)
at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.initWriter(SparkHadoopWriter.scala:344)
at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:118)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:79)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
14:06:40.210 ERROR org.apache.spark.scheduler.TaskSetManager:70 - Task 0 in stage 0.0 failed 1 times; aborting job
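For anyone trying to reproduce this, the write path that triggers the trace above presumably looks like the following. This is a sketch assuming the setup this repo uses for `ParquetThriftOutputFormat`; the `writeSample` helper, output path handling, and key/value wiring are illustrative:

```scala
import org.apache.hadoop.mapreduce.Job
import org.apache.parquet.hadoop.thrift.ParquetThriftOutputFormat
import org.apache.spark.SparkContext

def writeSample(sc: SparkContext, rows: Seq[ExampleTable], out: String): Unit = {
  val job = Job.getInstance(sc.hadoopConfiguration)
  // Register the Thrift class so the write support can derive the Parquet
  // schema; the ThriftSchemaConverter frames in the trace run during this
  // init step, which is where the ArrayIndexOutOfBoundsException surfaces.
  ParquetThriftOutputFormat.setThriftClass(job, classOf[ExampleTable])

  sc.parallelize(rows)
    .map(row => (null: Void, row)) // ParquetOutputFormat ignores the key
    .saveAsNewAPIHadoopFile(
      out,
      classOf[Void],
      classOf[ExampleTable],
      classOf[ParquetThriftOutputFormat[ExampleTable]],
      job.getConfiguration)
}
```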