stream-reactor
Could not parse created_by: parquet-mr version 1.6.0 using format: (.+) version ((.*) )?\(build ?(.*)\)
My Hadoop version is apache-hadoop-2.6.3, my Hive version is apache-hive-1.2.1, and the kafka-connect-hive version is kafka-connect-hive-1.2.1-2.1.0-all.jar.
Following the Hive Source examples (https://docs.lenses.io/connectors/source/hive.html), I created the hive_connect database and started the hive-source-connector, but I hit this exception:
org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr version 1.6.0 using format: (.+) version ((.*) )?\(build ?(.*)\)
at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60)
at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:263)
at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:583)
at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:513)
at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:130)
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:125)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:129)
at com.landoop.streamreactor.connect.hive.formats.ParquetHiveFormat$$anon$2$$anonfun$iterator$1.apply(ParquetHiveFormat.scala:45)
at com.landoop.streamreactor.connect.hive.formats.ParquetHiveFormat$$anon$2$$anonfun$iterator$1.apply(ParquetHiveFormat.scala:45)
at scala.collection.Iterator$$anon$9.next(Iterator.scala:162)
at scala.collection.Iterator$$anon$16.hasNext(Iterator.scala:599)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$ConcatIterator.advance(Iterator.scala:183)
at scala.collection.Iterator$ConcatIterator.hasNext(Iterator.scala:195)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:390)
at com.landoop.streamreactor.connect.hive.source.HiveSource.hasNext(HiveSource.scala:62)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:390)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:183)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1334)
at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:294)
at scala.collection.AbstractIterator.toList(Iterator.scala:1334)
at com.landoop.streamreactor.connect.hive.source.HiveSourceTask.poll(HiveSourceTask.scala:68)
at org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:244)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:220)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
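The parse failure can be reproduced outside the connector: the regex shown in the exception requires a "(build ...)" suffix, which the created_by string written by parquet-mr 1.6.0 lacks. A minimal sketch of the mismatch (the newer created_by string and its "abc123" build hash are illustrative, not taken from a real file):

```java
import java.util.regex.Pattern;

public class CreatedByCheck {
    public static void main(String[] args) {
        // The format string from the exception message (parquet's VersionParser pattern)
        Pattern p = Pattern.compile("(.+) version ((.*) )?\\(build ?(.*)\\)");

        // created_by as written by parquet-mr 1.6.0: no "(build ...)" suffix
        String oldCreatedBy = "parquet-mr version 1.6.0";
        // Illustrative created_by from a newer parquet-mr release (hypothetical build hash)
        String newCreatedBy = "parquet-mr version 1.8.1 (build abc123)";

        System.out.println(p.matcher(oldCreatedBy).matches()); // false -> VersionParseException
        System.out.println(p.matcher(newCreatedBy).matches()); // true
    }
}
```

So files written by an old parquet-mr (here via Hive 1.2.1) trip the parser when the connector reads their footer statistics.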
The stack has moved on since then; I would be eager to know whether this is still a problem with the latest master version. If you want to reach out to me on community Slack, I can help.
Closing due to deprecation of the hive connector.