CopybookInputFormat
NPE on Spark execution, logged as a warning
Hi, I intermittently get a null pointer exception while running a Spark job. The stack trace is:
18/03/08 11:16:40 WARN scheduler.TaskSetManager: Lost task 446.0 in stage 0.0 (TID 513, dwbdtest1r1w4.wellpoint.com, executor 15): java.lang.RuntimeException: java.lang.NullPointerException
at com.cloudera.sa.copybook.mapreduce.CopybookRecordReader.initialize(CopybookRecordReader.java:88)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:182)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:179)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at net.sf.JRecord.External.CobolCopybookLoader.loadCopyBook(CobolCopybookLoader.java:142)
at com.cloudera.sa.copybook.mapreduce.CopybookRecordReader.initialize(CopybookRecordReader.java:56)
... 18 more
Strange thing is, the job completes fine. Also the line numbers do not seem to match.
Update
Another error I see on the executors is:
java.lang.RuntimeException: The file "lexer.dat" is either missing or corrupted.
at net.sf.cb2xml.sablecc.lexer.Lexer.<init>(Unknown Source)
at net.sf.cb2xml.Cb2Xml.convert(Unknown Source)
at net.sf.cb2xml.Cb2Xml.convertToXMLDOM(Unknown Source)
at net.sf.JRecord.External.CobolCopybookLoader.loadCopyBook(CobolCopybookLoader.java:132)
at com.cloudera.sa.copybook.mapreduce.CopybookRecordReader.initialize(CopybookRecordReader.java:56)
any thoughts?
thanks
ameet
cb2xml is not fully thread-safe. If several threads run it at the same time, you can get this error.
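One possible workaround, assuming the failures really do come from several task threads in the same executor JVM parsing the copybook at once, is to serialize the loading step behind a single lock. The sketch below is only an illustration: `SerializedCopybookLoading` and its `load` method are hypothetical names, not part of JRecord, cb2xml, or CopybookInputFormat.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not part of JRecord, cb2xml, or CopybookInputFormat):
 * runs a copybook-loading action while holding one JVM-wide lock, so only
 * one thread at a time executes the non-thread-safe parsing code.
 */
final class SerializedCopybookLoading {
    // A static lock is shared by every task thread in the same JVM,
    // which for Spark means every task running in the same executor.
    private static final Object LOCK = new Object();

    private SerializedCopybookLoading() {
        // Utility class, not meant to be instantiated.
    }

    /** Runs the given loader under the shared lock and returns its result. */
    static <T> T load(Callable<T> loader) throws Exception {
        synchronized (LOCK) {
            return loader.call();
        }
    }
}
```

The existing `loadCopyBook(...)` call made from `CopybookRecordReader.initialize` would go inside the callable passed to `load`. The lock only covers threads within one JVM, and it costs a little parallelism at task start-up, which should be small if each task parses the copybook only once.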