
Getting java.lang.NullPointerException in running PageRank

Open keeyonghan opened this issue 9 years ago • 1 comments

I have 35M link pairs across 117K nodes and ran a PageRank job on a 3-node m2.2xlarge EMR cluster. Initially I got an out-of-memory error in the reduce phase, so I increased the JVM size. Now I am getting the following error (it happens in one reduce task; the other 3 complete without any error):

2015-01-04 03:44:40,349 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: topic_ranks: New For Each(false,false,false)[bag] - scope-42 Operator Key: scope-42): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(datafu.pig.linkanalysis.PageRank)[bag] - scope-33 Operator Key: scope-33) children: null at []]: java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:242)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:256)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:635)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Any idea how to resolve this issue? I used Hadoop 2.4.0 and Pig 0.12.0.

keeyonghan, Jan 04 '15 17:01

I am facing the same issue. I found this on the Pig JIRA, which might be related: https://issues.apache.org/jira/browse/PIG-4169. I am trying to upgrade Pig to 0.14 to see if there are any changes.

arpanbee, Jan 21 '15 21:01