pignlproc
Running nerd-stats on part of the Wikipedia dump
I ran the nerd-stats.pig script locally on one part of the Wikipedia dump, http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2, and got the expected statistics. But when I ran the same script on only a portion of that dump file, no results were produced. Does the script require a minimum amount of data?
I'm stuck on the same problem with the named entity extraction script. Everything works fine with Hadoop, but the output folder is not created...
It might be a problem with the OUTPUT path you set. I tried a local directory first, with no luck. Then I tried an HDFS path and got the final results. I'm not familiar with Hadoop or Pig, so I hope someone can figure it out...
In my case the output folder is created, but it contains no results. How did you set the HDFS path?
I used the full HDFS path, /user/username/outputdir. Also, please try running the script with Hadoop 0.20.0; otherwise there might be unexpected problems caused by the Hadoop version.
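In case it helps, here is a minimal Python sketch of how the full path above is put together. The helper names full_hdfs_path and ls_command are made up for illustration and are not part of pignlproc; the sketch only assumes that the output directory should live under /user/<username> and that the standard "hadoop fs -ls" command is used to check it afterwards.

```python
import posixpath


def full_hdfs_path(username, output_dir):
    """Return an absolute HDFS-style path for output_dir.

    Hypothetical helper: a relative directory is placed under
    /user/<username>, an absolute one is only normalized.
    """
    if posixpath.isabs(output_dir):
        return posixpath.normpath(output_dir)
    return posixpath.join("/user", username, output_dir)


def ls_command(path):
    """Build the argument list for listing the path with `hadoop fs -ls`."""
    return ["hadoop", "fs", "-ls", path]


if __name__ == "__main__":
    out = full_hdfs_path("username", "outputdir")
    print(out)
    # Print the command rather than running it, since running it needs a Hadoop install.
    print(" ".join(ls_command(out)))
```

Listing the directory after the job finishes should show whether any part files were actually written, which distinguishes "folder missing" from "folder created but empty".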