pignlproc

Running nerd-stats on part of the Wikipedia dump

Open Nasreddine opened this issue 10 years ago • 4 comments

I ran the nerd-stats.pig script locally on one part of the Wikipedia dump, http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2, and got the expected statistics. But when I ran the same script on only a part of the above dump, no results were produced. Does the script require a minimum amount of data?

Nasreddine avatar Dec 06 '14 07:12 Nasreddine

I'm stuck on the same problem with the named entity extraction script. Everything runs fine with Hadoop, but the output folder is not created...

tilneyyang avatar Dec 11 '14 06:12 tilneyyang

It might be a problem with the OUTPUT path you set. I first tried a local directory, with no luck. Then I tried an HDFS path and got the final results. I'm not familiar with Hadoop or Pig, so I hope someone else can figure out why...

tilneyyang avatar Dec 11 '14 08:12 tilneyyang

In my case the output folder is created, but it contains no results. How did you set the HDFS path?

Nasreddine avatar Dec 11 '14 12:12 Nasreddine

I used the full HDFS path, /user/username/outputdir. Also, please try running the script with Hadoop 0.20.0; otherwise the Hadoop version might cause unexpected problems.

tilneyyang avatar Dec 15 '14 02:12 tilneyyang
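
For anyone landing here later, the workaround described above can be sketched roughly as follows. This is only a sketch under assumptions: it assumes nerd-stats.pig reads its output location from a parameter named OUTPUT (as the thread suggests) and that results land in part-* files; any other parameters the script expects are omitted and should be added from your existing invocation. The RUN variable is a hypothetical dry-run switch added for illustration, so by default the commands are only printed.

```shell
#!/bin/sh
# Values taken from the thread; replace "username" with your own HDFS user.
OUTPUT_DIR="/user/username/outputdir"
SCRIPT="nerd-stats.pig"

# Hypothetical dry-run switch: by default the commands are only printed.
# Run with RUN= (empty) in the environment to actually execute them.
RUN="${RUN-echo}"

# Run Pig against the cluster and pass a full HDFS path as OUTPUT
# (assumption: the script uses a parameter with this name).
$RUN pig -x mapreduce -param OUTPUT="$OUTPUT_DIR" "$SCRIPT"

# Check that the output directory was created and list its files.
$RUN hadoop fs -ls "$OUTPUT_DIR"

# Print the result files, if any (assumes the usual part-* naming).
$RUN hadoop fs -cat "$OUTPUT_DIR/part-*"
```

If the directory is listed but holds only empty part files, the problem is more likely in the input or the script than in the output path.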