mrjob
How to launch more than one reducer to execute a job?
I wrote the following code to sort words:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from mrjob.job import MRJob


class MRwordCount(MRJob):

    def mapper(self, in_key, in_value):
        # Group the words on this input line by their first character.
        bins = {}
        for word in in_value.split():  # split() skips empty strings
            bins.setdefault(word[0], []).append(word)
        for key_j, value_j in bins.items():
            yield (key_j, sorted(value_j))

    def reducer(self, key, value_list):
        # Merge the per-line lists for one first character, then sort them.
        words = []
        for value in value_list:
            words += value
        yield (key, sorted(words))


if __name__ == '__main__':
    MRwordCount.run()
When I run this demo on Hadoop, only one reducer is launched. How can I launch more than one reducer in mrjob? I'm new to Hadoop, so this question may be trivial, but I would really appreciate any help. Thanks very much.
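For context, here is a rough sketch of what running this job with several reducers would mean. It assumes the reducer count is requested through Hadoop's `mapreduce.job.reduces` property, which in mrjob would typically be set through the job's `JOBCONF` class attribute or a `--jobconf` command-line option. The value `4`, the helper names `toy_partition` and `assign_keys`, and the byte-sum hash are all illustrative assumptions, not Hadoop's actual partitioner. The sketch uses plain Python so it runs without Hadoop or mrjob.

```python
from collections import defaultdict

# Assumption: in the real job this dict would be the JOBCONF class attribute
# of the MRJob subclass; here it is a plain dict so the sketch runs without
# Hadoop or mrjob installed. The value 4 is an arbitrary example.
JOBCONF = {'mapreduce.job.reduces': '4'}


def toy_partition(key, num_reducers):
    """Hypothetical stand-in for a hash partitioner (not Hadoop's real hash)."""
    return sum(key.encode('utf-8')) % num_reducers


def assign_keys(keys, num_reducers):
    """Group keys by the index of the reducer they would be routed to."""
    assignment = defaultdict(list)
    for key in keys:
        assignment[toy_partition(key, num_reducers)].append(key)
    return dict(assignment)


num_reducers = int(JOBCONF['mapreduce.job.reduces'])
letters = [chr(i) for i in range(97, 123)]  # the first-letter keys 'a'..'z'
assignment = assign_keys(letters, num_reducers)

# Show which first-letter keys each simulated reducer would receive.
for reducer_id in sorted(assignment):
    print(reducer_id, assignment[reducer_id])
```

Each reducer would then emit sorted words only for its own subset of first letters, so the output is sorted within each key but split across several output files.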