How to launch more than one reducer to execute a job?

Open ParadoxZW opened this issue 4 years ago • 0 comments

I wrote the following code to sort words:

#!/usr/bin/python
# -*- coding: utf-8 -*-
from mrjob.job import MRJob


class MRwordCount(MRJob):
    def mapper(self, in_key, in_value):
        # Group the words of one input line by their first character.
        # split() without arguments skips empty strings, so word[0] is safe,
        # and setdefault() accepts first characters outside a-z.
        bins = {}
        for word in in_value.split():
            bins.setdefault(word[0], []).append(word)
        for key_j, value_j in bins.items():
            yield (key_j, sorted(value_j))

    def reducer(self, key, value_list):
        # Merge the per-line lists for this key and sort the combined result.
        words = []
        for value in value_list:
            words += value
        yield (key, sorted(words))


if __name__ == '__main__':
    MRwordCount.run()

When I run this demo on Hadoop, only one reducer is launched. How can I launch more than one reducer in mrjob? I'm new to Hadoop, so this question may be trivial, but I would really appreciate any help. Thanks very much.

ParadoxZW · Jun 22 '20 11:06
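
One approach that may help, offered as a sketch rather than a confirmed fix: Hadoop takes the number of reduce tasks from the `mapreduce.job.reduces` property, and mrjob can pass Hadoop properties through its `--jobconf` option. The small launcher below only builds the command line for that. The helper name `build_mrjob_command`, the script name `word_sort.py`, and the HDFS path are hypothetical and chosen only for illustration.

```python
import shlex
import sys


def build_mrjob_command(script, input_path, num_reducers):
    """Build the argv list that runs an mrjob script on Hadoop and
    requests `num_reducers` reduce tasks via --jobconf."""
    if num_reducers < 1:
        raise ValueError("num_reducers must be at least 1")
    return [
        sys.executable, script,
        "-r", "hadoop",  # use mrjob's Hadoop runner
        "--jobconf", "mapreduce.job.reduces=%d" % num_reducers,
        input_path,
    ]


# Hypothetical file names, used only for illustration.
command = build_mrjob_command("word_sort.py", "hdfs:///tmp/words.txt", 4)

# Print the command so it can be inspected or pasted into a shell.
print(" ".join(shlex.quote(part) for part in command))
```

The same setting can probably be placed in the job class itself as `JOBCONF = {'mapreduce.job.reduces': '4'}`, which keeps it next to the code. Note that all values for one key go to the same reducer, so with keys limited to first letters, at most as many reducers as there are distinct first characters will receive any data.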