Vyacheslav Murashkin
To get the source path from the mapper routine, just add `**kwargs` to the argument list. Here are some examples.
```
@dumbo.decor.primary
def map_primary(key, value, **kwargs):
    key, value = value.strip().split('\t')...
```
Some jobs do not produce any output, for example jobs that upload input data to external storage. Dumbo expects each mapper or reducer to yield some data; otherwise...
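For illustration, here is a minimal plain-Python sketch of a mapper that does useful work but emits nothing. It is not dumbo code: `upload` and `uploaded` are hypothetical stand-ins for an external storage client, and the sketch only shows what "yields no data" looks like for a generator-style mapper.

```python
uploaded = []  # hypothetical stand-in for an external storage system


def upload(record):
    """Hypothetical side effect: pretend to send a record to external storage."""
    uploaded.append(record)


def upload_mapper(key, value):
    """Mapper-shaped generator that performs its side effect and emits no pairs."""
    upload((key, value))
    return
    yield  # unreachable, but its presence makes this function a generator


# Driving the mapper by hand: the work happens, the output stream stays empty.
pairs = list(upload_mapper(0, "line one"))
```

Any framework that consumes such a mapper has to tolerate an empty output stream, which is the situation this entry describes.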
The streaming backend assumes that the input format is typedbytes even if the -inputformat argument is 'text': https://github.com/klbostee/dumbo/blob/release-0.21.36/dumbo/backends/streaming.py#L81 As a result, typedbytes.PairedInput is applied to all input lines: https://github.com/klbostee/dumbo/blob/release-0.21.36/dumbo/core.py#L380 Applying util.loadtext instead of typedbytes.PairedInput...
A custom mapper's `cleanup` function is never called when MultiMapper is used.
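The expected behaviour can be sketched with a wrapper that forwards `cleanup` to every mapper it contains. Everything here is hypothetical: `ForwardingMapper` is not dumbo's MultiMapper, `CountingMapper` is an invented example, and the sketch assumes a `cleanup` hook may yield final pairs.

```python
class CountingMapper(object):
    """Hypothetical mapper whose cleanup hook emits one final record."""

    def __init__(self):
        self.count = 0

    def __call__(self, key, value):
        self.count += 1
        yield key, value

    def cleanup(self):
        yield "count", self.count


class ForwardingMapper(object):
    """Hypothetical wrapper (not dumbo's MultiMapper) that forwards cleanup."""

    def __init__(self, mappers):
        self.mappers = mappers

    def __call__(self, key, value):
        # Pass each input pair through every wrapped mapper.
        for mapper in self.mappers:
            for pair in mapper(key, value):
                yield pair

    def cleanup(self):
        # Call cleanup on each wrapped mapper that defines one.
        for mapper in self.mappers:
            hook = getattr(mapper, "cleanup", None)
            if hook is not None:
                for pair in hook():
                    yield pair


wrapper = ForwardingMapper([CountingMapper()])
output = [pair for k, v in [(1, "a"), (2, "b")] for pair in wrapper(k, v)]
output.extend(wrapper.cleanup())
```

If the wrapper lacked its own `cleanup`, the final `("count", 2)` record would be lost without any error, which is why the problem is easy to miss.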
In some cases it would be useful to store all command-line -param arguments in a global variable such as dumbo.params. Yes, I know that self.params provides this functionality. But if I want...
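A rough sketch of the idea, assuming parameters arrive as `-param name=value` pairs on the command line. The module-level `params` dict and the `load_params` helper are hypothetical names for illustration, not part of dumbo.

```python
params = {}  # module-level store, playing the role of a global like dumbo.params


def load_params(argv):
    """Collect every '-param name=value' pair from argv into the global dict."""
    args = list(argv)
    for i, arg in enumerate(args[:-1]):
        if arg == "-param" and "=" in args[i + 1]:
            name, value = args[i + 1].split("=", 1)
            params[name] = value
    return params


# Example argument list mixing -param pairs with an unrelated option.
load_params(["-param", "date=2012-01-01", "-input", "logs", "-param", "mode=fast"])
```

With a module-level store, plain functions can read the parameters too, not only methods of a class that receives `self.params`.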