dumbo
Python module that allows one to easily write and run Hadoop programs.
Does Dumbo support custom input file formats, e.g. WholeFileInputFormat.class, which treats the entire file contents as a single record? I compiled WholeFileInputFormat.java (from Hadoop: The Definitive Guide) and created a...
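For what it's worth, a job built on such an input format might look like the sketch below; the mapper/reducer bodies and the assumption that each record is (file path, full file contents) are mine, not something Dumbo confirms:

```
# wholefile.py - a hedged sketch, not a confirmed Dumbo feature. It assumes the
# compiled WholeFileInputFormat emits (file path, entire file contents) pairs.
def mapper(key, value):
    # value is assumed to hold the whole file as one record
    yield "bytes", len(value)

def reducer(key, values):
    yield key, sum(values)

if __name__ == "__main__":
    import dumbo
    dumbo.run(mapper, reducer)
```

Starting it would presumably also require shipping the compiled jar and naming the class, something along the lines of `dumbo start wholefile.py -input files -output out -inputformat WholeFileInputFormat -libjar wholefile.jar -hadoop $HADOOP_HOME` (the exact class name and the `-libjar` usage here are assumptions).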
By default, the memlimit should be unlimited.
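As a workaround until that changes, the limit can be raised per job by passing `-memlimit` explicitly (it presumably takes a byte count), e.g. `-memlimit 4294967296` for roughly 4 GB.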
Some jobs do not produce any output, for example jobs that upload the input data to external storage. Dumbo expects that each mapper or reducer yields some data, otherwise...
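A minimal sketch of such a job, assuming a map-only `dumbo.run(mapper)` call and with the upload helper stubbed out:

```
# Sketch of a map-only job whose useful work is a side effect (uploading input
# to external storage) and which therefore never yields a record back to Dumbo.
def upload_to_external_storage(key, value):
    # stand-in for a real client (S3, HTTP POST, ...); hypothetical helper
    pass

def mapper(key, value):
    upload_to_external_storage(key, value)
    return
    yield  # unreachable, but keeps the mapper a generator that emits nothing

if __name__ == "__main__":
    import dumbo
    dumbo.run(mapper)
```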
The streaming backend assumes that the input format is typedbytes even if the -inputformat argument is 'text': https://github.com/klbostee/dumbo/blob/release-0.21.36/dumbo/backends/streaming.py#L81 This leads to typedbytes.PairedInput being applied to all input lines: https://github.com/klbostee/dumbo/blob/release-0.21.36/dumbo/core.py#L380 Applying util.loadtext instead of typedbytes.PairedInput...
How about adding support for SequenceFiles for local runs? It seems it would just be a matter of adding a SequenceFile decoder/encoder, much like the 'code' format works today.
Sounds like all that's needed is a new backend that talks to the S3 file system and controls EMR job flows (via the boto API). Essential features: - Read input from and write...
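As a point of reference, the legacy boto API already covers the job-flow side; a rough sketch of the plumbing such a backend would wrap (all bucket names, script paths, and instance counts below are placeholders):

```
# Rough sketch of launching a Hadoop Streaming step on EMR via the legacy boto
# API. Everything named here is a placeholder, not part of Dumbo.
from boto.emr.connection import EmrConnection
from boto.emr.step import StreamingStep

conn = EmrConnection()  # credentials come from the usual boto configuration

step = StreamingStep(
    name='dumbo job',
    mapper='s3n://example-bucket/scripts/mapper.py',
    reducer='s3n://example-bucket/scripts/reducer.py',
    input='s3n://example-bucket/input/',
    output='s3n://example-bucket/output/',
)

jobflow_id = conn.run_jobflow(
    name='dumbo-on-emr',
    log_uri='s3n://example-bucket/logs/',
    steps=[step],
    num_instances=3,
)
print(jobflow_id)
```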
Hello! I am trying to run a job for our data team and we are getting errors using Dumbo. We are using the latest version of Dumbo and Cloudera. Command...
// my python job
```
def mapper(key, value):
    yield value.split(" ")[0], 1

def reducer(key, values):
    yield key, sum(values)

if __name__ == "__main__":
    import dumbo
    dumbo.run(mapper, reducer, combiner=reducer)
```
// my command (version...
A custom mapper's `cleanup` function is never called when MultiMapper is used.
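A minimal sketch of the setup being described, assuming a class-based mapper whose `cleanup` method yields final records and that `MultiMapper.add()` registers mappers per input-path pattern:

```
# Sketch of the reported scenario: a class-based mapper with a cleanup method,
# registered through MultiMapper. The path pattern and counting logic are
# illustrative only.
from dumbo.lib import MultiMapper

class CountingMapper:
    def __init__(self):
        self.count = 0

    def __call__(self, key, value):
        self.count += 1
        yield key, 1

    def cleanup(self):
        # expected to run once after all input; per the report above it never
        # fires when the mapper is wrapped in a MultiMapper
        yield "total", self.count

if __name__ == "__main__":
    import dumbo
    multimapper = MultiMapper()
    multimapper.add("logs", CountingMapper())  # "logs" pattern is assumed
    dumbo.run(multimapper)
```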
If I write a map function with the alternative low-level single-parameter interface, then give it to `MultiMapper`:
```
import dumbo
from dumbo.lib import MultiMapper
from dumbo.decor import primary

@primary
def...
```
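For comparison, the single-parameter style on its own looks roughly like this (a sketch assuming such a mapper receives the whole (key, value) iterator rather than one pair at a time; whether it also needs the `@primary` decoration is not shown here):

```
# Sketch of the low-level single-parameter (iterator-in, iterator-out) mapper
# style referenced above, shown standalone; whether MultiMapper accepts it is
# exactly what the report questions.
def mapper(data):
    for key, value in data:
        yield value.split(" ")[0], 1

if __name__ == "__main__":
    import dumbo
    dumbo.run(mapper)
```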