mrjob
TypeError when writing to stderr within a job on Python 3
We're running into an issue with str vs bytes on Python 3 related to commit https://github.com/Yelp/mrjob/commit/0f0297b372fe9d5875915f7c3782b168543dd390 which changes sys.stderr from a TextIOWrapper in 'w' mode to a BufferedWriter in 'wb' mode.
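For reference, the same failure can be reproduced without mrjob. This is a minimal sketch that assumes a plain io.BufferedWriter over an io.BytesIO as a stand-in for the binary stream; the stand-in and the variable names are illustrative and are not mrjob's actual objects.

```python
import io

# Stand-in for a binary stream opened in 'wb' mode (an assumption for
# illustration; the stream mrjob installs may be constructed differently).
binary_stream = io.BufferedWriter(io.BytesIO())

error_message = None
try:
    # Writing a str to a binary stream is what print() and warnings.warn()
    # end up doing when sys.stderr is a BufferedWriter.
    binary_stream.write("Here is a warning\n")
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Writing encoded bytes to the same stream succeeds.
bytes_written = binary_stream.write("Here is a warning\n".encode("utf-8"))
```

A text stream (TextIOWrapper) accepts str and does the encoding itself, which is why the same write worked before the stream type changed.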
Example
The error occurs with any attempt to write a str to sys.stdout or sys.stderr, e.g. via print. In our particular case, some of the libraries we depend on print warnings using the built-in warnings module, which internally writes to sys.stderr by default. See example below, tested with the inline runner on Python 3.7.4 and 3.8.0 with mrjob 0.7.1.
import warnings
from mrjob.job import MRJob


class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        warnings.warn('Here is a warning')
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        yield key, sum(values)


if __name__ == '__main__':
    MRWordFrequencyCount.run()
Traceback
No configs found; falling back on auto-configuration
No configs specified for inline runner
Creating temp directory /tmp/mr_word_count.ec2-user.20200327.160028.510282
Running step 1 of 1...
Error while reading from /tmp/mr_word_count.ec2-user.20200327.160028.510282/step/000/mapper/00000/input:
Traceback (most recent call last):
  File "mr_word_count.py", line 18, in <module>
    MRWordFrequencyCount.run()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/job.py", line 616, in run
    cls().execute()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/job.py", line 687, in execute
    self.run_job()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/job.py", line 636, in run_job
    runner.run()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/runner.py", line 497, in run
    self._run()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/sim.py", line 160, in _run
    self._run_step(step, step_num)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/sim.py", line 169, in _run_step
    self._run_streaming_step(step, step_num)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/sim.py", line 180, in _run_streaming_step
    self._run_mappers_and_combiners(step_num, map_splits)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/sim.py", line 221, in _run_mappers_and_combiners
    for task_num, map_split in enumerate(map_splits)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/sim.py", line 129, in _run_multiple
    func()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/sim.py", line 723, in _run_mapper_and_combiner
    run_mapper()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/sim.py", line 746, in _run_task
    stdin, stdout, stderr, wd, env)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/inline.py", line 132, in invoke_task
    task.execute()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/job.py", line 675, in execute
    self.run_mapper(self.options.step_num)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/job.py", line 760, in run_mapper
    for k, v in self.map_pairs(read_lines(), step_num=step_num):
  File "/home/ec2-user/.local/lib/python3.7/site-packages/mrjob/job.py", line 830, in map_pairs
    for k, v in mapper(key, value) or ():
  File "mr_word_count.py", line 8, in mapper
    warnings.warn('Here is a warning')
  File "/usr/lib64/python3.7/warnings.py", line 112, in _showwarnmsg
    _showwarnmsg_impl(msg)
  File "/usr/lib64/python3.7/warnings.py", line 30, in _showwarnmsg_impl
    file.write(text)
TypeError: a bytes-like object is required, not 'str'
Appreciate any guidance you can give us on options to work around this. Thanks!
Believe we've found a workaround for the warnings case in particular. This will redirect output from the warnings module to the logging system:
logging.captureWarnings(True)
https://docs.python.org/3/library/logging.html#logging.captureWarnings
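To show what that call does in isolation, here is a minimal sketch outside of mrjob. The StringIO buffer and the handler attached to the "py.warnings" logger are assumptions added only to make the redirected output visible; in a real job, where the message ends up depends on whatever logging handlers are configured.

```python
import io
import logging
import warnings

# Route warnings.warn() output to the "py.warnings" logger instead of
# having the warnings module write directly to sys.stderr.
logging.captureWarnings(True)

# Illustrative handler (an assumption, not part of the workaround itself):
# collect the logged warnings in an in-memory text buffer.
log_buffer = io.StringIO()
handler = logging.StreamHandler(log_buffer)
logging.getLogger("py.warnings").addHandler(handler)

# Show every warning so the demo is not affected by duplicate suppression.
warnings.simplefilter("always")
warnings.warn("Here is a warning")

captured = log_buffer.getvalue()
print(captured)
```

Note this only covers the warnings module. A direct print to sys.stderr from library code would still hit the str vs bytes TypeError.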