FuzzManager
[FTB] UnicodeDecodeError thrown when matching non-ASCII logs against Unicode signatures
The following was seen during fuzzing:
---CUT---
    cache_sig_file, cache_metadata = collector.search(crash_info)
  File "/usr/local/lib/python2.7/dist-packages/Reporter/Reporter.py", line 58, in decorator
    return wrapped(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/Collector/Collector.py", line 183, in search
    if crashSig.matches(crashInfo):
  File "/usr/local/lib/python2.7/dist-packages/FTB/Signatures/CrashSignature.py", line 104, in matches
    if not symptom.matches(crashInfo):
  File "/usr/local/lib/python2.7/dist-packages/FTB/Signatures/Symptom.py", line 128, in matches
    if self.output.matches(line):
  File "/usr/local/lib/python2.7/dist-packages/FTB/Signatures/Matchers.py", line 67, in matches
    return self.value in val
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa6 in position 215: ordinal not in range(128)
I've got it down to a test that reproduces it:
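# encoding=utf-8
import unittest

from FTB.ProgramConfiguration import ProgramConfiguration
from FTB.Signatures.CrashInfo import CrashInfo
from FTB.Signatures.CrashSignature import CrashSignature


class SignatureNonAsciiInStderr(unittest.TestCase):
    def runTest(self):
        config = ProgramConfiguration("test", "x86-64", "linux")
        # A raw log line holding a single non-ASCII byte
        log = [b"\xA6\n"]
        # The signature is parsed from JSON, so its value is a Unicode string
        testSig = CrashSignature('''{
            "symptoms": [
                {
                    "type": "output",
                    "value": "ä"
                }
            ]
        }''')
        crashInfoPos = CrashInfo.fromRawCrashData([], [], config, auxCrashData=log)

        # Check that this doesn't match
        self.assertFalse(testSig.matches(crashInfoPos))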
I think the right thing to do is to force logs to be Unicode on submission. Calling .decode("cp437") on the log line makes the test pass. Logs should be textual, and if they can't be represented as Unicode, it should be up to the reporter to ignore those errors. FuzzManager should also enforce the type that it expects.
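To make that proposal concrete, here is a minimal sketch of decoding on submission plus a type check. The helper names to_text and require_text are hypothetical and not part of FuzzManager. cp437 is used only because it was the codec tried above; it assigns a character to every byte value, so the decode never raises.

```python
# -*- coding: utf-8 -*-
TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def to_text(line, encoding="cp437"):
    """Hypothetical helper: return `line` as Unicode text."""
    if isinstance(line, TEXT_TYPE):
        return line
    if isinstance(line, bytes):
        return line.decode(encoding)
    raise TypeError("expected a text or byte string, got %r" % type(line))


def require_text(lines):
    """Hypothetical type check: reject anything that is not Unicode text."""
    for line in lines:
        if not isinstance(line, TEXT_TYPE):
            raise TypeError("log lines must be Unicode text, got %r" % type(line))
    return lines


# Same raw log line as in the test case above.
log = [to_text(raw) for raw in [b"\xA6\n"]]
require_text(log)

# Both operands are Unicode now, so the substring test that failed in
# Matchers.py no longer needs an implicit ASCII decode.
assert u"\xe4" not in log[0]
```

With cp437 the byte 0xA6 decodes to a different character than "ä", so the signature correctly does not match.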
I just saw this on my Windows 10 box too:
Traceback (most recent call last):
  File "c:\mozilla-build\python\lib\multiprocessing\process.py", line 267, in _bootstrap
    self.run()
  File "c:\mozilla-build\python\lib\multiprocessing\process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "c:\Users\fuzz1win\funfuzz\util\forkJoin.py", line 57, in redirectOutputAndCallFun
    fun(*(someArgs + (i,)))
  File "c:\Users\fuzz1win\funfuzz\bot.py", line 252, in loopFuzzingAndReduction
    loopjsfunfuzz.many_timed_runs(options.targetTime, tempDir, buildInfo.mtrArgs, collector)
  File "c:\Users\fuzz1win\funfuzz\js\loopjsfunfuzz.py", line 161, in many_timed_runs
    res = jsInteresting.ShellResult(jsInterestingOptions, jsInterestingOptions.jsengineWithArgs, logPrefix, False)
  File "c:\Users\fuzz1win\funfuzz\js\jsInteresting.py", line 134, in __init__
    match = options.collector.search(crashInfo)
  File "c:\mozilla-build\python\lib\site-packages\Reporter\Reporter.py", line 58, in decorator
    return wrapped(self, *args, **kwargs)
  File "c:\mozilla-build\python\lib\site-packages\Collector\Collector.py", line 183, in search
    if crashSig.matches(crashInfo):
  File "c:\mozilla-build\python\lib\site-packages\FTB\Signatures\CrashSignature.py", line 104, in matches
    if not symptom.matches(crashInfo):
  File "c:\mozilla-build\python\lib\site-packages\FTB\Signatures\Symptom.py", line 128, in matches
    if self.output.matches(line):
  File "c:\mozilla-build\python\lib\site-packages\FTB\Signatures\Matchers.py", line 67, in matches
    return self.value in val
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 126: ordinal not in range(128)
On MozillaBuild 3.1.1 with FuzzManager 0.1.3 installed.
Logs should be textual, and if they can't be represented as Unicode, it should be up to the reporter to ignore those errors. FuzzManager should also enforce the type that it expects.
How do you expect the reporter to handle this? Logs are not necessarily valid UTF-8. GDB, for example, regularly outputs things that contain a portion of memory: something that GDB considers a string but maybe isn't. We can't throw these reports away.
FWIW in my case I will call .decode("utf-8", errors="ignore") on the data before passing it to FM. In the testcase (a zipfile) I will include the full stderr, stdout and debugger log dumps. This way FM is happy and if anything needs a closer inspection we have the original logs. I know this may not work for everyone but I am OK with FM forcing me to give it utf-8 compatible data.
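A rough sketch of that workflow, assuming the raw logs are available as byte strings. The function name prepare_report and the archive member names are hypothetical; only the standard library is used.

```python
import io
import zipfile


def prepare_report(raw_logs):
    """Hypothetical helper.

    raw_logs maps an archive member name (e.g. "stderr.log") to raw bytes.
    Returns (text_logs, zip_bytes): lossy UTF-8 text per log to hand to
    FuzzManager, and a zip archive holding the untouched originals.
    """
    text_logs = {}
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name, raw in sorted(raw_logs.items()):
            # Drop any byte sequence that is not valid UTF-8.
            text_logs[name] = raw.decode("utf-8", "ignore")
            # Keep the original bytes for closer inspection later.
            archive.writestr(name, raw)
    return text_logs, buf.getvalue()


text_logs, zip_bytes = prepare_report({"stderr.log": b"Assertion failure\xef\n"})
```

The decoded text loses the invalid byte, while the copy inside the zip still contains it.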
I think, if at all, the Collector itself should be sanitizing the data in some way (and do so consistently for all submissions). Maybe we can use the errors="ignore" approach Tyson mentioned in the last comment inside the Collector and strip out illegal characters that way.
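As a sketch of what such Collector-side sanitizing could look like (sanitize_lines and sanitize_crash_data are hypothetical names, not existing FuzzManager functions), every list of log lines would pass through the same routine before a CrashInfo is built:

```python
def sanitize_lines(lines, encoding="utf-8", errors="ignore"):
    """Decode every byte-string line; pass Unicode lines through unchanged."""
    if lines is None:
        return None
    return [
        line.decode(encoding, errors) if isinstance(line, bytes) else line
        for line in lines
    ]


def sanitize_crash_data(stdout, stderr, aux=None):
    """Hypothetical single entry point so all submissions are cleaned alike."""
    return sanitize_lines(stdout), sanitize_lines(stderr), sanitize_lines(aux)


stdout, stderr, aux = sanitize_crash_data([], [u"ok\n"], [b"\xA6\n"])
# The cleaned lists would then be passed on, for example to
# CrashInfo.fromRawCrashData(stdout, stderr, config, auxCrashData=aux).
```

One trade-off of errors="ignore" is that the invalid bytes disappear from the stored log, so a signature can never match on them.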