[FTB] UnicodeDecodeError thrown when matching non-ASCII logs against Unicode signatures

Open jschwartzentruber opened this issue 7 years ago • 4 comments

The following was seen during fuzzing:

---CUT---
    cache_sig_file, cache_metadata = collector.search(crash_info)
  File "/usr/local/lib/python2.7/dist-packages/Reporter/Reporter.py", line 58, in decorator
    return wrapped(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/Collector/Collector.py", line 183, in search
    if crashSig.matches(crashInfo):
  File "/usr/local/lib/python2.7/dist-packages/FTB/Signatures/CrashSignature.py", line 104, in matches
    if not symptom.matches(crashInfo):
  File "/usr/local/lib/python2.7/dist-packages/FTB/Signatures/Symptom.py", line 128, in matches
    if self.output.matches(line):
  File "/usr/local/lib/python2.7/dist-packages/FTB/Signatures/Matchers.py", line 67, in matches
    return self.value in val
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa6 in position 215: ordinal not in range(128)
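
For context, a minimal sketch of what the last frame appears to hit. On Python 2.7, `self.value in val` with a `unicode` value and a byte `str` log line triggers an implicit ASCII decode of the bytes. The snippet below performs that decode explicitly so that it also runs on Python 3. The byte string is illustrative, not taken from the actual log.

```python
# Illustrative log line containing a non-ASCII byte (0xa6), not the real log.
log_line = b"crash output \xa6\n"

try:
    # Python 2 does this implicitly when a unicode object is tested
    # against a byte string; here the decode is written out explicitly.
    log_line.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)
```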

I've got it down to a test that reproduces it:

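# encoding=utf-8
import unittest

from FTB.ProgramConfiguration import ProgramConfiguration
from FTB.Signatures.CrashInfo import CrashInfo
from FTB.Signatures.CrashSignature import CrashSignature

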
class SignatureNonAsciiInStderr(unittest.TestCase):
    def runTest(self):
        config = ProgramConfiguration("test", "x86-64", "linux")
        log = [b"\xA6\n"]
        testSig = CrashSignature('''{
                "symptoms": [
                {
                    "type": "output",
                    "value": "ä"
                }
            ]
        }''')
        crashInfoPos = CrashInfo.fromRawCrashData([], [], config, auxCrashData=log)

        # Check that this doesn't match
        self.assertFalse(testSig.matches(crashInfoPos))

I think the right thing to do is to force logs to be Unicode on submission. Calling .decode("cp437") on the log line makes the test pass. Logs should be textual, and if they can't be represented as Unicode, it should be up to the reporter to ignore those errors. FuzzManager should also enforce the type that it expects.
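
A rough sketch of why `.decode("cp437")` makes the test pass, assuming the standard Python `cp437` codec: it assigns a character to every byte value, so decoding cannot fail, and `0xA6` becomes `ª`, which does not contain `ä`. This only guarantees a decodable string. Bytes that were really in another encoding come out as different characters.

```python
# Every possible byte value decodes under cp437, so no UnicodeDecodeError.
all_bytes = bytes(bytearray(range(256)))
decoded_all = all_bytes.decode("cp437")

# The log line from the test above, decoded the same way.
line = b"\xA6\n".decode("cp437")

# The signature value from the test, written as an escape to stay ASCII-safe.
signature_value = u"\u00e4"  # "ä"
matches = signature_value in line
print(repr(line), matches)
```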

jschwartzentruber, Mar 16 '18 18:03

I just saw this on my Windows 10 box too:

Traceback (most recent call last):
  File "c:\mozilla-build\python\lib\multiprocessing\process.py", line 267, in _bootstrap
    self.run()
  File "c:\mozilla-build\python\lib\multiprocessing\process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "c:\Users\fuzz1win\funfuzz\util\forkJoin.py", line 57, in redirectOutputAndCallFun
    fun(*(someArgs + (i,)))
  File "c:\Users\fuzz1win\funfuzz\bot.py", line 252, in loopFuzzingAndReduction
    loopjsfunfuzz.many_timed_runs(options.targetTime, tempDir, buildInfo.mtrArgs, collector)
  File "c:\Users\fuzz1win\funfuzz\js\loopjsfunfuzz.py", line 161, in many_timed_runs
    res = jsInteresting.ShellResult(jsInterestingOptions, jsInterestingOptions.jsengineWithArgs, logPrefix, False)
  File "c:\Users\fuzz1win\funfuzz\js\jsInteresting.py", line 134, in __init__
    match = options.collector.search(crashInfo)
  File "c:\mozilla-build\python\lib\site-packages\Reporter\Reporter.py", line 58, in decorator
    return wrapped(self, *args, **kwargs)
  File "c:\mozilla-build\python\lib\site-packages\Collector\Collector.py", line 183, in search
    if crashSig.matches(crashInfo):
  File "c:\mozilla-build\python\lib\site-packages\FTB\Signatures\CrashSignature.py", line 104, in matches
    if not symptom.matches(crashInfo):
  File "c:\mozilla-build\python\lib\site-packages\FTB\Signatures\Symptom.py", line 128, in matches
    if self.output.matches(line):
  File "c:\mozilla-build\python\lib\site-packages\FTB\Signatures\Matchers.py", line 67, in matches
    return self.value in val
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 126: ordinal not in range(128)

This is on MozillaBuild 3.1.1 with FuzzManager 0.1.3 installed.

nth10sd, Mar 16 '18 23:03

> Logs should be textual, and if they can't be represented as Unicode, it should be up to the reporter to ignore those errors. FuzzManager should also enforce the type that it expects.

How do you expect the reporter to handle this? Logs are not necessarily valid UTF-8. GDB, for example, regularly outputs a portion of memory that it considers a string but that may not be one. We can't throw these reports away.

choller, Mar 16 '18 23:03

FWIW, in my case I will call .decode("utf-8", errors="ignore") on the data before passing it to FM. In the testcase (a zipfile) I will include the full stderr, stdout and debugger log dumps. This way FM is happy, and if anything needs closer inspection we have the original logs. I know this may not work for everyone, but I am OK with FM forcing me to give it UTF-8-compatible data.
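
As a small illustration of that approach (the sample bytes are made up): `errors="ignore"` silently drops bytes that are not valid UTF-8, while `errors="replace"` keeps a visible U+FFFD marker in their place.

```python
# Made-up debugger output: valid UTF-8 text with one stray byte (0xa6).
raw = b"stack: \xc3\xa4 \xa6 end\n"

# "ignore" removes the stray byte; "replace" substitutes U+FFFD for it.
ignored = raw.decode("utf-8", errors="ignore")
replaced = raw.decode("utf-8", errors="replace")

print(repr(ignored))
print(repr(replaced))
```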

tysmith, Mar 20 '18 22:03

I think, if at all, the Collector itself should be sanitizing the data in some way (and doing so consistently for all submissions). Maybe we can use the errors="ignore" approach Tyson mentioned in the last comment inside the Collector and strip out illegal characters that way.
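
One way this could look, as a sketch only: `sanitize_log_lines` is a hypothetical helper, not an existing FuzzManager function, and where it would be called inside the Collector is left open. It accepts a mix of byte and text lines and returns text only, dropping undecodable bytes as suggested above.

```python
def sanitize_log_lines(lines, encoding="utf-8"):
    """Return every line as text, dropping bytes that do not decode.

    Hypothetical helper for illustration; not part of FuzzManager.
    """
    sanitized = []
    for line in lines:
        if isinstance(line, bytes):
            # Undecodable bytes are dropped rather than raising.
            line = line.decode(encoding, errors="ignore")
        sanitized.append(line)
    return sanitized


# Mixed input: one byte line with an invalid byte, one line already text.
clean = sanitize_log_lines([b"\xA6\n", u"already text\n"])
print(clean)
```

Applying one helper like this at a single point would give every submission the same treatment, regardless of which reporter produced the log.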

choller, Mar 20 '18 22:03