dfuzzer

Machine-readable logs

Open evverx opened this issue 2 years ago • 7 comments

As far as I understand, the logs in their current form are supposed to look like https://github.com/matusmarhefka/dfuzzer/pull/4 to make reprogen.py work, but it would probably make sense to revisit the format so that the logs are easier to parse in general. Those logs could help to look, for example, for timeouts, which dfuzzer ignores by default.

evverx avatar May 07 '22 17:05 evverx

One of the major flaws of the current (CSV) format is that the separator (;) can appear in the randomly generated strings, making machine-parsing of the log file harder or sometimes almost impossible.
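To illustrate the separator problem, here is a minimal sketch. The field layout (interface; method; generated string) is only an assumption for the example and is not dfuzzer's actual log layout. It shows a naive split miscounting the fields, and the same data surviving once every field is quoted:

```python
import csv
import io

# Hypothetical log record: three fields, where the last one is a
# fuzzer-generated string that happens to contain the ';' separator.
fields = ["org.example.Iface", "Method", "AAA;BBB"]
naive_line = ";".join(fields)

# A naive split on the separator sees four fields instead of three.
naive_parsed = naive_line.split(";")

# Quoting every field (here via the csv module) keeps the embedded ';' intact.
buf = io.StringIO()
csv.writer(buf, delimiter=";", quoting=csv.QUOTE_ALL).writerow(fields)
quoted_line = buf.getvalue()
quoted_parsed = next(csv.reader(io.StringIO(quoted_line), delimiter=";"))

print(len(naive_parsed), quoted_parsed == fields)
```

Quoting alone is only enough if quotes and control characters inside the generated strings are escaped as well.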

mrc0mmand avatar May 07 '22 17:05 mrc0mmand

Those logs could help to look for example for timeouts

Looks like timeouts have never been logged by dfuzzer :-(

evverx avatar May 07 '22 18:05 evverx

As for the random strings, I guess one possible fix would be to process the strings via https://docs.gtk.org/glib/func.strescape.html before printing them out. This might also help with #80, since strings could be wrapped in " and identified by that. As the documentation suggests, this operation could be easily reversed by https://docs.gtk.org/glib/func.strcompress.html, and the escape sequences should be compatible with bash as well:

Escapes the special characters '\b', '\f', '\n', '\r', '\t', '\v', '\' and '"' in the string source by inserting a '\' before them. Additionally all characters in the range 0x01-0x1F (everything below SPACE) and in the range 0x7F-0xFF (all non-ASCII chars) are replaced with a '\' followed by their octal representation. Characters supplied in exceptions are not escaped.
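For anyone post-processing the logs outside of C, the escaping rules quoted above can be approximated in a few lines of Python. This is a rough sketch written from the quoted documentation only, not GLib's implementation: the names `strescape` and `strcompress` are just borrowed for readability, the `exceptions` argument is left out, and the reverse function only handles input produced by the forward one.

```python
# Bytes that get a short backslash escape, per the quoted documentation.
_NAMED = {0x08: "b", 0x0C: "f", 0x0A: "n", 0x0D: "r",
          0x09: "t", 0x0B: "v", 0x5C: "\\", 0x22: '"'}
_REVERSE = {letter: byte for byte, letter in _NAMED.items()}


def strescape(data: bytes) -> str:
    """Escape bytes roughly the way the g_strescape() docs describe."""
    out = []
    for byte in data:
        if byte in _NAMED:
            out.append("\\" + _NAMED[byte])
        elif byte < 0x20 or byte >= 0x7F:
            out.append("\\%03o" % byte)  # three-digit octal escape
        else:
            out.append(chr(byte))
    return "".join(out)


def strcompress(text: str) -> bytes:
    """Reverse strescape(); only meant for strings produced by it."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(ord(text[i]))
            i += 1
        elif text[i + 1] in "01234567":
            out.append(int(text[i + 1:i + 4], 8))
            i += 4
        else:
            out.append(_REVERSE[text[i + 1]])
            i += 2
    return bytes(out)


sample = b'a;b"c\n\x01\xff\\'
escaped = strescape(sample)
print('"' + escaped + '"')
```

Note that ';' itself is not escaped, so this only helps when the string is also wrapped in double quotes: after escaping, every '"' inside the string is preceded by a backslash, which makes the closing quote unambiguous.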

mrc0mmand avatar May 08 '22 19:05 mrc0mmand

I'd pick json (or any other format where escaping is no longer an issue) because for example busctl dumps stuff like

{
        "type" : "method_call",
        "endian" : "l",
        "flags" : 0,
        "version" : 1,
        "cookie" : 2,
        "timestamp-realtime" : 1652039190518701,
        "sender" : ":1.147",
        "destination" : "org.freedesktop.resolve1",
        "path" : "/org/freedesktop/resolve1",
        "interface" : "org.freedesktop.resolve1.Manager",
        "member" : "ResolveHostname",
        "payload" : {
                "type" : "isit",
                "data" : [
                        0,
                        "google.com",
                        0,
                        0
                ]
        }
}

and it can be put into "advanced" dictionaries: https://github.com/matusmarhefka/dfuzzer/issues/81. The idea is to monitor the system bus, pick "valid" messages and stuff them into those dictionaries (semi-automatically hopefully)
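A rough sketch of what the "pick valid messages" step could look like, assuming the busctl JSON above is available as a string. The layout of the resulting dictionary entry is purely hypothetical, since the format of the "advanced" dictionaries has not been decided:

```python
import json

# Message copied from the busctl output above.
raw = """
{
        "type" : "method_call",
        "endian" : "l",
        "flags" : 0,
        "version" : 1,
        "cookie" : 2,
        "timestamp-realtime" : 1652039190518701,
        "sender" : ":1.147",
        "destination" : "org.freedesktop.resolve1",
        "path" : "/org/freedesktop/resolve1",
        "interface" : "org.freedesktop.resolve1.Manager",
        "member" : "ResolveHostname",
        "payload" : {
                "type" : "isit",
                "data" : [0, "google.com", 0, 0]
        }
}
"""


def to_dictionary_entry(message: dict) -> dict:
    """Keep only what is needed to replay a call (hypothetical entry layout)."""
    payload = message["payload"]
    return {
        "destination": message["destination"],
        "path": message["path"],
        "interface": message["interface"],
        "method": message["member"],
        "signature": payload["type"],
        "args": payload["data"],
    }


msg = json.loads(raw)
# Only method calls are interesting as fuzzing seeds in this sketch.
entry = to_dictionary_entry(msg) if msg["type"] == "method_call" else None
print(json.dumps(entry, indent=2))
```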

evverx avatar May 08 '22 19:05 evverx

That definitely sounds better, and it should be relatively easy to do via https://gnome.pages.gitlab.gnome.org/json-glib/ and maybe even with https://gnome.pages.gitlab.gnome.org/json-glib/json-gvariant.html.

mrc0mmand avatar May 08 '22 20:05 mrc0mmand

Giving json_gvariant_serialize_data() a quick spin, it seems to work like a charm:

   -- Signature: (isaaai(y(b(n(q(iua{ov})v)o))x(dh))a{t(bov)})
   -- Value: (-2147483648, 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', [[@ai []]], (byte 0x00, (false, (int16 -32768, (uint16 0, (-2147483648, uint32 0, {objectpath '/': <'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'>, '/': <'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'>, '/': <'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'>}), <'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'>), objectpath '/')), int64 -9223372036854775808, (1.7976931348623157e+308, handle 0)), {uint64 0: (false, objectpath '/', <'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'>), 0: (false, '/', <'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'>), 0: (false, '/', <'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'>), 0: (false, '/', <'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'>)})
Serialized GVariant: [-2147483648,"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",[[[]]],[0,[false,[-32768,[0,[-2147483648,0,{"/":"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"}],"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"],"/"]],-9223372036854775808,[1.7976931348623157e+308,0]],{"0":[false,"/","AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"]}]

$ echo '[-2147483648,"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",[[[]]],[0,[false,[-32768,[0,[-2147483648,0,{"/":"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"}],"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"],"/"]],-9223372036854775808,[1.7976931348623157e+308,0]],{"0":[false,"/","AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"]}]' | jq .
[
  -2147483648,
  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
  [
    [
      []
    ]
  ],
  [
    0,
    [
      false,
      [
        -32768,
        [
          0,
          [
            -2147483648,
            0,
            {
              "/": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
            }
          ],
          "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        ],
        "/"
      ]
    ],
    -9223372036854775808,
    [
      1.7976931348623157E+308,
      0
    ]
  ],
  {
    "0": [
      false,
      "/",
      "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
    ]
  }
]

That should, hopefully, be compatible with the format produced by busctl as well.
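As a quick sanity check from the consumer side, the serialized line can be read back with any JSON parser. The sketch below uses a shortened copy of the output above (the 50-character strings are abbreviated to "AAAA", everything else is unchanged) and checks that the extreme integer and double values survive:

```python
import json
import sys

# Shortened copy of the "Serialized GVariant" line above.
serialized = (
    '[-2147483648,"AAAA",[[[]]],'
    '[0,[false,[-32768,[0,[-2147483648,0,{"/":"AAAA"}],"AAAA"],"/"]],'
    '-9223372036854775808,[1.7976931348623157e+308,0]],'
    '{"0":[false,"/","AAAA"]}]'
)

value = json.loads(serialized)

int64_min = value[3][2]      # was: int64 -9223372036854775808
double_max = value[3][3][0]  # was: 1.7976931348623157e+308
dict_keys = list(value[4])   # was: a{t(bov)} with uint64 keys

print(int64_min == -2**63, double_max == sys.float_info.max, dict_keys)
```

Two things are visible in the output above and worth keeping in mind: the type annotations (int16, uint16, objectpath, ...) are not part of the JSON, so the signature probably has to be logged next to the value, and the dictionaries with repeated keys come out with a single entry, with the uint64 key turned into the string "0".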

mrc0mmand avatar May 09 '22 10:05 mrc0mmand

Also, would it make sense to log only unsuccessful cases? Something like what libFuzzer/AFL do, i.e. log only crashes/timeouts, one such case per file, so they can then be used as 'reproducers' later. Or do we want to log everything into one file, with each entry marked by the type of failure (timeout, crash, ...)?
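The two options are not mutually exclusive. A minimal sketch of both, with every field name, result label and file name made up for illustration (nothing here is an existing dfuzzer format):

```python
import json

FAILURES = {"crash", "timeout"}  # hypothetical result labels


def make_record(interface, method, signature, args, result):
    """One test case as a plain dict (hypothetical field names)."""
    return {
        "interface": interface,
        "method": method,
        "signature": signature,
        "args": args,
        "result": result,
    }


def to_log_line(record: dict) -> str:
    """Option 1: everything in one file, one JSON object per line."""
    return json.dumps(record, sort_keys=True)


def reproducer_filename(index: int, record: dict) -> str:
    """Option 2: one file per failing case, named after the failure type."""
    return "%s-%04d.json" % (record["result"], index)


records = [
    make_record("org.example.Iface", "Ping", "s", ["AAAA"], "ok"),
    make_record("org.example.Iface", "Ping", "s", ["A;B"], "timeout"),
    make_record("org.example.Iface", "Set", "i", [-2147483648], "crash"),
]

log_lines = [to_log_line(r) for r in records]

failing = [r for r in records if r["result"] in FAILURES]
reproducers = {reproducer_filename(i, r): r for i, r in enumerate(failing)}

print("\n".join(log_lines))
print(sorted(reproducers))
```

With a result field on every record, the single-file log can always be filtered down to the failures later, whereas per-file reproducers cannot be turned back into a full log.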

mrc0mmand avatar May 09 '22 12:05 mrc0mmand