whatthepatch icon indicating copy to clipboard operation
whatthepatch copied to clipboard

Review potential backtracking issues

Open cscorley opened this issue 6 years ago • 1 comments
trafficstars

From my email

Hi Christopher,

My name is James Davis. I'm a security researcher at Virginia Tech.

The pypi module whatthepatch has 2 regular expressions vulnerable to catastrophic backtracking. The vulnerable expressions are listed below. If you don't know, catastrophic backtracking is when the regex engine takes more than linear time to scan a string. There are lots of resources about it on the web. I have included some starting points below.

Catastrophic backtracking is particularly problematic if two conditions are met:

  1. The module is used by server processes, and
  2. The regex can be reached by user input.

Please evaluate whether these conditions apply to your users and your regexes. However, copy/pasting regexes is quite common. Even if one or both conditions fail now, these regexes may be a time bomb in your code for later -- you might still want to fix it. For example, if your module is used on the client side, consider whether a server-side counterpart might re-use this pattern later.

Fix advice:

If you want to make fixes, here are the most common approaches with examples:

  1. Refactor the regex pattern
    • https://github.com/killmenot/valid-data-url/commit/64bad3cf1eff246103d71b51f945d7ea73bf7adf
    • https://github.com/htmllint/htmllint/commit/eec921cee4327d7fe02ba6812bc39dc29f46b12a
  2. Replace the regex with custom parsing
    • https://github.com/jshttp/forwarded/commit/d469116eda4931fbe1c0ccb29497b35930bfa328
    • https://github.com/kriskowal/q-io/commit/b9c54c809c48306835506a3baad5dab67bec9dab
  3. Apply input sanitization, suitable if legitimate input strings are (much) shorter than the attack string
    • https://github.com/hgoebl/mobile-detect.js/commit/7222f6e75cf8cd90e1dc53e445716203eaf79d8a
    • https://github.com/node-modules/charset/commit/effda0c48c51b47a47f4cad7db0c51ee7407cc1b

You can study other examples using the links from Snyk.io's vulnerability database: https://snyk.io/vuln/?packageManager=all

Validating your fix:

  1. I am happy to test a refactored regex with my tools -- send me an email with the commit ID.
  2. Existing GitHub projects for regex testing are:
    • substack's safe-regex project (https://github.com/substack/safe-regex) -- easy to install, has false positives and some bugs
    • Weideman's RegexStaticAnalysis project (https://github.com/NicolaasWeideman/RegexStaticAnalysis) -- a bit more work to install, may have false positives

If this is a new topic for you, here are some good starting points for information:

  • https://docs.microsoft.com/en-us/dotnet/standard/base-types/backtracking-in-regular-expressions
  • https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices#Backtracking
  • https://snyk.io/blog/redos-and-catastrophic-backtracking

======================================

Now, here are the vulnerable patterns with attack details.

Each vulnerability has the following keys, explained in more detail below:

  • pattern
  • filesIn (as of December/January; I excluded any appearances in irrelevant-looking dirs, and in '.min' files)
  • stringLenFor10Sec
  • nPumpsFor10Sec
  • attackFormat
  • blowupCurve

The attack format describes how to generate an attack string. On my machine, an attack string generated using nPumpsFor10Sec repetitions ("pumps") of the pump string(s) blocks the python regex engine for 10 seconds, though this will vary based on your hardware.

Compose an attack string like this: 'prefix 1' + 'pump 1' X times + 'prefix 2' + 'pump 2' X times + ... + suffix Example: With pumpPairs: [{'prefix': 'a', 'pump': 'b'}], suffix: 'c', an attack string with three pumps would be: abbbc

Catastrophic backtracking blows up at either an exponential rate or a super-linear (power law) rate. The blowupCurve indicates how severe the blow-up is. The 'type' is either EXP(onential) or POW(er law) in the number of pumps (x). The 'parms' are the parameters for the two curve types. The second parameter is more important, because: EXP: f(x) = parms[0] * parms[1]^x POW: f(x) = parms[0] * x^parms[1]

I strongly recommend you address any EXP(onential) blow-ups. POW(er law) blow-ups vary in their severity depending on the value of the second parameter, and in the length of the attack string ('stringLenFor10Sec'). But they are still a legitimate attack vector.

JSON formatted:

Vuln 1:

{
   "nPumpsFor10Sec" : "70755",
   "filesIn" : [
      [
         "whatthepatch/patch.py"
      ]
   ],
   "pattern" : ".*(\\(.*\\))$",
   "stringLenFor10Sec" : 70757,
   "attackFormat" : {
      "pumpPairs" : [
         {
            "pump" : "(",
            "prefix" : "a"
         }
      ],
      "suffix" : "("
   },
   "blowupCurve" : {
      "type" : "POWER",
      "r2" : 0.999691087941083,
      "parms" : [
         3.56638848402774e-09,
         1.94916438553238
      ]
   }
}




Vuln 2:

{
   "attackFormat" : {
      "suffix" : "*0",
      "pumpPairs" : [
         {
            "prefix" : "*** 0",
            "pump" : "0"
         }
      ]
   },
   "blowupCurve" : {
      "parms" : [
         9.13185781990315e-08,
         1.82012197261697
      ],
      "r2" : 0.994885929334018,
      "type" : "POWER"
   },
   "nPumpsFor10Sec" : "27276",
   "filesIn" : [
      [
         "whatthepatch/patch.py"
      ]
   ],
   "stringLenFor10Sec" : 27283,
   "pattern" : "^\\*\\*\\* (\\d+),?(\\d*) \\*\\*\\*\\*$"
}

Best,

James

cscorley avatar Mar 26 '19 03:03 cscorley

Identified expression 1: https://github.com/cscorley/whatthepatch/blob/725a9831c0b5086bb8081bd538516d26206527d7/whatthepatch/patch.py#L62

Identified expression 2: https://github.com/cscorley/whatthepatch/blob/725a9831c0b5086bb8081bd538516d26206527d7/whatthepatch/patch.py#L31

cscorley avatar Mar 26 '19 03:03 cscorley