codeql icon indicating copy to clipboard operation
codeql copied to clipboard

JS/RB/PY/Java: add suspicious range query

Open erik-krogh opened this issue 3 years ago • 8 comments
trafficstars

CVE-2021-42740: TP/TN

See the example JS results to see what this query flags.

The issues found are not security related in the vast majority of cases, but they are still clearly bugs.

Example results (JS is the most interesting): JavaScript, Python, Ruby, Java.

Evaluations looks fine: Ruby, Python, JavaScript, Java.
There is a slight slowdown, but I haven't been able to find a badly performing predicate in my new code.

Ruby: Some of the Ruby results are FPs due to the parser not parsing escapes as RegExpEscape.
I haven't looked into why that happens, but I'm quite sure it's somehow a bug in the parser and not the query.

Ruby/Java: The parsing of nested char classes is wrong, e.g. /[a-z&&[^a-c]]+/.
The nested [ and ] are parsed as literals instead of being parsed as a nested char class.
You can see how it should be parsed here: https://regex101.com/r/X6q22R/1

erik-krogh avatar Jun 24 '22 19:06 erik-krogh

QHelp previews:

java/ql/src/Security/CWE/CWE-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./java/ql/src/Security/CWE/CWE-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.
javascript/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./javascript/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.
python/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./python/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.
ruby/ql/src/queries/security/cwe-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./ruby/ql/src/queries/security/cwe-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.

github-actions[bot] avatar Jun 24 '22 19:06 github-actions[bot]

QHelp previews:

java/ql/src/Security/CWE/CWE-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./java/ql/src/Security/CWE/CWE-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.
javascript/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./javascript/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.
python/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./python/ql/src/Security/CWE-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.
ruby/ql/src/queries/security/cwe-020/SuspiciousRegexpRange.qhelp

errors/warnings:

./ruby/ql/src/queries/security/cwe-020/SuspiciousRegexpRange.qhelp:16:89: The entity name must immediately follow the '&' in the entity reference.
A fatal error occurred: 1 qhelp files could not be processed.

github-actions[bot] avatar Jun 26 '22 20:06 github-actions[bot]

QHelp previews:

java/ql/src/Security/CWE/CWE-020/OverlyLargeRange.qhelp

Overly permissive regular expression range

It's easy to write a regular expression range that matches a wider range of characters than you intended. For example, /[a-zA-z]/ matches all lowercase and all uppercase letters, as you would expect, but it also matches the characters: [ \ ] ^ _ ` ``.

Another common problem is failing to escape the dash character in a regular expression. An unescaped dash is interpreted as part of a range. For example, in the character class [a-zA-Z0-9%=.,-_] the last character range matches the 55 characters between , and _ (both included), which overlaps with the range [0-9] and is clearly not intended by the writer.

Recommendation

Avoid any confusion about which characters are included in the range by writing unambiguous regular expressions. Always check that character ranges match only the expected characters.

Example

The following example code is intended to check whether a string is a valid 6 digit hex color.


import java.util.regex.Pattern
public class Tester {
    public static boolean is_valid_hex_color(String color) {
        return Pattern.matches("#[0-9a-fA-f]{6}", color);
    }
}

However, the A-f range is overly large and matches every uppercase character. It would parse a "color" like #XXYYZZ as valid.

The fix is to use an uppercase A-F range instead.


import java.util.regex.Pattern
public class Tester {
    public static boolean is_valid_hex_color(String color) {
        return Pattern.matches("#[0-9a-fA-F]{6}", color);
    }
}

References

javascript/ql/src/Security/CWE-020/OverlyLargeRange.qhelp

Overly permissive regular expression range

It's easy to write a regular expression range that matches a wider range of characters than you intended. For example, /[a-zA-z]/ matches all lowercase and all uppercase letters, as you would expect, but it also matches the characters: [ \ ] ^ _ ` ``.

Another common problem is failing to escape the dash character in a regular expression. An unescaped dash is interpreted as part of a range. For example, in the character class [a-zA-Z0-9%=.,-_] the last character range matches the 55 characters between , and _ (both included), which overlaps with the range [0-9] and is clearly not intended by the writer.

Recommendation

Avoid any confusion about which characters are included in the range by writing unambiguous regular expressions. Always check that character ranges match only the expected characters.

Example

The following example code is intended to check whether a string is a valid 6 digit hex color.


function isValidHexColor(color) {
    return /^#[0-9a-fA-f]{6}$/i.test(color);
}

However, the A-f range is overly large and matches every uppercase character. It would parse a "color" like #XXYYZZ as valid.

The fix is to use an uppercase A-F range instead.


function isValidHexColor(color) {
    return /^#[0-9A-F]{6}$/i.test(color);
}

References

python/ql/src/Security/CWE-020/OverlyLargeRange.qhelp

Overly permissive regular expression range

It's easy to write a regular expression range that matches a wider range of characters than you intended. For example, /[a-zA-z]/ matches all lowercase and all uppercase letters, as you would expect, but it also matches the characters: [ \ ] ^ _ ` ``.

Another common problem is failing to escape the dash character in a regular expression. An unescaped dash is interpreted as part of a range. For example, in the character class [a-zA-Z0-9%=.,-_] the last character range matches the 55 characters between , and _ (both included), which overlaps with the range [0-9] and is clearly not intended by the writer.

Recommendation

Avoid any confusion about which characters are included in the range by writing unambiguous regular expressions. Always check that character ranges match only the expected characters.

Example

The following example code is intended to check whether a string is a valid 6 digit hex color.


import re
def is_valid_hex_color(color):
    return re.match(r'^#[0-9a-fA-f]{6}$', color) is not None

However, the A-f range is overly large and matches every uppercase character. It would parse a "color" like #XXYYZZ as valid.

The fix is to use an uppercase A-F range instead.


import re
def is_valid_hex_color(color):
    return re.match(r'^#[0-9a-fA-F]{6}$', color) is not None

References

ruby/ql/src/queries/security/cwe-020/OverlyLargeRange.qhelp

Overly permissive regular expression range

It's easy to write a regular expression range that matches a wider range of characters than you intended. For example, /[a-zA-z]/ matches all lowercase and all uppercase letters, as you would expect, but it also matches the characters: [ \ ] ^ _ ` ``.

Another common problem is failing to escape the dash character in a regular expression. An unescaped dash is interpreted as part of a range. For example, in the character class [a-zA-Z0-9%=.,-_] the last character range matches the 55 characters between , and _ (both included), which overlaps with the range [0-9] and is clearly not intended by the writer.

Recommendation

Avoid any confusion about which characters are included in the range by writing unambiguous regular expressions. Always check that character ranges match only the expected characters.

Example

The following example code is intended to check whether a string is a valid 6 digit hex color.


def is_valid_hex_color(color)
    /^#[0-9a-fA-f]{6}$/.match(color)
end

However, the A-f range is overly large and matches every uppercase character. It would parse a "color" like #XXYYZZ as valid.

The fix is to use an uppercase A-F range instead.


def is_valid_hex_color(color)
    /^#[0-9a-fA-F]{6}$/.match(color)
end

References

github-actions[bot] avatar Jun 26 '22 20:06 github-actions[bot]

Taking one step back: I think it is preferable to avoid having queries that surface our misparses to end-users.

Suggestions:

  • wait with merging the Ruby/Java versions of the query, and just move ahead with JS/Python for now.
  • avoid reporting results if we are able to detect that a misparse is likely

esbena avatar Jun 28 '22 21:06 esbena

  • avoid reporting results if we are able to detect that a misparse is likely

I've filtered out those results for Java/Ruby.

erik-krogh avatar Jun 29 '22 11:06 erik-krogh

LGTM. Let's get a doc review.

aschackmull avatar Jul 13 '22 11:07 aschackmull

Docs first responder here 👋🏻 - thanks for the ping! I've put your PR on our review board ✨

mchammer01 avatar Jul 18 '22 08:07 mchammer01

Thanks @felicitymay, I've updated the query and qhelp based on your suggestions.

erik-krogh avatar Aug 09 '22 11:08 erik-krogh