dependency-cruiser icon indicating copy to clipboard operation
dependency-cruiser copied to clipboard

Issue: rule xxx has an unsafe regular expression. Bailing out. - too restrictive?

Open Lukas-Kullmann opened this issue 1 year ago • 4 comments

Expected Behavior

Validation of regular expressions fails if there are many entries in the from.path (or from.pathNot, to.path, ...) array.

Current Behavior

No matter how many entries I have in the respective array, validation should not fail. Or to be more precise: Validation should not succeed by removing any one of the entries in the array.

Possible Solution

The problem seems to be validation of the regular expressions calls safe-regex with the raw rule array. Since the rule is an array, safe-regex will stringify the rule (String(rule)) internally which results in a single, very long regex for the whole rule. If each entry contains a "repetition" (e.g. .*), the validation will always fail if there are more than 25 entries in the array. But each entry of the rule is evaluated separately later. Wouldn't it make more sense to validate each entry individually?

function safeRule(pRule, pSection, pCondition) {
  return (
    !hasPath(pRule, pSection, pCondition) ||
    Array.isArray(pRule[pSection][pCondition]) 
      ? pRule[pSection][pCondition].every(entry => safeRegex(entry))
      : safeRegex(pRule[pSection][pCondition])
  );
}

Steps to Reproduce (for bugs)

Try validating a rule that contains

{
  "from": {
    "path": [
      ".+/modules/module-1/",
      ".+/modules/module-2/",
      ".+/modules/module-3/",
      ".+/modules/module-4/",
      ".+/modules/module-5/",
      ".+/modules/module-6/",
      ".+/modules/module-7/",
      ".+/modules/module-8/",
      ".+/modules/module-9/",
      ".+/modules/module-10/",
      ".+/modules/module-11/",
      ".+/modules/module-12/",
      ".+/modules/module-13/",
      ".+/modules/module-14/",
      ".+/modules/module-15/",
      ".+/modules/module-16/",
      ".+/modules/module-17/",
      ".+/modules/module-18/",
      ".+/modules/module-19/",
      ".+/modules/module-20/",
      ".+/modules/module-21/",
      ".+/modules/module-22/",
      ".+/modules/module-23/",
      ".+/modules/module-24/",
      ".+/modules/module-25/",
      ".+/modules/module-26/",
    ]
  }
}

Note: Removing any entry in from.path will result in the validation to be successful.

The error output will be something like this:

ERROR: rule {"name":"test","severity":"error","from":{"path":[".+/modules/module-1/",".+/modules/module-2/",".+/modules/module-3/",".+/modules/module-4/",".+/modules/module-5/",".+/modules/module-6/",".+/modules/module-7/",".+/modules/module-8/",".+/modules/module-9/",".+/modules/module-10/",".+/modules/module-11/",".+/modules/module-12/",".+/modules/module-13/",".+/modules/module-14/",".+/modules/module-15/",".+/modules/module-16/",".+/modules/module-17/",".+/modules/module-18/",".+/modules/module-19/",".+/modules/module-20/",".+/modules/module-21/",".+/modules/module-22/",".+/modules/module-23/",".+/modules/module-24/",".+/modules/module-25/",".+/modules/module-26/"]},"to":{"path":"anywhere"}} has an unsafe regular expression. Bailing out.

Context

In our project, we don't use the dependency cruiser API directly, but have helper functions that build up the rules for us explicitly. We do this to be able to process the rules easier when reading the code. But it brings the downside that we cannot optimize rules as much as we could if we wrote the rules directly.

So we have some code like this

allowAccessFromTo('directory-from', ['directory-to-1', 'directory-to-2'])

Which build up several rules so that we directory-from can only access those two directories and directories that are not specified with allowAccesFromTo are not allowed to access anything.

The second rule that I describe sets up a rule like

{
  "forbidden": [
    {
      "from": {
        "path": "(.+/modules)/([^/]+)/",
        "pathNot": [
          // here go all modules that HAVE `allowAccesFromTo` defined
        ],
        "to": {
          "path": ".*/module/.*",
          "pathNot": [
            "$1/$2/.*",
            // some other directories that are always allowed
          ]
        }
    }
  ]
}

The problem is that this does not really scale to many "modules" (i.e. many calls to allowAccessFromTo). If we have more than 25 of these modules, it will always crash.

Your Environment

  • Version used: 11.15.0
  • Node version: 16.5.1
  • Operating System and version:
  • Link to your project:

Lukas-Kullmann avatar Aug 23 '22 13:08 Lukas-Kullmann

Hi @Lukas-Kullmann thank you for raising this issue - and for providing insight in your use case.

  • Passing the array to safe-regex is indeed a bug. With safe-regex’s current heuristics it doesn’t make a difference (converting it into a valid regex string with .join('|') yields the same result - currently) but it’ll be better if we passed it a real regular expression.
  • I’ve noted that safe-regex prevents a lot of bugs, but its heuristics are indeed heuristic and provide false positives. So I'd like to propose adding a “I know what I’m doing” switch to the dependency-cruiser options, which allows for overriding the check.

To manage expectations: I won’t be able to work on these until next week at the earliest.

sverweij avatar Aug 23 '22 18:08 sverweij

Hey @sverweij thanks for your answer.

I am not sure if adding the "I know what I'm doing" switch would be the best option. I agree that using safe-regex is a very helpful tool. And I don't really want to loose it. In my opinion, it would be helpful to still have the star height heuristic. Just the heuristic with the number of repetitions per Regex does not really make sense for our use case.

Also, I'm not sure if converting the array to a single regex with .join('|') is really correct. Yes, it produces the desired resulting regular expression, but that is not the one that is used in the end. As far as I can tell, the regexes in the array are evaluated individually. If that's the case, they should also be checked individually in my opinion. If I understood it correctly, safe-regex is designed to reduce the risk of individual regexes to have exponential complexity. Since we are not using one massive regex, but many small ones, they should also be vetted individually.

Lukas-Kullmann avatar Aug 25 '22 07:08 Lukas-Kullmann

Also, I'm not sure if converting the array to a single regex with .join('|') is really correct. Yes, it produces the desired resulting regular expression, but that is not the one that is used in the end.

It is the resulting regular expression - in its internal model dependency-cruiser uses just one regular expression with this utility - the array notation is just syntactical sugar. Even when we wouldn't do that, the computational complexity would remain the same - instead of letting the regular expression matching code do its work we'd have to loop through each separate regex individually.

I see safe-regex has an option that enables to influence the maximum 'repetition' limit - it'll probably not surprise you the default for that limit is 25 :-) . If I understand correctly it'll keep the star-height safety check in place even when the repetition limit is raised (some manual checks + perusing the safe-regex source code confirm this). This something we could use to enable your use case.

sverweij avatar Aug 29 '22 20:08 sverweij

@Lukas-Kullmann after studying the problem a bit more it looks like the author/ maintainer has its doubts about the repeat check as well - and I guess it's a source of false positives even for the safe-regex package itself. I think it makes sense to selective disable it.

I've done so in PR #665 - and published the result to npm as a beta ([email protected]). Feedback is welcome!

sverweij avatar Sep 16 '22 17:09 sverweij

(@Lukas-Kullmann - I've merged the PR, which closed this issue - that doesn't mean I'm not interested in your feedback - so if you have any it's still welcome!)

sverweij avatar Sep 19 '22 17:09 sverweij

... I realise I still need to fix passing an array to safe-regex. It seems it doesn't influence the outcome at the moment, but it'll be better with the proper type. Fix coming up.

sverweij avatar Sep 19 '22 18:09 sverweij