assemblyscript-regex icon indicating copy to clipboard operation
assemblyscript-regex copied to clipboard

Add NFA -> DFA transformation + optimizations in this domain

Open MaxGraey opened this issue 3 years ago • 6 comments

Great article about this: https://jasonhpriestley.com/regex-dfa

MaxGraey avatar Jan 24 '21 04:01 MaxGraey

That's a really interesting article - thanks.

ColinEberhardt avatar Jan 26 '21 09:01 ColinEberhardt

Just FYI - I've been exploring a multi-state NFA algorithm, which is conceptually similar to a DFA (it is in some senses an emulated DFA):

https://github.com/ColinEberhardt/assemblyscript-regex/tree/multi-state-NFA

However, it doesn't appear to be any faster in my benchmark tests!

ColinEberhardt avatar Jan 28 '21 11:01 ColinEberhardt

As I understand NFA -> DFA not helps in some scenarios.

MaxGraey avatar Jan 28 '21 11:01 MaxGraey

I believe the principal advantage is that DFAs do not result in back-tracking. However, many simple regex's do not result in any significant backtracking. This is why I'm keen to add some more complex expressions to the benchmark test!

ColinEberhardt avatar Jan 28 '21 11:01 ColinEberhardt

Also It seems this part could be a more effecient:

function addNextState(
  state: State,
  nextStates: State[],
  visited: State[]
): void {
  if (state.epsilonTransitions.length > 0) {
    for (let i = 0; i < state.epsilonTransitions.length; i++) {
      const st = state.epsilonTransitions[i];
      if (!visited.includes(st)) {
        visited.push(st);
        addNextState(st, nextStates, visited);
      }
    }
  } else {
    nextStates.push(state);
  }
}

with avoiding recursion and use stack / queue instead. Also visited better replace to Set instead array

MaxGraey avatar Jan 28 '21 16:01 MaxGraey

https://v8.dev/blog/non-backtracking-regexp

MaxGraey avatar Jan 29 '21 11:01 MaxGraey