StackOverflowError from pattern input
Thank you so much for making this tool! When testing it, I ran into this case that causes a stack overflow:
String pattern = "^[a-zA-Z\s]*$";
Generex generex = new Generex(pattern);
String firstMatch = generex.getFirstMatch();
And when that code is run I get this exception:
Exception in thread "main" java.lang.StackOverflowError
at java.base/java.util.HashMap$HashIterator.<init>(HashMap.java:1475)
at java.base/java.util.HashMap$KeyIterator.<init>(HashMap.java:1514)
at java.base/java.util.HashMap$KeySet.iterator(HashMap.java:912)
at java.base/java.util.HashSet.iterator(HashSet.java:173)
at java.base/java.util.AbstractCollection.toArray(AbstractCollection.java:184)
at dk.brics.automaton.State.getSortedTransitionArray(Unknown Source)
at dk.brics.automaton.State.getSortedTransitions(Unknown Source)
at com.mifmif.common.regex.Generex.prepareTransactionNodes(Generex.java:265)
Could this be fixed?
getFirstMatch() JavaDoc:
first string in lexicographical order that is matched by the given pattern.
This means it tries to sort ALL of the generated matches. Your regex matches infinitely many strings (it contains a Kleene star), so of course you get a StackOverflowError. The solution is to use the lazy iterator:
String pattern = "^[a-zA-Z\\s]*$";
Generex generex = new Generex(pattern);
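// Iterator here is com.mifmif.common.regex.util.Iterator (its next() returns String), not java.util.Iterator
final Iterator matchesGenerator = generex.iterator();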
if (matchesGenerator.hasNext()) {
String firstMatch = matchesGenerator.next();
System.out.println(firstMatch); // ^\t$
}
However, the output is probably not what you wanted, because ^ and $ are not special characters in the grammar Generex uses (https://www.brics.dk/automaton/doc/index.html?dk/brics/automaton/RegExp.html), so they are emitted literally. Dropping the anchors should not change the intended meaning; I think the regex is matched against the whole string anyway.
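A minimal sketch of dropping the anchors before the pattern is handed to Generex. Note that stripAnchors is a hypothetical helper written for this thread (it is not part of Generex or dk.brics.automaton), and it assumes the pattern has at most one leading ^ and one unescaped trailing $:

```java
class AnchorStripper {
    // Hypothetical helper, not part of Generex or dk.brics.automaton.
    // Removes a leading '^' and an unescaped trailing '$' from a pattern.
    static String stripAnchors(String pattern) {
        int start = 0;
        int end = pattern.length();
        // A leading '^' is a Java-style start anchor.
        if (end > 0 && pattern.charAt(0) == '^') {
            start = 1;
        }
        // A trailing '$' counts as an anchor only when it is preceded by an
        // even number of backslashes (otherwise it is an escaped literal).
        if (end > start && pattern.charAt(end - 1) == '$') {
            int backslashes = 0;
            for (int i = end - 2; i >= start && pattern.charAt(i) == '\\'; i--) {
                backslashes++;
            }
            if (backslashes % 2 == 0) {
                end--;
            }
        }
        return pattern.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(stripAnchors("^[a-zA-Z\\s]*$")); // prints [a-zA-Z\s]*
    }
}
```

The result can then be passed to new Generex(...) in place of the anchored pattern.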
As you can see, the grammar is not identical to Java regexes, but it is close enough. It does, however, have extra special characters: "@~&<#. They are marked optional in the Automaton library, but Generex sadly enables all of them by default. In my fork I added an option to turn them off with the NONE flag; you could clone and install my devel branch if you want to try it.
Thank you for explaining what's happening with this input. I understand that this is caused by the infinite regex. In my opinion an infinite regex should still have a deterministic first match and should not cause a stack overflow. For my use case, using this different package better meets my needs.
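To illustrate the point about a deterministic first match, here is a rough sketch under stated assumptions. It does not use the Generex or dk.brics.automaton API; the DFA is a hypothetical toy structure (a map from state to sorted outgoing transitions). It also swaps in a different ordering: shortest first, then lexicographic (shortlex), found by breadth-first search. Plain lexicographic order does not always have a smallest element for an infinite language (for a*b, "aab" < "ab" < "b", and so on), whereas shortlex always does, and the search terminates because each state is visited once.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;
import java.util.Set;
import java.util.TreeMap;

class ShortlexFirstMatch {
    // Breadth-first search over a toy DFA. TreeMap keeps outgoing characters
    // sorted, so among strings of equal length the smallest is found first.
    static Optional<String> firstMatch(Map<Integer, TreeMap<Character, Integer>> transitions,
                                       int start, Set<Integer> accepting) {
        Map<Integer, String> shortest = new HashMap<>(); // doubles as the visited set
        Queue<Integer> queue = new ArrayDeque<>();
        shortest.put(start, "");
        queue.add(start);
        while (!queue.isEmpty()) {
            int state = queue.remove();
            String prefix = shortest.get(state);
            if (accepting.contains(state)) {
                return Optional.of(prefix);
            }
            TreeMap<Character, Integer> out = transitions.get(state);
            if (out == null) {
                continue; // dead end: no outgoing transitions
            }
            for (Map.Entry<Character, Integer> e : out.entrySet()) {
                int next = e.getValue();
                if (!shortest.containsKey(next)) {
                    shortest.put(next, prefix + e.getKey());
                    queue.add(next);
                }
            }
        }
        return Optional.empty(); // no accepting state is reachable
    }

    // Toy stand-in for [a-z]*: state 0 loops on every lowercase letter.
    static Map<Integer, TreeMap<Character, Integer>> lettersStar() {
        TreeMap<Character, Integer> loop = new TreeMap<>();
        for (char c = 'a'; c <= 'z'; c++) {
            loop.put(c, 0);
        }
        Map<Integer, TreeMap<Character, Integer>> dfa = new HashMap<>();
        dfa.put(0, loop);
        return dfa;
    }

    // Toy DFA for a*b: state 0 loops on 'a' and moves to state 1 on 'b'.
    static Map<Integer, TreeMap<Character, Integer>> aStarB() {
        TreeMap<Character, Integer> t0 = new TreeMap<>();
        t0.put('a', 0);
        t0.put('b', 1);
        Map<Integer, TreeMap<Character, Integer>> dfa = new HashMap<>();
        dfa.put(0, t0);
        return dfa;
    }

    public static void main(String[] args) {
        // The start state accepts, so the first match is the empty string.
        System.out.println("[" + firstMatch(lettersStar(), 0, Set.of(0)).orElse(null) + "]"); // prints []
        System.out.println(firstMatch(aStarB(), 0, Set.of(1)).orElse(null)); // prints b
    }
}
```

For a star pattern like the one in this issue the start state is already accepting, so the search returns the empty string immediately, without any recursion.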