noam icon indicating copy to clipboard operation
noam copied to clipboard

Regex gym and fsm simulator gets stuck

Open Mahesha999 opened this issue 6 years ago • 1 comments

I was trying to find the dfa accepting strings containing at least three occurrences of three consecutive 1's on alphabet Σ={0,1} with overlapping permitted.

I come up with following long regex:

(
(0+1)*111(0+1)*111(0+1)*111(0+1)*
+(0+1)*111(0+1)*1111(0+1)*
+(0+1)*1111(0+1)*111(0+1)*
+(0+1)*11111(0+1)*
)

First line for strings containing non overlapping three occurrences of three consecutive 1's Second line for strings containing first two occurrences overlapping (that is four consecutive 1's) Third line for strings containing last two occurrences overlapping (that is four consecutive 1's) Last line for all occurrences overlapping (that is five consecutive 1's)

I fed above regex to FSM simulator and it kept on processing. I can see the CPU utilization in both chrome and windows task managers. So I tried to feed the regex to regex gym. It also showed same behaviour.

Interestingly, when I omit last line (for all overlapping occurrences) in the regex, it returned proper FSM:

(
(0+1)*111(0+1)*111(0+1)*111(0+1)*
+(0+1)*111(0+1)*1111(0+1)*
+(0+1)*1111(0+1)*111(0+1)*
)

So whats going on here? Is it that the original regex is excessively complex?

Mahesha999 avatar Oct 02 '17 08:10 Mahesha999

Hi @Mahesha999 👋

I fed above regex to FSM simulator and it kept on processing. I can see the CPU utilization in both chrome and windows task managers.

Just to make sure I understand -- did you try to generate a DFA or an eNFA? I see the "blocking" processing when I try to generate a DFA, but an eNFA is generated just fine.

So I tried to feed the regex to regex gym. It also showed same behaviour.

Strange -- the regex gym starts reducing the regex just fine for me.

Interestingly, when I omit last line (for all overlapping occurrences) in the regex, it returned proper FSM:

Yeah, strange again -- I see no difference in the behavior I described above when I remove the last part.

So whats going on here? Is it that the original regex is excessively complex?

Yeah, it's likely that converting the eNFA into a DFA and then minimizing the DFA is expensive. It's possible the might be some performance optimizations that can be made to the library, but I don't have time to look into that right now.

izuzak avatar Oct 09 '17 12:10 izuzak