pregexp icon indicating copy to clipboard operation
pregexp copied to clipboard

Matching with optional subexpression, time increases exponentially with length of input string when match fails

Open robert-dodier opened this issue 2 years ago • 0 comments

I'm working with the current (2020-01-29) version of pregexp. It looks like the time required to match an input which doesn't match the regex increases exponentially with the length of the input string, when there is an optional subexpression in the pattern.

I am working with SBCL 2.1.6. I have edited the console log for clarity.

First example. PREGEXP-MATCH returns quickly when the input matches.

* (time (PREGEXP-MATCH "([a-z][a-z]* *\\|* *)*foobar" "aaaaa | bbbbb | ccccc | foobar" 0 NIL))
Evaluation took:
  0.000000 seconds of total run time (0.000000 user, 0.000000 system)
("aaaaa | bbbbb | ccccc | foobar" "foobar")

Additional examples. PREGEXP-MATCH takes a long time when the match fails, and time approximately doubles with each additional input character.

* (time (PREGEXP-MATCH "([a-z][a-z]* *\\|* *)*foobar" "aaaaa | bbbbb | ccccc | ddddd" 0 NIL))
Evaluation took:
  0.416000 seconds of total run time (0.404000 user, 0.012000 system)
NIL
  
* (time (PREGEXP-MATCH "([a-z][a-z]* *\\|* *)*foobar" "aaaaa | bbbbb | ccccc | dddddd" 0 NIL))
Evaluation took:
  0.784000 seconds of total run time (0.780000 user, 0.004000 system)
NIL
  
* (time (PREGEXP-MATCH "([a-z][a-z]* *\\|* *)*foobar" "aaaaa | bbbbb | cccccc | dddddd" 0 NIL))
Evaluation took:
  1.556000 seconds of total run time (1.540000 user, 0.016000 system)
NIL
  
* (time (PREGEXP-MATCH "([a-z][a-z]* *\\|* *)*foobar" "aaaaa | bbbbbb | cccccc | dddddd" 0 NIL))
Evaluation took:
  3.112000 seconds of total run time (3.044000 user, 0.068000 system)
NIL
  
* (time (PREGEXP-MATCH "([a-z][a-z]* *\\|* *)*foobar" "aaaaaa | bbbbbb | cccccc | dddddd" 0 NIL))
Evaluation took:
  6.108000 seconds of total run time (6.068000 user, 0.040000 system)
NIL
  
* (time (PREGEXP-MATCH "([a-z][a-z]* *\\|* *)*foobar" "aaaaaa | bbbbbb | cccccc | ddddddd" 0 NIL))
Evaluation took:
  11.864000 seconds of total run time (11.736000 user, 0.128000 system)
NIL
  
* (time (PREGEXP-MATCH "([a-z][a-z]* *\\|* *)*foobar" "aaaaaa | bbbbbb | ccccccc | ddddddd" 0 NIL))
Evaluation took:
  23.696000 seconds of total run time (23.480000 user, 0.216000 system)
NIL

robert-dodier avatar Dec 04 '22 20:12 robert-dodier