stringbench
Also benchmark mismatches
Currently the naive algorithm seems to perform quite well for large alphabets and small patterns. There are some reasons for that:
- small patterns mean that the cost of backtracking is low
- most of the patterns being searched for actually occur in the text
- large alphabets lead to early backtracking (most comparisons break off at the first or second character)
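To make the effect above concrete, here is a minimal sketch of a naive search that also counts character comparisons. It is not the code used in stringbench; `naive_find` is a hypothetical helper written only for illustration.

```python
def naive_find(text, pattern):
    """Naive search. Returns (index of first match or -1, comparisons made).

    Hypothetical helper for illustration, not the benchmarked implementation.
    """
    comparisons = 0
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        j = 0
        # Compare left to right and stop at the first mismatch.
        while j < m:
            comparisons += 1
            if text[i + j] != pattern[j]:
                break
            j += 1
        if j == m:
            return i, comparisons
    return -1, comparisons
```

For text drawn uniformly at random from an alphabet of size σ, the first comparison at an alignment mismatches with probability 1 - 1/σ, so with a large alphabet almost every alignment costs a single comparison, and a short pattern caps the cost of the remaining ones.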
To get a more balanced benchmark, we should modify it by
- inserting patterns that do not occur in the text
- inserting patterns that match up to, but mismatch at, the last pattern position
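One possible way to generate the two kinds of patterns listed above is sketched below. The function names, the `max_tries` parameter, and the retry strategy are assumptions made for illustration, not part of the existing benchmark. The sketch assumes the text contains at least two distinct characters and is at least as long as the pattern.

```python
import random


def absent_pattern(text, m, rng, max_tries=1000):
    """Random length-m pattern over the text's alphabet that does not occur in text."""
    alphabet = sorted(set(text))
    for _ in range(max_tries):
        pattern = "".join(rng.choice(alphabet) for _ in range(m))
        if pattern not in text:
            return pattern
    raise ValueError("could not find an absent pattern")


def last_position_mismatch(text, m, rng, max_tries=1000):
    """Copy a length-m window of text and change only its last character.

    The first m-1 characters occur in text, so a naive search has to compare
    the whole pattern at that alignment before it fails. Candidates that
    happen to occur elsewhere in text are rejected and retried.
    """
    alphabet = sorted(set(text))
    for _ in range(max_tries):
        start = rng.randrange(len(text) - m + 1)
        window = text[start:start + m]
        # Pick a replacement that differs from the original last character.
        replacement = rng.choice([c for c in alphabet if c != window[-1]])
        pattern = window[:-1] + replacement
        if pattern not in text:
            return pattern
    raise ValueError("could not find a last-position mismatch")
```

Passing a seeded `random.Random` instance as `rng` keeps the generated pattern set reproducible across benchmark runs.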