grex
Non-optimal generated regexp
Hi @pemistahl and many thanks for this great piece of software.
I'd like to report a little issue which I'm sure can easily be fixed.
$ grex --version
grex 1.4.5
$ cat bots.txt
baiduspider
bingbot
duckduckgo
googlebot
yandexbot
$ grex --no-anchors -c -i -f bots.txt
(?i)(?:baiduspider|duckduckgo|(?:google|bing)bot|yandexbot)
This is what I was expecting to get:
$ grex --no-anchors -c -i -f bots.txt
(?i)(?:baiduspider|duckduckgo|(?:google|bing|yandex)bot)
yandexbot shares the same suffix bot with googlebot and bingbot.
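As a quick sanity check that the two patterns differ only in form, here is a minimal Python sketch (not part of grex; the `non_bots` list and the `full_matches` helper are made up for illustration). It assumes Python's `re` module treats both patterns the same way as the regex engine you target.

```python
import re

# Test cases from bots.txt above.
bots = ["baiduspider", "bingbot", "duckduckgo", "googlebot", "yandexbot"]

# Pattern currently produced by grex 1.4.5.
actual = re.compile(r"(?i)(?:baiduspider|duckduckgo|(?:google|bing)bot|yandexbot)")
# Pattern I was expecting, with the shared suffix factored out for all three.
expected = re.compile(r"(?i)(?:baiduspider|duckduckgo|(?:google|bing|yandex)bot)")

# Hypothetical near-miss strings that neither pattern should fully match.
non_bots = ["bot", "yandex", "googlebots", "bingspider"]


def full_matches(pattern, words):
    """Return the words that the pattern matches in their entirety."""
    return [w for w in words if pattern.fullmatch(w)]


# Both patterns accept every test case and reject every near miss.
assert full_matches(actual, bots) == full_matches(expected, bots) == bots
assert full_matches(actual, non_bots) == full_matches(expected, non_bots) == []

# The expected pattern is the shorter of the two.
assert len(expected.pattern) < len(actual.pattern)
```

So on these inputs the generated regex is correct; it is only longer than it needs to be.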
Interestingly, when testing with a reduced list of bots that all share the same suffix, the suffix bot is found, but a non-optimal regex is still returned:
$ cat bots.txt
bingbot
googlebot
yandexbot
$ grex --no-anchors -c -i -f bots.txt
(?i)(?:(?:google|bing)|yandex)bot
This is what I was expecting to get:
$ grex --no-anchors -c -i -f bots.txt
(?i)((?:google|bing|yandex)bot)
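To illustrate the kind of suffix factoring I have in mind, here is a rough Python sketch. It is not how grex works internally, and `factor_common_suffix` is a hypothetical helper; it only handles the simple case where every word shares one common suffix, and it emits a non-capturing group.

```python
import os
import re


def factor_common_suffix(words):
    """Build a flat alternation of stems followed by the shared suffix."""
    # The longest common suffix is the longest common prefix of the reversed words.
    reversed_words = [w[::-1] for w in words]
    suffix = os.path.commonprefix(reversed_words)[::-1]
    # Strip the suffix from each word and escape what remains.
    stems = [re.escape(w[: len(w) - len(suffix)]) for w in words]
    return "(?:" + "|".join(stems) + ")" + re.escape(suffix)


pattern = factor_common_suffix(["bingbot", "googlebot", "yandexbot"])
print(pattern)  # (?:bing|google|yandex)bot
```

The point is only that a single flat alternation is enough once the suffix has been identified, instead of the nested (?:(?:google|bing)|yandex) group.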
Many thanks