
Non-optimal generated regexp

Issue opened by Zabrane (status: open)

Hi @pemistahl, and many thanks for this great piece of software.

I'd like to report a small issue that I'm sure can be fixed easily.

$ grex --version
grex 1.4.5

$ cat bots.txt
baiduspider
bingbot
duckduckgo
googlebot
yandexbot

$ grex --no-anchors -c -i -f bots.txt
(?i)(?:baiduspider|duckduckgo|(?:google|bing)bot|yandexbot)

This is what I was expecting to get:

$ grex --no-anchors -c -i -f bots.txt
(?i)(?:baiduspider|duckduckgo|(?:google|bing|yandex)bot)
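A quick way to confirm that the expected pattern is a drop-in replacement for the generated one is to check that both accept every line of bots.txt. The sketch below uses Python's standard re module, with fullmatch standing in for anchored matching; it is only a sanity check on this input, not a proof of general equivalence.

```python
import re

# Lines of bots.txt from the first example
bots = ["baiduspider", "bingbot", "duckduckgo", "googlebot", "yandexbot"]

# Pattern generated by grex 1.4.5 and the pattern expected in this issue
produced = r"(?i)(?:baiduspider|duckduckgo|(?:google|bing)bot|yandexbot)"
expected = r"(?i)(?:baiduspider|duckduckgo|(?:google|bing|yandex)bot)"

def accepts_all(pattern, words):
    # fullmatch approximates an anchored match (^...$) for each input line
    compiled = re.compile(pattern)
    return all(compiled.fullmatch(w) for w in words)

print(accepts_all(produced, bots), accepts_all(expected, bots))
print(len(produced), len(expected))
```

Both patterns accept all five names, and the expected one is a few characters shorter because "bot" is written once instead of twice.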

yandexbot shares the same suffix "bot" with googlebot and bingbot, so it could be folded into the same group.
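To illustrate the grouping being asked for, here is a minimal Python sketch that factors a given suffix out of every word ending with it. The helper name factor_common_suffix is hypothetical, and this is not how grex builds its regexes internally; it only shows the shape of the output the issue expects.

```python
import re

def factor_common_suffix(words, suffix):
    """Hypothetical helper: pull `suffix` out of every word that ends with it
    and collect the remaining prefixes in a single alternation."""
    if not suffix:
        raise ValueError("suffix must be non-empty")
    prefixes = [re.escape(w[: -len(suffix)]) for w in words if w.endswith(suffix)]
    # Words without the suffix are kept as plain alternatives
    parts = sorted(re.escape(w) for w in words if not w.endswith(suffix))
    if prefixes:
        parts.append("(?:" + "|".join(prefixes) + ")" + re.escape(suffix))
    return "(?:" + "|".join(parts) + ")"

bots = ["baiduspider", "bingbot", "duckduckgo", "googlebot", "yandexbot"]
pattern = "(?i)" + factor_common_suffix(bots, "bot")
print(pattern)
```

With this input the helper yields a single group for bing, google and yandex followed by one "bot", matching the structure of the expected output (only the order of the alternatives differs).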

Interestingly, when testing with a reduced list in which every bot shares the same suffix, grex does find the suffix "bot", but the regex it returns is still not optimal, because (?:google|bing) is needlessly nested inside another alternation:

$ cat bots.txt
bingbot
googlebot
yandexbot

$ grex --no-anchors -c -i -f bots.txt
(?i)(?:(?:google|bing)|yandex)bot

This is what I was expecting to get:

$ grex --no-anchors -c -i -f bots.txt
(?i)((?:google|bing|yandex)bot)
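The nesting in the second case can be removed by splicing an inner alternation into its parent whenever the inner group is a whole alternative on its own. The Python sketch below models an alternation as a list of literal strings or nested lists; this representation and the helper names flatten and render are hypothetical simplifications, not grex's internal data structures.

```python
import re

def flatten(alternation):
    """Splice nested alternations into their parent: (?:(?:a|b)|c) -> (?:a|b|c).

    Hypothetical model: an alternation is a list whose members are literal
    strings or nested lists (nested alternations with nothing around them)."""
    flat = []
    for member in alternation:
        if isinstance(member, list):
            flat.extend(flatten(member))
        else:
            flat.append(member)
    return flat

def render(alternation):
    return "(?:" + "|".join(alternation) + ")"

nested = [["google", "bing"], "yandex"]  # mirrors (?:(?:google|bing)|yandex)
flat_pattern = "(?i)" + render(flatten(nested)) + "bot"
nested_pattern = "(?i)(?:(?:google|bing)|yandex)bot"
print(flat_pattern)

# Both patterns accept the same three bot names
for bot in ["bingbot", "googlebot", "yandexbot"]:
    assert re.fullmatch(flat_pattern, bot) and re.fullmatch(nested_pattern, bot)
```

Splicing is only safe because the inner group is a complete alternative; if it were concatenated with other text (as in (?:google|bing)bot), it would have to stay grouped.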

Many thanks

Zabrane, Jun 12 '24 14:06