DiscordWikiBot
EventStreams: Allow title matching using regular expressions
Some server owners have long asked for a way to stream changes to a defined set of pages using the bot.
I previously thought that the best way to do this would be something like glob patterns, but that approach has several problems. For one, you would have to re-implement glob matching or pull in a library that does it. There is also the question of whether glob syntax would clash with actual MediaWiki titles. After researching this for a bit, I decided that simply allowing people to use regular expressions (regexps) is good enough to cover this need.
Here are the theoretical requirements for any potential implementation:
- Regexps can be passed only to the `--title` attribute of the configuration.
- Regexps should be passed using the `--title /.*/` syntax (i.e. always wrapped in `//`), since this keeps the number of params to a minimum and gives a simple way to tell what is a regexp and what is not (`str.StartsWith('/')`). This needs to account for articles like https://en.wikipedia.org/wiki//b/ which are unlikely to have their own stream feeds but probably still need some way to be referenced in EventStreams (e.g. `:/b/`?).
- The code should define a reasonable MatchTimeout (0.5 seconds?) and try/catch errors from slow regexps to prevent any ReDoS attacks.
- Passed regexps should be tested with the timeout, and slow regexps should be rejected by the bot at the configuration step (`!openStream`).
- Passed regexps should match the whole string for clarity (`^…$`) and should not ignore case.
- (If we can find a way) Regexps should be as simple as possible in the number of features allowed.
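
A rough sketch of how the parsing rules in this list could fit together. It is written in Python purely for illustration and is not the bot's code. `TitleFilter` and `parse_title` are hypothetical names, and the leading-colon escape for titles like `/b/` is only the tentative idea floated above. Python's `re` module has no per-match timeout, so the MatchTimeout requirement is not covered here.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TitleFilter:
    """Hypothetical container for a parsed --title value."""
    literal: Optional[str] = None
    pattern: Optional[re.Pattern] = None

    def matches(self, title: str) -> bool:
        if self.pattern is not None:
            # fullmatch anchors the pattern to the whole title (the ^…$ rule);
            # no IGNORECASE flag is set, so matching stays case-sensitive.
            return self.pattern.fullmatch(title) is not None
        return title == self.literal


def parse_title(value: str) -> TitleFilter:
    # A non-empty value wrapped in slashes is treated as a regexp.
    if len(value) > 2 and value.startswith("/") and value.endswith("/"):
        try:
            return TitleFilter(pattern=re.compile(value[1:-1]))
        except re.error as exc:
            # Reject invalid patterns at configuration time.
            raise ValueError(f"invalid regexp in --title: {exc}") from exc
    # Assumption: a leading colon marks a literal title, e.g. ":/b/" -> "/b/".
    if value.startswith(":"):
        return TitleFilter(literal=value[1:])
    return TitleFilter(literal=value)
```

With this sketch, `parse_title("/Foo.*/")` matches `Foobar` but not `foobar`, while `parse_title(":/b/")` is treated as the literal title `/b/` rather than as a regexp.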
There might be other notable things I forgot; please report them if you read this issue and can think of any.
Another idea: add a `--title-matches` key (the name can be discussed; `--in-title`?) that works for `--namespace` streams only, for simplicity (this would be easier to process and would require fewer changes to the current shaky structure of the code).