Limnoria
SedRegex: add configuration option for regex delimiter
Hi,
At the moment, the SedRegex plugin supports a lot of different delimiters by default (", #, /, etc.), but ' is also hardcoded as one of them.
Sadly, this causes a lot of false positives in the French channels I'm in, as sentences like s'en aller en bateau, c'est intéressant et agréable will cause a match because of the two apostrophes used. Indeed, this is seen as the same as s/en aller en bateau, c/est intéressant et agréable/, which is parsed as a substitution command but is very unlikely to be one :P
Would it be possible to add a configuration option to either allowlist wanted delimiters (which IMO makes more sense) or to blocklist some (if you prefer this option)?
Cheers!
@pollo told me the logic behind this is in https://github.com/progval/Limnoria/blob/master/plugins/SedRegex/constants.py and I believe this specific instance can be fixed by adding ' to the exclusion, with [^\w\s'] instead of [^\w\s].
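To illustrate the effect of that change, here is a minimal sketch. The pattern below is a simplified, hypothetical stand-in and not the actual regex from constants.py; `build_sed_regex` and the group names are made up for this example. It only shows how the delimiter character class decides whether the French sentence is treated as a substitution.

```python
import re

# Simplified, hypothetical stand-in for the SedRegex pattern. The real
# pattern in constants.py is more involved; only the delimiter character
# class matters for this illustration.
def build_sed_regex(delim_class):
    return re.compile(
        r"^s(?P<delim>" + delim_class + r")"   # "s" followed by the delimiter
        r"(?P<pattern>.*?)(?P=delim)"          # text to search for
        r"(?P<replacement>.*?)"                # replacement text
        r"(?:(?P=delim)(?P<flags>[a-z]*))?$"   # optional closing delimiter and flags
    )

current = build_sed_regex(r"[^\w\s]")    # any non-word, non-space delimiter
proposed = build_sed_regex(r"[^\w\s']")  # same, but the apostrophe is excluded

french = "s'en aller en bateau, c'est intéressant et agréable"
sed_cmd = "s/foo/bar/"

print(bool(current.match(french)))    # the apostrophe is accepted as a delimiter
print(bool(proposed.match(french)))   # the apostrophe is rejected
print(bool(proposed.match(sed_cmd)))  # "/" still works as a delimiter
```

With the apostrophe excluded from the class, the sentence no longer parses as a command, while the usual s/foo/bar/ form is unaffected.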
I would think a list of disallowed separators makes more sense. The actual sed implementation allows all characters, even letters and spaces though they probably aren't as useful. I don't want to be overly restrictive by default, as it's much easier to pick an alternate separator than the usual "/" if your text includes that character.
$ sed 's t b ' <<< test
best
$ sed 'sataba' <<< test
best
We could make it configurable while keeping [^\w\s] as the default.
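A rough sketch of what a configurable list of disallowed separators could look like, under the assumption that the option is a plain string of characters to reject. The names `delimiter_class` and `is_allowed_delimiter` are hypothetical, and no Limnoria configuration API is used here; an empty setting reproduces the current [^\w\s] default.

```python
import re

def delimiter_class(disallowed=""):
    """Build the delimiter character class from a string of disallowed characters.

    Hypothetical helper: an empty string keeps the default [^\\w\\s].
    """
    # Escape each character so "]", "^", "-" or "\" cannot break the class.
    extra = "".join(re.escape(ch) for ch in disallowed)
    return r"[^\w\s" + extra + "]"

def is_allowed_delimiter(char, disallowed=""):
    """Return True if a single character would be accepted as a delimiter."""
    return re.fullmatch(delimiter_class(disallowed), char) is not None

# Default behaviour: the apostrophe is accepted as a delimiter.
print(is_allowed_delimiter("'"))
# A channel that sets the (hypothetical) option to "'" rejects it,
# while "/" and "#" keep working.
print(is_allowed_delimiter("'", disallowed="'"))
print(is_allowed_delimiter("/", disallowed="'"))
```

Keeping the option as a blocklist means channels that never set it see no change in behaviour, which matches the preference stated above for not being overly restrictive by default.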