Limnoria

SedRegex: add configuration option for regex delimiter

Open baldurmen opened this issue 10 months ago • 3 comments

Hi,

At the moment, the SedRegex plugin accepts a lot of different delimiters by default (", #, /, etc.), and ' is hardcoded among them.

Sadly, this causes a lot of false positives in the French channels I'm in, as sentences like s'en aller en bateau, c'est intéressant et agréable trigger a match because of the two apostrophes. Indeed, this is treated the same as s/en aller en bateau, c/est intéressant et agréable/, which isn't likely to be a match :P

Would it be possible to add a configuration option to either allowlist the wanted delimiters (which IMO makes more sense) or blocklist some of them (if you prefer this option)?

Cheers!

baldurmen avatar Dec 17 '24 18:12 baldurmen

@pollo told me the logic behind this is in https://github.com/progval/Limnoria/blob/master/plugins/SedRegex/constants.py, and I believe this specific instance can be fixed by adding ' to the excluded characters, using [^\w\s'] instead of [^\w\s].
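
To illustrate the difference between the two character classes, here is a rough sketch. The pattern below is a simplified stand-in I wrote for this comment, not the actual contents of constants.py, and build_sed_regex is a hypothetical helper name.

```python
import re

def build_sed_regex(delim_class):
    # Simplified stand-in (an assumption, not the plugin's real pattern):
    # "s", one delimiter, the pattern, the delimiter again, the replacement,
    # and an optional closing delimiter.
    return re.compile(
        r"^s(?P<delim>" + delim_class + r")"
        r"(?P<pattern>(?:(?!(?P=delim)).)*)(?P=delim)"
        r"(?P<replacement>(?:(?!(?P=delim)).)*)"
        r"(?:(?P=delim))?$"
    )

CURRENT = build_sed_regex(r"[^\w\s]")    # any non-word, non-space character
PROPOSED = build_sed_regex(r"[^\w\s']")  # same, but ' is excluded as well

french = "s'en aller en bateau, c'est intéressant et agréable"

# With the current class, the apostrophe is taken as the delimiter.
print(CURRENT.match(french) is not None)

# With ' excluded, the French sentence is no longer seen as a substitution,
# while an ordinary s/foo/bar/ still is.
print(PROPOSED.match(french) is not None)
print(PROPOSED.match("s/foo/bar/") is not None)
```

In this sketch only the delimiter class changes, so every other delimiter keeps working as before.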

anarcat avatar Dec 17 '24 18:12 anarcat

I would think a list of disallowed separators makes more sense. The actual sed implementation allows all characters, even letters and spaces, though they probably aren't as useful. I don't want to be overly restrictive by default, as it's much easier to pick an alternate separator than the usual "/" if your text includes that character.

$ sed 's t b ' <<< test
best
$ sed 'sataba' <<< test
best

jlu5 avatar Dec 18 '24 02:12 jlu5

We could make it configurable while keeping [^\w\s] as the default.
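
A minimal sketch of what that could look like, assuming the option is a plain string of disallowed characters. The function names are hypothetical, and how the value would be read from the plugin's configuration is left out.

```python
import re

def build_delimiter_class(disallowed=""):
    # With no disallowed characters this is the default [^\w\s].
    # Each extra character is escaped so that ], ^ and - stay literal
    # inside the character class.
    escaped = "".join(re.escape(ch) for ch in disallowed)
    return r"[^\w\s" + escaped + "]"

def is_allowed_delimiter(ch, disallowed=""):
    # True if the single character ch would be accepted as a delimiter.
    return re.fullmatch(build_delimiter_class(disallowed), ch) is not None

# Default behaviour: ' is still accepted.
print(is_allowed_delimiter("'"))

# A French channel could disallow ' and keep everything else.
print(is_allowed_delimiter("'", disallowed="'"))
print(is_allowed_delimiter("/", disallowed="'"))
```

An empty string keeps today's behaviour, so existing setups would not change unless someone sets the option.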

progval avatar Dec 19 '24 22:12 progval