Replacing regex pattern tokens breaks semantics

Open • olfek opened this issue 1 year ago • 17 comments

https://github.com/penge/block-site/blob/00d340a493a77603d40cfd01576f438676a12d55/src/helpers/find-rule.ts#L25

The replace call here means I cannot use the regex metacharacter . (for example, .{3,})

https://github.com/penge/block-site/blob/00d340a493a77603d40cfd01576f438676a12d55/src/helpers/find-rule.ts#L29

The replace call here means I cannot use the regex quantifier ? (for example, y?)

https://github.com/penge/block-site/blob/00d340a493a77603d40cfd01576f438676a12d55/src/helpers/find-rule.ts#L30

The replace call here means I cannot use the quantifier * with any token other than . (for example, y*)
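A minimal sketch of why the replacements change meaning. It assumes the conversion escapes ., turns ? into a single-character match, and turns * into .*; those are assumptions for illustration, the actual replacements at the three linked lines may differ, and wildcardToRegex is a hypothetical name. Both sides are anchored with ^ and $ only to make the comparison clear.

```typescript
// Hypothetical wildcard-to-regex conversion. Assumed behaviour:
//   "." is escaped to a literal dot,
//   "?" becomes "." (any single character),
//   "*" becomes ".*" (any run of characters).
// The real replacements in find-rule.ts may differ.
function wildcardToRegex(pattern: string): RegExp {
  const source = pattern
    .replace(/\./g, "\\.") // "." loses its "any character" meaning
    .replace(/\?/g, ".")   // "?" loses its "optional" meaning
    .replace(/\*/g, ".*"); // "*" always repeats "any character", never the preceding token
  return new RegExp(`^${source}$`);
}

// Each case: [pattern as a real regex would read it, input that regex should match]
const cases: Array<[string, string]> = [
  [".{3,}", "abc"], // "." should mean any character
  ["y?", "y"],      // "?" should make "y" optional
  ["y*", ""],       // "*" should repeat "y" zero or more times
];

// Every case prints regex=true, converted=false.
for (const [pattern, input] of cases) {
  const asRegex = new RegExp(`^${pattern}$`).test(input);
  const asConverted = wildcardToRegex(pattern).test(input);
  console.log(`${pattern} on "${input}": regex=${asRegex}, converted=${asConverted}`);
}
```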

olfek • Dec 01 '24 13:12

To correct this:

  • Remove the usages of replace and apologize for the breaking changes.

OR

  • Add an option to enable a TRUE regex mode (which skips the replace calls). It would default to false and therefore not break existing rulesets.
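A sketch of the opt-in toggle, assuming a hypothetical regexMode option that defaults to false. The wildcard branch reuses an assumed conversion (escape ., ? to ., * to .*), not the extension's actual code.

```typescript
// Hypothetical option; "regexMode" is an illustrative name, not an existing setting.
interface RuleOptions {
  regexMode?: boolean; // defaults to false, so existing rulesets keep their current meaning
}

function compileRule(pattern: string, { regexMode = false }: RuleOptions = {}): RegExp {
  if (regexMode) {
    // TRUE regex mode: no replace calls, the pattern is used exactly as written.
    return new RegExp(pattern);
  }
  // Assumed stand-in for the current wildcard handling.
  const source = pattern
    .replace(/\./g, "\\.")
    .replace(/\?/g, ".")
    .replace(/\*/g, ".*");
  return new RegExp(source);
}
```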

olfek • Dec 01 '24 13:12

@penge What do you think?

olfek • Dec 01 '24 13:12

As it wasn't designed for full regular expressions, this change isn't a breaking change and doesn't require an apology.

Supporting ? and * (as described in the README) offers a balance between rule flexibility and user-friendliness.


TRUE regex mode

Yes, this would be a great addition. How do you think we could implement it? Perhaps as lines formatted like regular expression literals?

/.../i
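A minimal sketch of how a line shaped like a regular expression literal could be turned into a RegExp. parseRegexLiteral is a hypothetical helper; it assumes the flags are lowercase letters after the last /, so the pattern itself may contain unescaped slashes.

```typescript
// Treats a rule line shaped like /pattern/flags as a real regular expression.
// Returns undefined for anything else, so ordinary rules keep their existing handling.
function parseRegexLiteral(line: string): RegExp | undefined {
  // Greedy (.+) runs to the LAST "/", so everything after it is read as flags.
  const match = /^\/(.+)\/([a-z]*)$/.exec(line.trim());
  if (!match) {
    return undefined;
  }
  const [, pattern, flags] = match;
  return new RegExp(pattern, flags); // throws SyntaxError on an invalid pattern or flag
}
```

new RegExp throws a SyntaxError for an invalid pattern or unknown flags, so a real implementation would likely catch it and report the rule as invalid. A g or y flag also makes RegExp.prototype.test stateful (it advances lastIndex), so those flags might need to be rejected or stripped.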

penge • Dec 02 '24 09:12

As it wasn't designed for full regular expressions, this change isn't a breaking change and doesn't require an apology.

I meant IF you removed the replace calls, it would be a breaking change.

Using regex literals as input is a great idea, but remember to add a regex mode toggle that defaults to false. 🙂

olfek • Dec 02 '24 09:12

meant IF you removed

Got it. That'd be the case, then.

remember to add a regex mode toggle

How about we keep ? and * as they are for most users, and treat any new lines that start with / (or, basically, look like regular expression literals) as regular expression literals? No toggle would be needed.

Example:

/.*/watch\?v=.*/i     # and we could even add a comment feature at the end of lines
                      # that I could use to describe regular expressions
                      # or even to group rules into sections

Would turn into:

const regex = new RegExp(String.raw`.*/watch\?v=.*`, "i");

And it would behave like this:

regex.test("https://www.youtube.com/watch?v=123");   // true
regex.test("https://www.youtube.com/WATCH?v=123");   // true
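The comment idea could be handled by stripping comments before any rule is parsed. This sketch assumes a hypothetical syntax where # starts a comment only at the start of a line or after whitespace, so URL fragments such as page#top survive; a regex that contains a space followed by # would be cut short, which a real implementation would need to handle.

```typescript
// Hypothetical comment syntax: "#" starts a comment at the beginning of a line
// or after whitespace. A "#" glued to other text (e.g. "page#top") is kept.
function stripComment(line: string): string {
  return line.replace(/(^|\s)#.*$/, "").trim();
}

// Splits the rules text into rule lines, dropping blank and comment-only lines.
function parseRuleLines(text: string): string[] {
  return text
    .split(/\r?\n/)
    .map(stripComment)
    .filter((line) => line.length > 0);
}
```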

penge • Dec 02 '24 10:12

@penge 👌

olfek • Dec 02 '24 11:12

@penge

https://github.com/penge/block-site/blob/00d340a493a77603d40cfd01576f438676a12d55/src/helpers/find-rule.ts#L19-L43

https://github.com/penge/block-site/blob/00d340a493a77603d40cfd01576f438676a12d55/src/helpers/make-rules.ts#L10-L23

  • The input regex should not be modified in any way (no replace or removeProtocol).
  • The regex should be matched against the original URL (including the protocol), not normalizedUrl.
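A sketch of the two requirements above: regex rules are tested against the URL exactly as received, protocol included, while wildcard rules keep going through normalization. stripProtocol is a hypothetical stand-in for the extension's normalization (removeProtocol / normalizedUrl), which is not reproduced here.

```typescript
// Hypothetical stand-in for the extension's normalization; here it only removes
// a leading "scheme://", which is an assumption for illustration.
function stripProtocol(url: string): string {
  return url.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "");
}

type CompiledRule =
  | { kind: "wildcard"; regex: RegExp } // user-friendly rule, matched against the normalized URL
  | { kind: "regex"; regex: RegExp };   // user-written regex, never modified

function ruleMatches(rule: CompiledRule, url: string): boolean {
  // Regex rules see the original URL, protocol included; wildcard rules keep the old behaviour.
  const subject = rule.kind === "regex" ? url : stripProtocol(url);
  return rule.regex.test(subject);
}
```

With this split, a regex rule such as /^http:\/\// could target insecure URLs only, which would not be possible if the protocol were stripped first.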

If a regex to block starts with /, how do we indicate a regex to allow?

https://developer.mozilla.org/docs/Web/JavaScript/Guide/Regular_expressions#advanced_searching_with_flags

Maybe a custom flag such as + to indicate an allow rule?
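If the allow marker were a custom flag, it would have to be removed before the RegExp constructor sees it, because the constructor throws a SyntaxError for any flag it does not recognize. A sketch with a hypothetical parsePlusFlagRule:

```typescript
// Sketch of the custom "+" flag idea. RegExp only accepts its own flags
// (such as g, i, m, s, u, y), so "+" must be removed before construction.
function parsePlusFlagRule(line: string): { allow: boolean; regex: RegExp } | undefined {
  const match = /^\/(.+)\/([a-z+]*)$/.exec(line.trim());
  if (!match) {
    return undefined;
  }
  const [, pattern, rawFlags] = match;
  const allow = rawFlags.includes("+");      // custom marker for an allow rule
  const flags = rawFlags.replace(/\+/g, ""); // keep only the real RegExp flags
  return { allow, regex: new RegExp(pattern, flags) };
}
```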

olfek • Dec 02 '24 11:12

  • The input regex should not be modified in any way (no replace or removeProtocol).
  • The regex should be matched against the original URL (including the protocol), not normalizedUrl.

I agree. Basically, there would be two approaches:

A) user-friendly: some modification, ? and * support
B) regex: no modification

how do we indicate a regex to allow?

We could reuse the semantics we already have, that is, putting ! in front:

!/.../        # exception, will be allowed
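A sketch of the ! prefix applied to regex literals, mirroring the existing exception semantics. parseRegexRule and RegexRule are hypothetical names, and the sketch assumes any trailing # comment has already been stripped from the line.

```typescript
// Reuses the existing "!" exception prefix for regex rules: "!/.../flags" is an allow rule.
interface RegexRule {
  exclude: boolean; // true when the line starts with "!"
  regex: RegExp;
}

function parseRegexRule(line: string): RegexRule | undefined {
  const trimmed = line.trim();
  const exclude = trimmed.startsWith("!");
  const literal = exclude ? trimmed.slice(1) : trimmed;
  // Greedy (.+) runs to the last "/", so everything after it is treated as flags.
  const match = /^\/(.+)\/([a-z]*)$/.exec(literal);
  if (!match) {
    return undefined; // not a regex literal: handle it as an ordinary rule
  }
  return { exclude, regex: new RegExp(match[1], match[2]) };
}
```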

penge • Dec 02 '24 12:12

This feature will certainly make this application very powerful.

olfek • Dec 02 '24 12:12

I'm wondering whether a rule tester should ship with this feature; regex is sometimes quite tricky to get right.

olfek • Dec 02 '24 12:12

rule tester ... quite tricky to get right.

True. Let's have a tester, then, to ensure rules work as expected before visiting the site(s). It could be an input that accepts one or more URLs and lists the rules that match them (if any; one URL can be matched by more than one rule, and exclusions from blocking should also be testable this way).
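A minimal sketch of the tester logic described above: for each URL, list every rule that matches it, exclusions included. It models only regex rules, and the TestRule shape and testRules function are hypothetical.

```typescript
interface TestRule {
  text: string;     // the rule line as the user typed it
  exclude: boolean; // true for "!" rules: matching URLs are excluded from blocking
  regex: RegExp;
}

interface TesterRow {
  url: string;
  matches: TestRule[]; // zero, one, or several rules can match the same URL
}

function testRules(urls: string[], rules: TestRule[]): TesterRow[] {
  return urls.map((url) => ({
    url,
    matches: rules.filter((rule) => rule.regex.test(url)),
  }));
}

// Example: one blocking rule and one exclusion, tested against two URLs.
const exampleRules: TestRule[] = [
  { text: "/youtube\\.com/i", exclude: false, regex: /youtube\.com/i },
  { text: "!/watch\\?v=abc/", exclude: true, regex: /watch\?v=abc/ },
];

for (const row of testRules(["https://www.youtube.com/watch?v=abc", "https://example.com/"], exampleRules)) {
  const listed = row.matches.map((r) => (r.exclude ? `${r.text} (excludes)` : `${r.text} (blocks)`));
  console.log(row.url, "->", listed.length > 0 ? listed.join(", ") : "no matching rules");
}
```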

penge • Dec 02 '24 12:12

@penge

I was thinking of something simple like this:

[image: mockup of the proposed rule tester]

RED = will block
GREEN = will not block

olfek • Dec 02 '24 12:12

I considered this. It can be a bit confusing, because "RED" also means "doesn't work" and "GREEN" means "works".

What about exclusion rules? There, the color meanings would be reversed: "GREEN" as "allows" and "RED" as "does not apply".


Maybe the colors could have a broader meaning:
RED = No match (either doesn't block or doesn't exclude)
GREEN = Match (either blocks or excludes)

Note: it may be difficult to lay out the new inputs and align them. The large input area may need to be split into separate lines.

penge • Dec 02 '24 13:12

@penge

Maybe the colors could have a broader meaning:
RED = No match (either doesn't block or doesn't exclude)
GREEN = Match (either blocks or excludes)

This is confusing for me because:
GREEN = Match, yet blocking is a negative action, which we associate with the color RED.
RED = No match, yet not blocking is a positive action, which we associate with the color GREEN.

olfek • Dec 02 '24 13:12

@penge

Whatever color you decide to go with, I think you should have a key that states what each color means 🙂

olfek • Dec 02 '24 13:12

https://www.allaboutvision.com/en-gb/conditions/color-blindness/red-green-color-blindness/

olfek • Dec 02 '24 13:12

Or, wherever icons or colors fall short, just use TEXT 😄

"blocks", "does not block", "excludes", "does not exclude"

penge • Dec 02 '24 14:12
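A sketch of the text-only labels, assuming a hypothetical rule shape with an exclude marker for ! rules; each rule/URL pair maps to exactly one of the four phrases, so the tester never depends on red/green alone.

```typescript
type Verdict = "blocks" | "does not block" | "excludes" | "does not exclude";

interface LabelledRule {
  exclude: boolean; // true for "!" (exception) rules
  regex: RegExp;
}

// Describes one rule/URL pair in plain text.
function verdict(rule: LabelledRule, url: string): Verdict {
  const matched = rule.regex.test(url);
  if (rule.exclude) {
    return matched ? "excludes" : "does not exclude";
  }
  return matched ? "blocks" : "does not block";
}
```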