Improved Character Range and Special Sequence Support

Open cdwmhcc opened this issue 8 months ago • 2 comments

🆒 Character Range Issues

Description    Current                Generated Pattern    Expected Pattern
Number Range   charIn('1-9')          /[1\-9]/             /[1-9]/
Alternatives   charIn('123456789')    /[123456789]/        /[1-9]/
Ideal API      charIn('1-9')          n/a                  /[1-9]/
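
To make the difference between the generated and expected patterns concrete, here is a minimal sketch that uses only native RegExp literals copied from the table. It does not call magic-regexp; the helper `matching` and the sample inputs are hypothetical names introduced for illustration.

```typescript
// Native RegExp only; magic-regexp itself is not used in this sketch.
const escapedHyphen = /[1\-9]/ // pattern the issue reports for charIn('1-9')
const digitRange = /[1-9]/ // pattern the issue expects

// Hypothetical helper: returns the inputs that a pattern matches.
function matching(pattern: RegExp, candidates: string[]): string[] {
  return candidates.filter(candidate => pattern.test(candidate))
}

const samples = ['1', '5', '9', '-']

const literalMatches = matching(escapedHyphen, samples) // ['1', '9', '-']
const rangeMatches = matching(digitRange, samples) // ['1', '5', '9']

console.log(literalMatches, rangeMatches)
```

With the escaped hyphen, '5' is rejected and '-' is accepted, which is the opposite of what a 1 to 9 range should do.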

Whitespace Character Class Issues

Description           Current                        Generated Pattern    Expected Pattern
Escaped \s in String  charIn('abc\\s')               /[abc\\s]/           /[abc\s]/
Alternatives          charIn('abc').or(whitespace)   /(?:[abc]|\s)/       /[abc\s]/
Ideal API Option 1    charIn('abc\\s')               n/a                  /[abc\s]/
Ideal API Option 2    charIn('abc${whitespace}')     n/a                  /[abc\s]/
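
The three patterns in this table can be compared directly with native RegExp. The sketch below is only an illustration of their matching behaviour, not of how magic-regexp builds them; `inputs` and `flags` are hypothetical names.

```typescript
const doubleEscaped = /[abc\\s]/ // pattern the issue reports for charIn('abc\\s')
const alternation = /(?:[abc]|\s)/ // pattern the issue reports for charIn('abc').or(whitespace)
const singleClass = /[abc\s]/ // pattern the issue expects

// 'a', a space, a tab, the letter 's', and a single backslash
const inputs = ['a', ' ', '\t', 's', '\\']

// Hypothetical helper: one 'y' or 'n' per input, in order.
function flags(pattern: RegExp): string {
  return inputs.map(input => (pattern.test(input) ? 'y' : 'n')).join('')
}

const doubleEscapedFlags = flags(doubleEscaped) // 'ynnyy'
const alternationFlags = flags(alternation) // 'yyynn'
const singleClassFlags = flags(singleClass) // 'yyynn'

console.log(doubleEscapedFlags, alternationFlags, singleClassFlags)
```

The double-escaped class matches a literal backslash and the letter s but no whitespace. The alternation matches the same inputs as the expected single class, so that row is about verbosity of the output rather than a behavioural difference.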

Complex Lookbehind or Lookahead Structure Issues

Description  Current                                                                   Generated Pattern       Expected Pattern
Lookbehind   exactly('').after(anyOf(exactly('').at.lineStart(), charIn('-_(:')))      /(?<=(?:^|[\-_(:]))/    /(?<=(?:^|[-_(:]))/
Ideal API    after(anyOf(lineStart, charIn('-_(:')))                                   n/a                     /(?<=(?:^|[-_(:]))/
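
As a rough check of what the lookbehind does, the sketch below applies both patterns to a sample string with native RegExp. The trailing `[a-z]` and the sample text are additions for illustration only and are not part of the issue. The patterns are built with `new RegExp` so the sketch does not depend on regex-literal support for lookbehind in the compiler target.

```typescript
// Equivalent literals: /(?<=(?:^|[\-_(:]))[a-z]/g and /(?<=(?:^|[-_(:]))[a-z]/g
const generatedPattern = new RegExp('(?<=(?:^|[\\-_(:]))[a-z]', 'g')
const expectedPattern = new RegExp('(?<=(?:^|[-_(:]))[a-z]', 'g')

const text = 'foo-bar_baz(qux:quux'
const upper = (ch: string): string => ch.toUpperCase()

// Uppercase every letter that follows the start of input or one of - _ ( :
const viaGenerated = text.replace(generatedPattern, upper)
const viaExpected = text.replace(expectedPattern, upper)

console.log(viaGenerated) // Foo-Bar_Baz(Qux:Quux
console.log(viaExpected) // Foo-Bar_Baz(Qux:Quux
```

Both patterns give the same result here, since a leading hyphen in a character class is literal whether or not it is escaped. The request in this row therefore appears to concern the readability of the output and the verbosity of the API call rather than incorrect matching.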

ℹ️ Additional info

  1. Character Range Interpretation:

    • The library interprets '1-9' literally as the characters "1", "-", and "9" instead of the range from 1 to 9
    • As a workaround, a range has to be enumerated character by character, e.g. charIn('123456789')
  2. Escaped Character Handling:

    • Escape sequences like \\s in strings are not correctly translated to regex character classes
    • The library creates unnecessary alternation when combining regular characters with special classes

Suggested Improvements

  1. Implement proper character range parsing in charIn(): a hyphen (-) between two characters should create a range
  2. Support proper escape sequence handling in character classes
  3. Introduce more concise helper functions for common patterns (e.g., lineStart, after)
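
One way the first suggestion could work is sketched below. This is a hypothetical standalone helper, not magic-regexp's code and not the implementation in PR #399. It assumes a hyphen forms a range only when it sits between two characters in ascending order, emits any other hyphen unescaped at the front of the class, and works on UTF-16 code units. Escape sequences such as \s (the second suggestion) are not handled.

```typescript
// Hypothetical helper: escapes characters that are special inside a character class.
function escapeClassChar(ch: string): string {
  return /[\\\]^-]/.test(ch) ? `\\${ch}` : ch
}

// Hypothetical helper: builds the source of a character class from a spec string,
// treating "x-y" as a range when x <= y and any other hyphen as a literal.
function charClassSource(spec: string): string {
  let body = ''
  let hasLiteralHyphen = false
  let i = 0
  while (i < spec.length) {
    const start = spec.charAt(i)
    const isRange =
      i + 2 < spec.length &&
      spec.charAt(i + 1) === '-' &&
      spec.charCodeAt(i) <= spec.charCodeAt(i + 2)
    if (isRange) {
      body += `${escapeClassChar(start)}-${escapeClassChar(spec.charAt(i + 2))}`
      i += 3
    } else if (start === '-') {
      hasLiteralHyphen = true // emitted once, unescaped, at the front of the class
      i += 1
    } else {
      body += escapeClassChar(start)
      i += 1
    }
  }
  return `[${hasLiteralHyphen ? '-' : ''}${body}]`
}

const digits = charClassSource('1-9') // '[1-9]'
const separators = charClassSource('-_(:') // '[-_(:]'
console.log(digits, separators)
```

A real implementation would also need a way to request a literal hyphen between two characters, for example an explicit escape, since under this rule '1-9' can no longer mean the three characters "1", "-", and "9".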

cdwmhcc, Mar 29 '25 08:03

what do you think of the implementation in https://github.com/unjs/magic-regexp/pull/399?

danielroe, Apr 02 '25 10:04

what do you think of the implementation in #399?

Thanks for pointing me to PR #399. I initially misunderstood issue #397, thinking it was language-specific. After taking a closer look, I see that it addresses the same Character Range Issues I mentioned in my issue. Having reviewed PR #399, I can confirm that its implementation would indeed solve the Character Range Issues portion of my issue. I appreciate you making this connection.

cdwmhcc, Apr 02 '25 10:04