Several bugs I found

Open kroitlory opened this issue 3 months ago • 0 comments

Validation and Regex Compatibility Issues in python-string-utils

Summary

Several validation functions appear to reject inputs that are commonly accepted in standard formats. Property-based and targeted tests reveal at least three distinct issues:

  • Scientific notation support in is_number/is_decimal is incomplete (rejects uppercase E and signed exponents such as e-3)
  • URL port parsing is overly restrictive (rejects single-digit ports)
  • Pangram detection is case-sensitive (fails uppercase-only pangrams)

Affected Code

  • Scientific notation regex: string_utils/_regex.py:7
  • URL regex port segment: string_utils/_regex.py:14
  • Pangram implementation: string_utils/validation.py:510-514
  • Number validation: string_utils/validation.py:135-138
  • Decimal validation: string_utils/validation.py:159-172

Environment

  • OS: Windows
  • Python: 3.10
  • Command: python run_added_tests.py

Expected Behavior

  • is_number and is_decimal should accept scientific notation using both lowercase and uppercase E, and should allow signed exponents (e.g., 1e-3, 1E+3, 1.5e-3).
  • is_url should accept 1–5 digit port numbers (standard range 0–65535), e.g., http://localhost:8.
  • is_pangram should be case-insensitive (uppercase-only pangrams should pass).

Actual Behavior

  • Scientific notation:
    • is_number('1E3') → False
    • is_number('1e-3') → False
    • is_decimal('1.5e-3') → False
  • URL port:
    • is_url('http://localhost:8') → False
    • is_url('http://127.0.0.1:7') → False
  • Pangram:
    • is_pangram('THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG') → False

Code References

  • Scientific notation regex currently: string_utils/_regex.py:7
    • NUMBER_RE = re.compile(r'^([+\-]?)((\d+)(\.\d+)?(e\d+)?|\.\d+)$')
    • Limitations:
      • Only lowercase e
      • Exponent accepts digits only (\d+); an optional sign is not allowed
  • URL port segment: string_utils/_regex.py:14
    • (:\d{2,})? requires at least two digits for the port
  • Pangram: string_utils/validation.py:510-514
    • Compares set of characters to string.ascii_lowercase without case normalization
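
The limitations listed above can be reproduced without installing the library. The following is a minimal sketch: NUMBER_RE is copied from the pattern quoted in this issue, while HOST_PORT_RE and covers_alphabet_case_sensitive are hypothetical stand-ins (a simplified host pattern around the quoted port segment, and a paraphrase of the described pangram check), not the library's actual code.

```python
import re
import string

# Pattern copied verbatim from the issue text (string_utils/_regex.py:7).
NUMBER_RE = re.compile(r'^([+\-]?)((\d+)(\.\d+)?(e\d+)?|\.\d+)$')

# Hypothetical stand-in: only the port segment (:\d{2,})? comes from the issue;
# the host part is deliberately simplified and is NOT the library's URL regex.
HOST_PORT_RE = re.compile(r'^[\w.]+(:\d{2,})?$')


def covers_alphabet_case_sensitive(text: str) -> bool:
    """Paraphrase of the described check: no case normalization is applied."""
    return set(string.ascii_lowercase).issubset(set(text))


# Lowercase 'e' with unsigned digits matches; uppercase 'E' or a signed exponent does not.
number_results = {s: bool(NUMBER_RE.match(s)) for s in ('1e3', '1E3', '1e-3', '1.5e-3')}

# A two-digit port matches; single-digit ports do not.
port_results = {s: bool(HOST_PORT_RE.match(s)) for s in ('localhost:80', 'localhost:8', '127.0.0.1:7')}

# An uppercase-only pangram contains no lowercase letters, so the check fails.
pangram_upper = covers_alphabet_case_sensitive('THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG')

print(number_results)
print(port_results)
print(pangram_upper)
```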

Suggested Fixes

  • is_number / is_decimal:
    • Update NUMBER_RE to accept both e and E, and an optional sign before exponent digits, e.g.:
      • Allow pattern segment like [eE][+\-]?\d+
    • Ensure downstream checks (is_integer, is_decimal) keep consistent semantics when scientific notation is used.
  • is_url:
    • Relax port segment to (:\d{1,5})? and optionally validate numeric range (0–65535) outside the regex if desired.
  • is_pangram:
    • Normalize input to lowercase (or use a case-insensitive comparison) before computing set inclusion.
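
A rough sketch of how the three suggestions might look, offered as a starting point rather than a tested patch. The names NUMBER_RE_PROPOSED, PORT_RE_PROPOSED, port_in_range and is_pangram_ci are hypothetical and do not exist in the library; the port pattern is shown in isolation instead of inside the full URL regex.

```python
import re
import string

# Hypothetical revision of the quoted NUMBER_RE: accepts e or E and an optional exponent sign.
NUMBER_RE_PROPOSED = re.compile(r'^([+\-]?)((\d+)(\.\d+)?([eE][+\-]?\d+)?|\.\d+)$')

# Hypothetical isolated port segment: 1 to 5 digits at the end of the string.
PORT_RE_PROPOSED = re.compile(r':(\d{1,5})$')


def port_in_range(host_and_port: str) -> bool:
    """Return True if the string ends in ':<port>' with 0 <= port <= 65535."""
    match = PORT_RE_PROPOSED.search(host_and_port)
    # The regex limits the digit count; the numeric range is checked separately.
    return match is not None and 0 <= int(match.group(1)) <= 65535


def is_pangram_ci(text: str) -> bool:
    """Case-insensitive pangram check: lowercase the input before the set comparison."""
    return set(string.ascii_lowercase).issubset(text.lower())
```

Checking the range outside the regex keeps the pattern simple, since \d{1,5} alone would still accept values such as 99999.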

kroitlory, Nov 14 '25 12:11