
rblib: Validate.uses_mixed_capitals should be made unicode-aware

Open mhl opened this issue 12 years ago • 1 comments

Currently uses_mixed_capitals uses the regular expressions /[A-Z]/ and /[a-z]/ to detect upper and lower case letters, so it ignores non-ASCII upper and lower case letters. In fixing this, care needs to be taken to preserve Ruby 1.8.7 compatibility: 1.8.7 has no support for Unicode character classes in its regular expressions, so one couldn't just use /[[:upper:]]/, for example.
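To make the gap concrete, here is a minimal sketch of an ASCII-only check. The method name ascii_mixed_capitals? is hypothetical and is only a simplified stand-in, not the real Validate.uses_mixed_capitals; it just applies the two regular expressions quoted above.

```ruby
# encoding: utf-8

# Simplified stand-in (hypothetical name), NOT the real rblib method:
# a string "uses mixed capitals" if it contains at least one ASCII
# upper case letter and at least one ASCII lower case letter.
def ascii_mixed_capitals?(text)
  has_upper = !(text =~ /[A-Z]/).nil?
  has_lower = !(text =~ /[a-z]/).nil?
  has_upper && has_lower
end

puts ascii_mixed_capitals?("Hello")  # true
puts ascii_mixed_capitals?("Éé")     # false: neither letter is in A-Z or a-z
```

The second call shows the problem: the string clearly mixes cases, but neither character falls inside the ASCII ranges.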

mhl avatar Jan 02 '14 12:01 mhl

Elsewhere, Alaveteli uses literal character classes to fake Unicode character classes, e.g. here, although that's hugely incomplete. One could generate correct character classes similarly, corresponding to [[:upper:]] and [[:lower:]], but there are over 1000 characters in each category, and they don't collapse nicely into ranges.

To see all upper and lower case letters in Unicode, grouped into ranges of contiguous integers, you can use this script, which produces the output below.
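The linked script isn't reproduced here. As a rough idea of how such a listing could be generated, the following is a sketch that assumes Ruby 1.9 or later (where POSIX bracket expressions are Unicode-aware on UTF-8 strings). The helper name codepoint_ranges and its limit parameter are made up for illustration.

```ruby
# encoding: utf-8

# Hypothetical helper: collect every code point up to `limit` whose
# UTF-8 character matches `pattern`, merging neighbours into ranges.
def codepoint_ranges(pattern, limit = 0x10FFFF)
  ranges = []
  (0..limit).each do |cp|
    # Surrogates are not valid characters, so skip them.
    next if cp >= 0xD800 && cp <= 0xDFFF
    next unless [cp].pack('U') =~ pattern
    if ranges.last && ranges.last.last == cp - 1
      # Extends the current run of contiguous code points.
      ranges[-1] = (ranges.last.first..cp)
    else
      ranges << (cp..cp)
    end
  end
  ranges
end

# Only scan Latin-1 here to keep the example quick.
codepoint_ranges(/[[:upper:]]/, 0xFF).each do |r|
  puts format('U+%04X..U+%04X', r.first, r.last)
end
```

Dropping the second argument scans the whole Unicode range, which is where the large number of small, non-contiguous ranges shows up.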

Probably the pragmatic solution is to deal with the commonest ranges under Ruby 1.8.7 (checking they include those used by redeployers of our software) and use the POSIX character classes under Ruby 1.9 and later.
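A sketch of what that split might look like follows. The constant and method names are hypothetical, and the literal ranges in the 1.8.7 branch cover only the Latin-1 Supplement letters as an illustration; they are not a vetted list of what redeployers need. It also assumes that the /u flag makes multibyte ranges work inside a character class on 1.8.7.

```ruby
# encoding: utf-8

if RUBY_VERSION.to_f < 1.9
  # Ruby 1.8.7: hand-written ranges (illustrative, Latin-1 only).
  UPPER_RE = /[A-ZÀ-ÖØ-Þ]/u
  LOWER_RE = /[a-zß-öø-ÿ]/u
else
  # Ruby 1.9+: POSIX classes are Unicode-aware on UTF-8 strings.
  UPPER_RE = /[[:upper:]]/
  LOWER_RE = /[[:lower:]]/
end

# Hypothetical simplified check: true if the text contains at least
# one upper case and one lower case letter.
def unicode_mixed_capitals?(text)
  !(text =~ UPPER_RE).nil? && !(text =~ LOWER_RE).nil?
end

puts unicode_mixed_capitals?("Éé")     # true
puts unicode_mixed_capitals?("ÉCOLE")  # false
```

Keeping the version check in one place means the rest of the validation code can use the two constants without caring which Ruby it runs on.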

mhl avatar Jan 02 '14 13:01 mhl