randexp.js

Cyrillic characters don't work

Open husainshabbir opened this issue 5 years ago • 7 comments

Regular expressions with cyrillic characters (e.g. [А-Я]{1,5}[а-я]{5,10}) don't work in the latest version. The last version it used to work in is 0.4.6.

This reproduces the issue: https://codesandbox.io/s/randexp-cyrillic-issue-2kcou

husainshabbir avatar Jan 14 '20 17:01 husainshabbir

The default range for sets includes only printable ASCII characters: https://github.com/fent/randexp.js#default-range

You can change it with something like the following:

RandExp.prototype.defaultRange.add(0, 65535);

or with instances

let randexp = new RandExp(/regex/);
randexp.defaultRange.add(0, 65535);

defaultRange was added so that the any-character set (.) wouldn't generate characters most randexp users wouldn't expect. However, it's applied to all sets, including custom sets (/[a-f]/) and negated sets (/[^\D]/).

Whether it should apply to custom sets is debatable; it does seem like unexpected behavior.
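
To illustrate why the pattern from the original report produces nothing, here is a minimal sketch that models a set as a list of inclusive code-point ranges and intersects it with the default range. The `intersectRanges` helper and the array representation are assumptions made for illustration only; they are not randexp's internals.

```javascript
// Simplified model (NOT randexp's real data structures): a character set is
// a list of inclusive [low, high] code-point ranges.
function intersectRanges(setRanges, allowedRanges) {
  const result = [];
  for (const [lo, hi] of setRanges) {
    for (const [allowedLo, allowedHi] of allowedRanges) {
      const start = Math.max(lo, allowedLo);
      const end = Math.min(hi, allowedHi);
      // Keep only the overlapping part, if there is one.
      if (start <= end) result.push([start, end]);
    }
  }
  return result;
}

// Printable ASCII, taken here as char codes 32..126.
const printableAscii = [[32, 126]];

// [А-Я] covers U+0410..U+042F.
const cyrillicUpper = [[0x0410, 0x042f]];

// With the default range only, nothing in [А-Я] survives.
const restricted = intersectRanges(cyrillicUpper, printableAscii);

// Modelling defaultRange.add(0, 65535) as one extra allowed range.
const widened = intersectRanges(cyrillicUpper, [[32, 126], [0, 65535]]);

console.log(restricted); // []
console.log(widened);    // [ [ 1040, 1071 ] ]
```

In this model the empty intersection is what makes the Cyrillic sets generate nothing until the range is widened.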

fent avatar Jan 17 '20 06:01 fent

I can understand the default range being used for any "open" sets, such as . and negated character classes, but "closed" sets should not be restricted by the default range, in my opinion. I would consider the current behaviour a bug.

michaelficarra avatar Jan 17 '20 16:01 michaelficarra

IMO this should be left as it is. People only need a minute of time to check the docs to understand what's going on.

It makes no sense if I explicitly specify a character range on a RandExp instance and then see that my string does not follow the range I specified. A regular expression may come from anywhere; a RandExp instance is what I control and use, and I want my generated string to stay within its range.

1valdis avatar Mar 11 '20 12:03 1valdis

I'm leaning towards @michaelficarra's view: the default range should be respected for predefined sets, but custom non-negated sets like the ones in the OP (e.g. [А-Я]{1,5}[а-я]{5,10}) could ignore it.

fent avatar Apr 07 '20 17:04 fent

Then why is defaultRange even needed, if some constructs in the regexp could "override" it? As it stands, I can be sure that the generated string will contain only characters in the defined range, no matter what's in the regexp. So to me, letting the regexp override the range feels more unintuitive than the OP's issue.

1valdis avatar Apr 08 '20 06:04 1valdis

@1valdis That's ridiculous. If defaultRange is restricted to a through f, and I provide the regexp x, should it not produce anything? How about [x]? Or [xyz]? Or [x-z]? defaultRange should only affect "open" sets like . or [^a].
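
The two positions in this thread can be compared with a small sketch. Everything below is hypothetical: the `kind` field, the policy names `"always"` and `"open-only"`, and the `effectiveRanges` helper are invented for illustration and are not part of randexp's API.

```javascript
// Hypothetical model of the two policies under discussion.
// Sets are lists of inclusive [low, high] code-point ranges.
function intersect(a, b) {
  const out = [];
  for (const [aLo, aHi] of a) {
    for (const [bLo, bHi] of b) {
      const lo = Math.max(aLo, bLo);
      const hi = Math.min(aHi, bHi);
      if (lo <= hi) out.push([lo, hi]);
    }
  }
  return out;
}

// policy "always":    defaultRange restricts every set (the behaviour described in this thread).
// policy "open-only": defaultRange restricts only "open" sets such as . or [^a].
function effectiveRanges(set, defaultRange, policy) {
  if (policy === "open-only" && set.kind === "closed") {
    return set.ranges; // explicit sets like [x-z] are used as written
  }
  return intersect(set.ranges, defaultRange);
}

const code = (ch) => ch.charCodeAt(0);

// defaultRange restricted to a through f, as in the comment above.
const aToF = [[code("a"), code("f")]];

const xToZ = { kind: "closed", ranges: [[code("x"), code("z")]] };
// Simplification: treat . as every code unit from 0 to 65535.
const anyChar = { kind: "open", ranges: [[0, 65535]] };

const strict = effectiveRanges(xToZ, aToF, "always");     // nothing to generate
const lenient = effectiveRanges(xToZ, aToF, "open-only"); // x through z
const dot = effectiveRanges(anyChar, aToF, "open-only");  // a through f

console.log(strict, lenient, dot);
```

Under `"always"`, [x-z] has no characters left to choose from; under `"open-only"`, it keeps x through z while . is still confined to a through f.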

michaelficarra avatar Apr 08 '20 16:04 michaelficarra

@michaelficarra If someone restricted it to a-f, then it was done on purpose. For a-f the reason is easy to find: letters for hexadecimal numbers. If some y or z shows up, then it's gonna blow up. The regexp itself is not always something you write into code and control. The randexp.js instance, however, is. I believe explaining it as "the default range of generated characters applies to the whole regexp" is also simpler and more consistent than "the default range applies only to 'open' sets and negated groups, but not to predefined ranges". And I don't understand what the problem is with one line of code, randexp.defaultRange.add(0, 65535);, if you want Chinese, Russian and others.

1valdis avatar Apr 09 '20 04:04 1valdis