ECMA 262: \d should only match ASCII digits
Given this pattern ^\d$
This should match: 0
And this should not: ߀
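As a point of comparison, here is a minimal sketch of the expected behavior using the JDK's own java.util.regex rather than Joni. It is only a stand-in to illustrate the distinction: the class and method names are hypothetical, and the flag shown belongs to java.util.regex, not to Joni.

```java
import java.util.regex.Pattern;

public class AsciiDigitDemo {
    // NKO DIGIT ZERO (U+07C0), the non-ASCII digit from the report above.
    static final String NKO_ZERO = "\u07C0";

    // Default java.util.regex behavior: \d is equivalent to [0-9].
    static boolean matchesAsciiOnly(String input) {
        return Pattern.compile("^\\d$").matcher(input).matches();
    }

    // With UNICODE_CHARACTER_CLASS, \d also matches non-ASCII decimal digits.
    static boolean matchesUnicodeAware(String input) {
        return Pattern.compile("^\\d$", Pattern.UNICODE_CHARACTER_CLASS)
                .matcher(input)
                .matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAsciiOnly("0"));         // true
        System.out.println(matchesAsciiOnly(NKO_ZERO));    // false
        System.out.println(matchesUnicodeAware(NKO_ZERO)); // true
    }
}
```

The first two results are what ECMA 262 asks for; the third is the more liberal behavior this issue reports.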
@fdutton JRuby already behaves as you describe: with our encodings, the pattern matches 0 but does not match ߀. I am guessing you are using joni directly as a Java library, so perhaps there is a configuration or call option that makes it behave this way?
With any extra info we can try to figure out why it works for us and, if it really does work, how we get that result.
It looks like Ruby (JRuby) explicitly restricts numerics to ASCII only: https://github.com/jruby/joni/blob/master/src/org/joni/Syntax.java#L459
I'll write some unit tests, but this is what I am doing to work around the issue for now.
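// Joni is too liberal on some constructs, so expand the shorthand classes
// into explicit ASCII digit/word sets and the ECMAScript whitespace set.
// Note: plain String.replace also rewrites shorthands that appear inside
// character classes or directly after an escaped backslash.
String s = regex
    .replace("\\d", "[0-9]")
    .replace("\\D", "[^0-9]")
    .replace("\\w", "[a-zA-Z0-9_]")
    .replace("\\W", "[^a-zA-Z0-9_]")
    .replace("\\s", "[ \\f\\n\\r\\t\\v\\u00a0\\u1680\\u2000-\\u200a\\u2028\\u2029\\u202f\\u205f\\u3000\\ufeff]")
    .replace("\\S", "[^ \\f\\n\\r\\t\\v\\u00a0\\u1680\\u2000-\\u200a\\u2028\\u2029\\u202f\\u205f\\u3000\\ufeff]");
// Compile the rewritten pattern as UTF-8 bytes with the ECMAScript syntax.
byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
this.pattern = new Regex(bytes, 0, bytes.length, Option.NONE, UTF8Encoding.INSTANCE, Syntax.ECMAScript);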
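One caveat with chained String.replace calls is that they also rewrite a "d" that follows an escaped backslash (the regex text \\d, meaning a literal backslash and then the letter d). The following is a rough sketch of an escape-aware variant, not a drop-in fix: ShorthandRewriter, expand, and rewrite are hypothetical names, it uses only the JDK, it omits \s and \S for brevity, and like the workaround above it still produces nested brackets when a shorthand sits inside a character class.

```java
public class ShorthandRewriter {
    // Hypothetical helper: returns the ASCII-only expansion of a shorthand
    // class letter, or null when the letter is not one we rewrite.
    static String expand(char c) {
        switch (c) {
            case 'd': return "[0-9]";
            case 'D': return "[^0-9]";
            case 'w': return "[a-zA-Z0-9_]";
            case 'W': return "[^a-zA-Z0-9_]";
            default:  return null;
        }
    }

    // Walks the pattern one escape pair at a time, so an escaped backslash
    // is consumed as a unit and never mistaken for the start of \d or \w.
    static String rewrite(String regex) {
        StringBuilder out = new StringBuilder(regex.length() + 16);
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                char next = regex.charAt(i + 1);
                String replacement = expand(next);
                if (replacement != null) {
                    out.append(replacement);
                } else {
                    out.append(c).append(next); // keep other escapes intact
                }
                i++; // skip the escaped character
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("^\\d$"));  // ^[0-9]$
        System.out.println(rewrite("\\\\d"));  // \\d (unchanged)
    }
}
```

The result of rewrite could then be converted to UTF-8 bytes and passed to the Regex constructor exactly as in the snippet above.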
@fdutton I don't know where the oniguruma repo is, but you could check whether the syntax for ECMAScript was updated upstream. We tend to look at the onigmo fork used by C Ruby, but we are pretty far downstream. Perhaps there is a more up-to-date syntax?
@enebo I think we are still on par with respect to regexp functionality. We've been tracking https://github.com/k-takata/Onigmo/graphs/contributors and there's not a lot of activity there. There have been more changes in the MRI codebase lately, though.
There also doesn't seem to be an ECMAScript syntax in either the Onigmo or the MRI repository.