.NET: `\p{LC}` doesn't work, `.` and `\w` don't properly support Unicode

Open Aloso opened this issue 2 years ago • 0 comments

All identified problems (most have been addressed in Pomsky 0.10):

  • [x] .NET doesn't support code points (in hexadecimal notation) outside the BMP – must be converted to two UTF-16 surrogates
    • [x] make it work in string literals (e.g. '𐌰')
    • [x] make it work for hexadecimal code points above U+FFFF (e.g. U+10330) instead of producing an error
  • [ ] #89
  • [x] \pL as shorthand for \p{L} doesn't work
  • [x] \p{LC} doesn't work
    • [ ] polyfill?
  • [x] scripts and boolean properties don't work at all
  • [x] needs investigation to see if all blocks are supported
  • [x] check if block names are correctly normalized: underscores must be removed, but dashes preserved
  • [x] \v and \h aren't supported
  • [ ] #88
  • [x] need to check if backreferences like \80 are too high (doc)
  • [x] any further bugs may surface during fuzzing
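Two of the items above are mechanical enough to sketch in code: splitting a code point outside the BMP into two UTF-16 surrogates, and normalizing block names by removing underscores while preserving dashes. The following is a minimal illustration, not Pomsky's actual implementation; the function names `dotnet_escape` and `normalize_block_name` are hypothetical, and the `\uXXXX` output format is an assumption about how the escapes would be emitted.

```rust
/// Hypothetical helper: escape one code point as `\uXXXX` units.
/// Code points above U+FFFF become two units (a surrogate pair),
/// because .NET regexes operate on UTF-16 code units.
fn dotnet_escape(c: char) -> String {
    let mut buf = [0u16; 2];
    // `encode_utf16` returns one unit for BMP characters, two otherwise.
    c.encode_utf16(&mut buf)
        .iter()
        .map(|unit| format!("\\u{:04X}", unit))
        .collect()
}

/// Hypothetical helper: remove underscores from a block name
/// and leave dashes untouched, as the checklist describes.
fn normalize_block_name(name: &str) -> String {
    name.chars().filter(|&c| c != '_').collect()
}
```

With these definitions, `dotnet_escape('\u{10330}')` yields `\uD800\uDF30`, and `normalize_block_name("Latin-1_Supplement")` yields `Latin-1Supplement`.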

To Reproduce

The regex-test crate ~~should be~~ was expanded to run .NET tests and run in CI (currently only on Ubuntu).

Expected behavior

The .NET flavor works reliably; using unsupported features produces an error.

Aloso, Mar 16 '23 08:03