plyplus
Support unicode in tokens
I grappled with supporting this in `_unescape_unicode_in_token` before deciding it was simpler to just drop this function and support unicode literals. A side effect of this is that we've lost support for unicode and byte escapes (`\u....` and `\x..`). After going back and forth with various attempts here, I've formed the opinion that it doesn't seem straightforward to support both without more brittle regexing.
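To illustrate the kind of brittleness meant here, below is a minimal sketch assuming Python 3. The helper `naive_unescape` and its pattern are hypothetical and written only for this example; they are not the removed `_unescape_unicode_in_token` or any other plyplus code.

```python
import re

# Hypothetical pattern: matches \uXXXX (4 hex digits) or \xXX (2 hex digits).
_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})|\\x([0-9a-fA-F]{2})')


def naive_unescape(token):
    """Replace \\uXXXX and \\xXX escape sequences with the characters they name."""
    def repl(match):
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return _ESCAPE.sub(repl, token)


# A unicode literal needs no processing; the escaped form has to be decoded.
literal = '\u00e9'
escaped = r'\u00e9'  # six characters: backslash, u, 0, 0, e, 9
assert naive_unescape(escaped) == literal

# Brittle case: an escaped backslash followed by "u00e9" is arguably plain
# text, but the naive pattern decodes it anyway and leaves a stray backslash.
tricky = r'\\u00e9'
assert naive_unescape(tricky) == '\\' + '\u00e9'
```

Handling the last case properly means tracking which backslashes are themselves escaped, which is where the regex approach starts to get fragile.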
What are your thoughts about this? I can understand these changes might be unacceptable because they remove some previous functionality. However, my thinking was that users are unlikely to be working with unicode tokens unless they have an environment that supports unicode literals in the grammar source.
I can also add more test cases if there's anything you can think of that might be particularly tricky.
Unicode tokens in the input grammar are no longer necessary for our use case, so I'm happy to close this PR if no one else needs this.