plyplus Support unicode in tokens

Open Smattr opened this issue 9 years ago • 1 comment

I grappled with supporting this in _unescape_unicode_in_token before deciding it was simpler to drop that function and support unicode literals directly. A side effect is that we've lost support for unicode and byte escapes (\u.... and \x..). After going back and forth with several attempts, I've come to the view that supporting both isn't straightforward without more brittle regexing.
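To illustrate the kind of brittleness I mean, here is a minimal sketch of regex-based unescaping. It is not the code from this PR or the original _unescape_unicode_in_token; the name unescape_token and the pattern are hypothetical, and it assumes Python 3.

```python
import re

# Hypothetical pattern for illustration only: matches \uXXXX or \xXX.
_ESCAPE_RE = re.compile(r'\\u([0-9a-fA-F]{4})|\\x([0-9a-fA-F]{2})')


def unescape_token(text):
    r"""Replace \uXXXX and \xXX escapes with the characters they denote."""
    def _sub(match):
        # Exactly one of the two groups is set, depending on which branch matched.
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return _ESCAPE_RE.sub(_sub, text)


# The simple cases work as expected.
assert unescape_token(r'caf\u00e9') == 'caf\u00e9'
assert unescape_token(r'\x41BC') == 'ABC'

# The brittle case: an escaped backslash followed by "u00e9" is still
# rewritten, because the regex cannot tell that the second backslash was
# itself escaped by the first.
assert unescape_token(r'\\u00e9') == '\\\u00e9'
```

Handling the last case properly means tracking escaped backslashes as well, which is where the regex approach starts to pile up special cases.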

What are your thoughts about this? I can understand these changes might be unacceptable because they remove some previous functionality. However, my thinking was that users are unlikely to be working with unicode tokens unless they have an environment that supports unicode literals in the grammar source.

I can also add more test cases if there's anything you can think of that might be particularly tricky.

May 16 '15 06:05 Smattr

Unicode tokens in the input grammar are no longer necessary for our use case, so I'm happy to close this PR if no one else needs this.

Oct 09 '15 01:10 Smattr
