black
Format hex code in unicode escape sequences in string literals
Closes #2067 Closes #2828
Checklist - did you ...
- [X] Add a CHANGELOG entry if necessary?
- [X] Add / update tests if necessary?
- [X] Add new / update outdated documentation? -> n/a
diff-shades results comparing this PR (151195979f0e0811abda25afbc96261fdc079087) to main (4e3303fa08e030722d6fd4d7fe7b8d44ef98991c). The full diff is available in the logs under the "Generate HTML diff report" step.
╭──────────────────────── Summary ────────────────────────╮
│ 5 projects & 38 files changed / 290 changes [+145/-145] │
│                                                         │
│ ... out of 2 363 850 lines, 11 046 files & 23 projects  │
╰─────────────────────────────────────────────────────────╯
Differences found.
Hi @Shivansh-007, are you still able to and interested in working on this PR? If not, just lemme know and I'd be happy to pick it up!
Also the exact direction we'll be going towards is yet to be decided (here).
Yeah sure @ichard26, sorry have been busy with school, let me know if something about my code is unclear.
So it's been two months without any updates and that's because I'm not that interested in working on this PR to be honest. It's stale and I have a bunch of other things I'd like/need to work on first. In the interest of being a good maintainer by delegating tasks, I've marked this PR as "up for grabs" (a term I stole from Python Discord's projects). Anyone who wants to pick up this PR, fix it up, and finish it is totally welcome to.
I haven't looked at this PR enough to even know what needs to be done to get it review-ready, but I can think of these off the top of my head:
- Address merge conflicts
- Address review comments
- Specifically decide whether we want to reformat escapes in lowercase or uppercase
Once ready, please open a new PR and we'll be happy to review it. I'd encourage adding @Shivansh-007 as a co-author on your commits (just one is enough) though just to be nice :)
Up-for-grabs seems like a neat idea, nice 👍
I think no other maintainers have yet expressed their opinion about lower vs. upper case. @ichard26 one way or the other?
I brought this PR up to date, applied @ichard26's review suggestions, and fixed a few more things I noticed. I think this PR is now good to go unless we change our mind to go with uppercase (#2067).
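For readers following along, here is a minimal sketch of the kind of normalization being discussed. It is not the code in this PR: `normalize_hex_escapes` and its `uppercase` flag are hypothetical names, and the sketch assumes it receives only the body of a non-raw `str` literal (no prefix, no quotes). It covers `\x`, `\u` and `\U` escapes and leaves `\N{...}` alone.

```python
import re

# Hypothetical helper for illustration only; not Black's implementation.
# Matches a run of backslashes followed by a hex escape of the right length.
_HEX_ESCAPE_RE = re.compile(
    r"(?P<backslashes>\\+)"
    r"(?:x(?P<x>[0-9a-fA-F]{2})|u(?P<u>[0-9a-fA-F]{4})|U(?P<U>[0-9a-fA-F]{8}))"
)


def normalize_hex_escapes(body: str, uppercase: bool = False) -> str:
    """Change the case of hex digits in \\x, \\u and \\U escapes.

    Assumes `body` is the inside of a non-raw str literal.
    """

    def fix(match: "re.Match[str]") -> str:
        backslashes = match.group("backslashes")
        # An even number of backslashes means the last one is itself escaped,
        # so what follows is plain text rather than an escape sequence.
        if len(backslashes) % 2 == 0:
            return match.group(0)
        for kind in ("x", "u", "U"):
            digits = match.group(kind)
            if digits is not None:
                digits = digits.upper() if uppercase else digits.lower()
                return f"{backslashes}{kind}{digits}"
        return match.group(0)

    return _HEX_ESCAPE_RE.sub(fix, body)
```

Switching between the lowercase and uppercase styles under discussion is a one-argument change here, e.g. `normalize_hex_escapes(r"\xAB", uppercase=True)`. A real formatter would also have to skip raw strings and treat bytes literals differently, which this sketch does not attempt.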
I determined the legal characters in `\N` escapes by doing something like `[unicodedata.name(chr(i)) for i in range(65536)]` (but ignoring invalid characters) and taking the set of all characters in the output. The length of the names ranged from 3 to 83. However, `\N` also accepts aliases and I'm not sure how to get a list of all of those; the Python docs point to https://www.unicode.org/Public/14.0.0/ucd/NameAliases.txt but that doesn't include the "ox" alias for 🐂. I manually verified that there are no one-character aliases.
> However, `\N` also accepts aliases and I'm not sure how to get a list of all of those; the Python docs point to unicode.org/Public/14.0.0/ucd/NameAliases.txt but that doesn't include the "ox" alias for :ox:

"ox" is the base name for :ox: so it's returned by `unicodedata.name()`.
Ah thanks, I should have gone past 65536 to include astral characters. That widens the range of name lengths to 2 to 88, but doesn't add more characters to the set of characters that appear in names.
Also the longest names are
In [13]: [n for n in names if len(n) > 80]
Out[13]:
['ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM',
'ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA INITIAL FORM',
'BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE LEFT AND MIDDLE RIGHT TO LOWER CENTRE',
'BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE RIGHT AND MIDDLE LEFT TO LOWER CENTRE',
'BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE RIGHT TO LOWER CENTRE TO MIDDLE LEFT',
'BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE LEFT TO LOWER CENTRE TO MIDDLE RIGHT',
'BOX DRAWINGS LIGHT DIAGONAL MIDDLE LEFT TO UPPER CENTRE TO MIDDLE RIGHT TO LOWER CENTRE',
'BOX DRAWINGS LIGHT DIAGONAL MIDDLE RIGHT TO UPPER CENTRE TO MIDDLE LEFT TO LOWER CENTRE']
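
The survey described above can be reproduced with something like the following sketch. The helper name `all_unicode_names` is made up for illustration, and the exact maximum length depends on the Unicode version bundled with the Python in use, so no specific maximum is asserted here.

```python
import sys
import unicodedata


def all_unicode_names() -> list:
    """Collect the name of every code point that has one (hypothetical helper)."""
    names = []
    # Go up to sys.maxunicode so astral characters are included, not just the BMP.
    for code_point in range(sys.maxunicode + 1):
        try:
            names.append(unicodedata.name(chr(code_point)))
        except ValueError:
            # Unassigned code points, surrogates, controls etc. have no name.
            continue
    return names


names = all_unicode_names()
alphabet = set("".join(names))  # every character that appears in any name
shortest = min(len(name) for name in names)
longest = max(len(name) for name in names)
print(sorted(alphabet))
print(shortest, longest)
```

This only covers names returned by `unicodedata.name()`; aliases from NameAliases.txt would need to be checked separately.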