Switch Unicode Escaping to a VSCode-like system
This PR rewrites the invisible unicode detection algorithm to more closely match that of the Monaco editor on the system. It provides a technique for detecting ambiguous characters and relaxes the detection of combining marks.
Control characters are in addition detected as invisible in this implementation whereas they are not on monaco but this is related to font issues.
Close #19913
Signed-off-by: Andrew Thornton [email protected]
Screenshots




I've marked this as 1.17 but happy to move this to 1.18
The gitdiff tests will be decoupled from Unicode Escaping the After #19958
- #19958
I think it will make the tests simpler and easier, and more correct.
I've marked this as 1.17 but happy to move this to 1.18
Since we are rushing for 1.17, I moved this to v1.18 because I think we need more time to discuss here.
May I check, would this resolution stop me getting errors in the editor ( 1.17.0+rc1) when adding a space,
e.g.
text text (2020) [AU] I get the nasty yellow symbol because it does not like the space between (2020) and (AU] yet it is a space freshly issued from my space bar.
Other spaces, using the same method, are not upsetting things.
This is is not actually changing monaco, the editor, but instead copying the way it detects ambiguous unicode string into our file viewer.
I think you need to give us a screenshot of your problem though.
@silverwind are you able to take a look at this - this now matches vscode's behaviour.
I think you need to give us a screenshot of your problem though. Thank you for the response and my apologies for the delayed response, the alert seemed to get filed with promotions on Google. I found the thread when searching for a possible reason for the issue in my quite new installation.
In a document (default options, nothing "clever" done at least) I can't always reproduce it, in other words you can use spaces sometimes without a problem.
In the example below, I reopened the document, added the second line (line 2) and typed the same text with a regular space (from the space bar). You then got the square box and a hover mentioning unicode. This is displayed in the "ready" page too. I do not recall doing any cut and paste in the original line, and definitely not in the second line that I just added.

Does this help? Are you then able to steer me in a direction? Thanks.
@DarrenPIngram
if they're being detected by both the old code and the monaco editor, I can tell you that this means that those aren't spaces e.g. U+0020 but are something else like: (U+00A0). Thus far from being a bug - it's actually correctly informing you that you're not seeing what you think you're seeing!
If you click on the warning triangle you should get the escaped code point, or if you hover over the yellow box on monaco you should get something similar. (The new PR would replicate the editor situation so you would hover over the yellow box.)
Put an example up on try.gitea.io or check the actual file using xxd to find out what codepoints are actually in use.
Anyway, this is a discussion about this PR, so have you replicated this on with this PR?
Thanks. As explained I found the thread when trying to track the issue down. It does add a different unicode even when entering a regular space.
I do think it is something related to the autocomplete, i.e. you type read and it pops up.
I will file a new issue report. I am still finding my way with Gitea and have not figured out how to get past these roadblocks.
Thanks.
On Fri, 22 Jul 2022 at 12:22, zeripath @.***> wrote:
@DarrenPIngram https://github.com/DarrenPIngram
if they're being detected by both the old code and the monaco editor, I can tell you that this means that those aren't spaces e.g. U+0020 but are something else like: (U+00A0). Thus far from being a bug - it's actually correctly informing you that you're not seeing what you think you're seeing!
If you click on the warning triangle you should get the escaped code point, or if you hover over the yellow box on monaco you should get something similar. (The new PR would replicate the editor situation so you would hover over the yellow box.)
Put an example up on try.gitea.io or check the actual file using xxd to find out what codepoints are actually in use.
Anyway, this is a discussion about this PR, so have you replicated this on with this PR?
— Reply to this email directly, view it on GitHub https://github.com/go-gitea/gitea/pull/19990#issuecomment-1192367518, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABV5MAE26PYKLCUSRJINICLVVJR5FANCNFSM5ZAFKYLQ . You are receiving this because you were mentioned.Message ID: @.***>
The ambiguous/ambiguous.json are likely double-encoded.
"{\"_common\":[8232,32,8233...
Is it by purpose?
Not sure whether the
<code>line could be improved.
I'm just looking at it... I think we can just use a template.
And one more thing, not sure whether the double-encoded
ambiguous/ambiguous.jsonis done by purpose:"{\"_common\":[8232,32,8233...(why double-encoded)maybe the encoded JSON could be
{"_common": [8232,32,8233...
This file is directly taken https://github.com/hediet/vscode-unicode-data/blob/main/out/ambiguous.json
~~Don't merge just yet~~ OK I've done some more extensive subtemplating.
What exactly prevents this PR from being merged? I see 4 approvals, and no open comments...
I've bitten the bullet and merged it. There will be some complaints.
Here come the complaints! This commit breaks the routers/web/repo/commit.go:Diff route, causing it to return a 500 internal server error, because of locale being undefined in templates/repo/diff/escape_title.tmpl file
Apologies this is precisely why I was reluctant to write subtemplates in the first place.
We should really refactor .locale to a global helper locale, it's just a pain in the ass to have this function passed down as "data" when it doesn't have to be. Makes subtemplating a lot harder.
@silverwind well the problem here is that you can't do that (easily) with go templates because there is no way of setting Funcs on a per execution basis and the Funcs are stored in the templates themselves.
I've put a proposal up to go itself begging them to add methods to allow passing of execution templates (even providing a PR to do it.)
The current situation is absolutely intolerable though.
I don't think there is any better way of describing it than that. What we're being forced to do is incredibly cumbersome and extremely error prone.
If they reject this proposal and persist in suggesting Clone Funcs Execute - which just isn't feasible or performant - or suggest that we have to continue passing contextual rendering functions as data we should reconsider our use of this package.
We could come with a common passing system, e.g. always have common things in common (or root or context), so that locale is always in $.comon.locale in every template (even non subtemplates) but this is still ugly and really quite poor.
If we can't or won't tolerate that then our options then are to either fork the packages, to either add the proposed methods or to properly separate out the template collection, a template, the parser and the executor, (This would allow us to make a potentially lock free executor and template collection - or even leverage code generation to precompile templates to code).
Another option is to find a different template language. (Hopefully one shared on the frontend too.)
We're going to hit this problem more and more though as we try to make our templates easier to read and to handle.
because there is no way of setting Funcs on a per execution basis and the Funcs are stored in the templates themselves.
You mean helper functions are not really "global" as I thought? At least in original Handlebars (registerHelper) they are, so every subtemplate has access to all of them.
In some ways they're too global.
They're stored in the templates so if you want to change the func to have a per execution function you have to Clone the template at time of execution then set a new funcmap with the required Funcs and then Execute. This is expensive precisely because Funcs are stored in templates and they have to be concurrency safe.
If you don't do the clone first off you're going to change the templates for everyone using them.
What we need is something like Template.ExecuteFuncMap where you could pass in a map of functions specific for this execution that would override the currently template defined functions but I just don't know if it is going to be accepted.
Can this feature be "switched off" somehow? I get the warning box on several pages in a wiki, but the escape button neither produces the triangles nor marks which characters it finds offending, so I've no idea what (if anything) to clean up. Maybe this issue exists only on Wiki pages: I found one other file in a "normal repo" (i.e. not wiki) which had the warning box, and there escaping worked.
In all the cases I found however, those characters were intended (non-breakable spaces, single quotes and such). So that box is rather confusing or at best annoying there in my cases (I understand it can be quite useful in other cases). So there should be a way to disable it either per install, per project or both.
Btw: I only noticed that box when I upgraded to 1.18.0, it was not there before in case this matters. And yes, for my local instance I certainly could simply disable it via a CSS rule:
div.error.unicode-escape-prompt, div.warning.unicode-escape-prompt { display:none !important; }
But one might wish to toggle it on/off easily from within the browser, so checking remains possible when required/intended.