joplin
joplin copied to clipboard
Desktop: when non latin languages applied, command palette displays incorrect names
Operating system
Linux
Joplin version
2.14.12
Desktop version info
Joplin 2.14.12 (prod, linux)
Client ID: 358e8a1fa2c24626bcba79c5ac4955e3 Sync Version: 3 Profile Version: 45 Keychain Supported: No
Revision: 2881993
Bidirectional Links: 0.1.2 Calendar: 1.0.0-rc2 CodeMirror 6 snippets: 0.0.4 Combine notes: 1.2.2 Conflict Resolution: 1.2.3 Convert Text To New Note: 1.5.1 Enhancement: 1.2.1 Kanban: 1.0.7 Note list (Preview): 0.4.0 Outline: 1.5.13 Quick HTML tags: 0.2.0 Quick Links: 1.3.0 Send snippet to different note: 1.0.0 Simple Backup: 1.3.6 Templates: 2.4.0
Current behaviour
Description
when Joplin is set to non latin languages, command palette displays broken names for commands
screenshot
steps
- Change language to non-latin layout for example Bulgarian
- Navigate to command palette via
Ctrl+shift+:
- Scroll to the bottom
- Observe broken names of commands
cast
https://github.com/laurent22/joplin/assets/44114323/2fbacc70-055d-43e4-b4fe-7e57489aa664
Notes
- same behavior observed on Windows 10
- if searched for specific command by its latin name (for example
setTags
), full names displays correctly - behavior tested in Greek and Cyrillic langs (UKR, RUS, SERB)
Expected behaviour
All commands to be displayed in user set language
I was looking at the problem and it seems to happen because packages/app-desktop/plugins/GotoAnything.tsx
component calls the surroundKeywords
with { escapeHtml: true }
which is transforming in entities any character that is possible to transform.
https://github.com/laurent22/joplin/blob/627b8307399efe03f6b0dfe1a30e6544fe59494a/packages/lib/string-utils.ts#L235-L256
I don't understand the reason why is implemented like that, I would think that we would only want to escape reserved characters: https://developer.mozilla.org/en-US/docs/Glossary/Entity#reserved_characters
@laurent22 @personalizedrefrigerator
do you think there is a reason why this is happening? Do you think it would be ok to make the change to use the escapeHtml
function?
Isn't it because we display it in HTML so it needs to be escaped?
Isn't it because we display it in HTML so it needs to be escaped?
I understand that chars like < > need to be escaped for security reasons, but characters like é ó ã ç can be displayed as they are, as far as I know. I'm going to research a little bit about this.
Changing to escapeHtml
would fix the reported issue but the problem could still happen if any of the escape characters appeared in the search result (unlikely, but something to keep in might).
but characters like é ó ã ç can be displayed as they are, as far as I know.
Yes but even then it doesn't really matter. à
in HTML content would be displayed as à
. The fact that we see html entities in there means something is escaped twice or some other problem.
but characters like é ó ã ç can be displayed as they are, as far as I know.
Yes but even then it doesn't really matter.
à
in HTML content would be displayed asà
. The fact that we see html entities in there means something is escaped twice or some other problem.
Sorry, maybe I should have explained the issue better.
The reason why the user see the characters like this is because how the surroundKeywords works.
We are escaping the HTML string (each command phrase that is displayed in the command pallet) before executing the regex to include the prefix and suffix. The problem is that when we escape a character like é
to à
it breaks the function, see images below.
Example
The next image is "fixed" because of another issue that I fixed in the code, it works for content that doesn't have any keyword hightlights.
After search
The HTML that renders the first item
The first thought would be to escape the character after the regex, but then we would escape also the prefix and suffix values that should not be escaped.
I think the best solution would be to escapeHTML when doing the regex reclamanet and around the prefix/suffix values, but I'm not sure if there is a straight way of doing this. I'm looking into it to see if I found a solution with this approach.
Writing it down again I realized that just using the escapeHtml
function would also break the tags search:
Image
Ok, I managed to fix this problem for tags and commands, but I found out that note titles are stripped of special characters in the command palette.
I think this is an artifact from the SearchEngine code, should I look into this or create another issue?
I created another issue for this https://github.com/laurent22/joplin/issues/9919