joplin icon indicating copy to clipboard operation
joplin copied to clipboard

Desktop: when non latin languages applied, command palette displays incorrect names

Open graphit0 opened this issue 1 year ago • 7 comments

Operating system

Linux

Joplin version

2.14.12

Desktop version info

Joplin 2.14.12 (prod, linux)

Client ID: 358e8a1fa2c24626bcba79c5ac4955e3 Sync Version: 3 Profile Version: 45 Keychain Supported: No

Revision: 2881993

Bidirectional Links: 0.1.2 Calendar: 1.0.0-rc2 CodeMirror 6 snippets: 0.0.4 Combine notes: 1.2.2 Conflict Resolution: 1.2.3 Convert Text To New Note: 1.5.1 Enhancement: 1.2.1 Kanban: 1.0.7 Note list (Preview): 0.4.0 Outline: 1.5.13 Quick HTML tags: 0.2.0 Quick Links: 1.3.0 Send snippet to different note: 1.0.0 Simple Backup: 1.3.6 Templates: 2.4.0

Current behaviour

Description

when Joplin is set to non latin languages, command palette displays broken names for commands

screenshot

image

steps

  1. Change language to non-latin layout for example Bulgarian
  2. Navigate to command palette via Ctrl+shift+:
  3. Scroll to the bottom
  4. Observe broken names of commands

cast

https://github.com/laurent22/joplin/assets/44114323/2fbacc70-055d-43e4-b4fe-7e57489aa664

Notes

  • same behavior observed on Windows 10
  • if searched for specific command by its latin name (for example setTags), full names displays correctly
  • behavior tested in Greek and Cyrillic langs (UKR, RUS, SERB)

Expected behaviour

All commands to be displayed in user set language

graphit0 avatar Feb 07 '24 08:02 graphit0

I was looking at the problem and it seems to happen because packages/app-desktop/plugins/GotoAnything.tsx component calls the surroundKeywords with { escapeHtml: true } which is transforming in entities any character that is possible to transform.

https://github.com/laurent22/joplin/blob/627b8307399efe03f6b0dfe1a30e6544fe59494a/packages/lib/string-utils.ts#L235-L256

I don't understand the reason why is implemented like that, I would think that we would only want to escape reserved characters: https://developer.mozilla.org/en-US/docs/Glossary/Entity#reserved_characters

@laurent22 @personalizedrefrigerator do you think there is a reason why this is happening? Do you think it would be ok to make the change to use the escapeHtml function?

pedr avatar Feb 08 '24 21:02 pedr

Isn't it because we display it in HTML so it needs to be escaped?

laurent22 avatar Feb 09 '24 12:02 laurent22

Isn't it because we display it in HTML so it needs to be escaped?

I understand that chars like < > need to be escaped for security reasons, but characters like é ó ã ç can be displayed as they are, as far as I know. I'm going to research a little bit about this.

Changing to escapeHtml would fix the reported issue but the problem could still happen if any of the escape characters appeared in the search result (unlikely, but something to keep in might).

pedr avatar Feb 09 '24 17:02 pedr

but characters like é ó ã ç can be displayed as they are, as far as I know.

Yes but even then it doesn't really matter. &agrave; in HTML content would be displayed as à. The fact that we see html entities in there means something is escaped twice or some other problem.

laurent22 avatar Feb 09 '24 17:02 laurent22

but characters like é ó ã ç can be displayed as they are, as far as I know.

Yes but even then it doesn't really matter. &agrave; in HTML content would be displayed as à. The fact that we see html entities in there means something is escaped twice or some other problem.

Sorry, maybe I should have explained the issue better.

The reason why the user see the characters like this is because how the surroundKeywords works.

We are escaping the HTML string (each command phrase that is displayed in the command pallet) before executing the regex to include the prefix and suffix. The problem is that when we escape a character like é to &agrave; it breaks the function, see images below.

Example

The next image is "fixed" because of another issue that I fixed in the code, it works for content that doesn't have any keyword hightlights.

Screenshot from 2024-02-09 15-27-24

After search

image

The HTML that renders the first item image

The first thought would be to escape the character after the regex, but then we would escape also the prefix and suffix values that should not be escaped.

I think the best solution would be to escapeHTML when doing the regex reclamanet and around the prefix/suffix values, but I'm not sure if there is a straight way of doing this. I'm looking into it to see if I found a solution with this approach.

Writing it down again I realized that just using the escapeHtml function would also break the tags search:

Image

image

pedr avatar Feb 09 '24 18:02 pedr

Ok, I managed to fix this problem for tags and commands, but I found out that note titles are stripped of special characters in the command palette.

image

I think this is an artifact from the SearchEngine code, should I look into this or create another issue?

pedr avatar Feb 09 '24 20:02 pedr

I created another issue for this https://github.com/laurent22/joplin/issues/9919

pedr avatar Feb 09 '24 20:02 pedr