Rendering issues with UTF-8 text that has been decomposed into a canonical form

Open skylersaleh opened this issue 1 year ago • 1 comments

Hello,

I've been having trouble getting UTF-8 filename strings that the operating system has decomposed (https://developer.apple.com/library/archive/technotes/tn/tn1150.html#UnicodeSubtleties) to display correctly in ImGui.

image

Notice that the top item has "?" marks after the accented letters, and the Korean syllables are split into their component jamo. The bottom string is the same text in non-decomposed (precomposed) UTF-8.

The attached txt file contains the same string encoded both ways; only one of them renders correctly in ImGui.

utf8.txt

I believe this is caused by ImGui not combining these multi-codepoint characters into a single character and instead trying to render each codepoint separately. Is there any way to resolve this?
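To illustrate the difference between the two encodings, here is a minimal C++ sketch that decodes UTF-8 into code points. The helper name `decode_utf8` is hypothetical and not part of Dear ImGui; it assumes well-formed input and does no validation. Precomposed "é" is the two bytes `C3 A9` (one code point, U+00E9), while the decomposed form is the three bytes `65 CC 81` (two code points, U+0065 followed by the combining acute accent U+0301).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode a well-formed UTF-8 string into code points.
// Sketch only: malformed or truncated input is not detected.
std::vector<std::uint32_t> decode_utf8(const std::string& s) {
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::uint32_t cp;
        int extra;  // number of continuation bytes that follow the lead byte
        if (c < 0x80)      { cp = c;        extra = 0; }
        else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }
        else if (c < 0xF0) { cp = c & 0x0F; extra = 2; }
        else               { cp = c & 0x07; extra = 3; }
        ++i;
        // Each continuation byte contributes its low 6 bits.
        for (int k = 0; k < extra && i < s.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}
```

Calling `decode_utf8("\xC3\xA9")` yields a single code point, while `decode_utf8("e\xCC\x81")` yields two. A renderer that draws one glyph per code point treats the second form as a plain "e" followed by a separate combining mark.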

Thanks,

-Sky

skylersaleh avatar Dec 31 '22 21:12 skylersaleh

Hello!

Dear ImGui's text shaping is extremely simple and doesn't handle composite characters like those. (See also https://github.com/ocornut/imgui/issues/4227 https://github.com/ocornut/imgui/issues/4922 https://github.com/ocornut/imgui/issues/1228 https://github.com/ocornut/imgui/issues/4943)

The easiest solution is probably to "recompose" the strings.

In the context of your app that might be as simple as performing the opposite replacements from the table linked by the documentation you shared. Although the phrasing implies that table doesn't include Hangul, so that might take extra effort.

Ideally though you might find a macOS API or a library to handle this. If you find one please let us know!
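For the Hangul part specifically, recomposition does not need a lookup table: the Unicode Standard defines it arithmetically. The following is a minimal C++ sketch of that arithmetic, operating on already-decoded code points. The function name `compose_hangul` is hypothetical, and the sketch handles only conjoining jamo; accented Latin letters would still need a table or a normalization library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Constants from the Unicode Standard's Hangul syllable composition algorithm.
constexpr std::uint32_t SBase = 0xAC00, LBase = 0x1100, VBase = 0x1161, TBase = 0x11A7;
constexpr std::uint32_t LCount = 19, VCount = 21, TCount = 28;
constexpr std::uint32_t SCount = LCount * VCount * TCount;

// Hypothetical helper: merge leading consonant + vowel (+ optional trailing
// consonant) jamo sequences into precomposed Hangul syllables.
std::vector<std::uint32_t> compose_hangul(const std::vector<std::uint32_t>& in) {
    std::vector<std::uint32_t> out;
    for (std::uint32_t cp : in) {
        if (!out.empty()) {
            std::uint32_t last = out.back();
            // Leading consonant (L) followed by vowel (V) -> LV syllable.
            if (last >= LBase && last < LBase + LCount &&
                cp >= VBase && cp < VBase + VCount) {
                out.back() = SBase + ((last - LBase) * VCount + (cp - VBase)) * TCount;
                continue;
            }
            // LV syllable followed by trailing consonant (T) -> LVT syllable.
            if (last >= SBase && last < SBase + SCount &&
                (last - SBase) % TCount == 0 &&
                cp > TBase && cp < TBase + TCount) {
                out.back() = last + (cp - TBase);
                continue;
            }
        }
        out.push_back(cp);  // anything else passes through unchanged
    }
    return out;
}
```

For example, the sequence U+1112, U+1161, U+11AB composes to the single syllable U+D55C. The input would need to be decoded from UTF-8 first, and the result encoded back to UTF-8 before being passed to ImGui.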

PathogenDavid avatar Dec 31 '22 22:12 PathogenDavid

I agree with David's answer. Dear ImGui is not expected to handle that, for performance and complexity reasons, but you can probably preprocess your strings with a library function. I'm not even sure what the transformation is called. Maybe this? https://www.gnu.org/software/libunistring/manual/libunistring.html#Normalization-of-strings https://www.gnu.org/software/libunistring/manual/libunistring.html#Composition-of-characters

I'll close this as it is out of scope. However, if you solve your issue, posting an answer here for reference would be helpful for others. Thank you!

ocornut avatar Jan 03 '23 18:01 ocornut

I ended up converting the strings to UTF-8 NFC with the utf8proc library (https://github.com/JuliaStrings/utf8proc), and it did the trick here.

It would be nice if a similar routine was implemented in imgui so that a third party library was not necessary.

skylersaleh avatar Jan 03 '23 19:01 skylersaleh

> It would be nice if a similar routine was implemented in imgui so that a third party library was not necessary.

Unfortunately, even the tiny utf8proc carries a 1.8 MB source file with large tables. I'm not sure how much binary data that accounts for, but it's probably too large for such an unusual use case.

ocornut avatar Jan 03 '23 19:01 ocornut