lime icon indicating copy to clipboard operation
lime copied to clipboard

Remove redundant file dialog string conversions

Open tobil4sk opened this issue 2 years ago • 9 comments

Currently, there are a lot of redundant string conversions when using the file dialog api. This was mentioned briefly in #1622.

Here is a rough overview of the current situation:

Linux/macOS:

  • C++: hxstring -> std::wstring -> std::string, passed into tinyfiledialogs utf-8 api, then std::string -> std::wstring -> hxstring
  • HL: hl_vstring -> std::wstring -> std::string, passed into tinyfiledialogs utf-8 api, then std::string -> std::wstring -> utf-8 bytes -> native utf-16 hashlink string (on Haxe side via String.fromUtf8())

Windows:

  • C++: hxstring -> std::wstring, passed into tinyfiledialogs utf-16 api, then std::wstring -> hxstring
  • HL: hl_vstring -> std::wstring, passed into tinyfiledialogs utf-16 api, then std::wstring -> utf-8 bytes -> native utf-16 hashlink string (on Haxe side via String.fromUtf8())

This cleans things up to avoid unnecessary conversions, so now it looks like this in most cases:

Linux/macOS:

  • C++: hxstring -> utf-8[0], passed into tinyfiledialogs utf-8 api, then utf-8 -> hxstring
  • HL: hl_vstring -> utf-8, passed into tinyfiledialogs utf-8 api, then utf-8 -> utf-16 hashlink string (on Haxe side via String.fromUtf8())

Windows:

  • C++: hxstring -> utf-16[0], passed into tinyfiledialogs utf-16 api, then utf-16 -> hxstring
  • HL: hl_vstring, passed into tinyfiledialogs utf-16 api, then utf-16 -> utf8 -> utf-16 hashlink string (on Haxe side via String.fromUtf8())

[0] hxcpp can use ASCII (which is compatible with utf-8) or utf-16 depending on the string, so these conversions are avoided in some cases.

We can improve Hashlink further (for Windows), by returning utf-16 strings from the apis, however, I'm not sure if this would be considered a breaking change (new lime.hdlls would be incompatible with old lime haxe files).

In #1622, we briefly discussed using the utf-8 tinyfiledialog functions on Windows to unify the api, however, I looked into this and on Windows these functions just convert back to utf-16, so there would still be extra conversions. So I think this is the cleanest solution.

I've tested on Linux and Windows with the code from #1622, and everything works well.

tobil4sk avatar Apr 18 '23 12:04 tobil4sk

Just a little correction here, HXCPP (with smart strings enabled) never uses UTF-8 for strings. It uses ASCII and when any character in a string doesn't fit within ASCII range then the entire string is converted to UTF-16. HXCPP, however, provides a function to convert it's strings to UTF-8.

nixbody avatar Apr 18 '23 12:04 nixbody

Yes, that's correct. However, an ASCII string is valid utf-8, so when using hxs_utf8 on an hxcpp string which is ASCII encoded, it just returns a pointer to the ASCII string without any conversion being necessary.

tobil4sk avatar Apr 18 '23 12:04 tobil4sk

I was wondering if you point that out :) I thought you might know, my apologies for stating the obvious.

nixbody avatar Apr 18 '23 12:04 nixbody

No worries, it's worth making the distinction :)

tobil4sk avatar Apr 18 '23 13:04 tobil4sk

There are a few more places where wstrings are converted to utf8 still. These include most of the system functions, though these are windows only, where wchar is equal to char16_t so hl_to_utf8/alloc_wstring can be used there. For System::GetDirectory, it can be modified to avoid wstrings.

The bigger issue is Font::GetFamilyName() in text/Font.cpp. Currently it assumes that wchar is 16 bits, which is only the case on Windows. Even if it used char16_t, the alloc_hxs_utf16() function returns an HxString, and I can't seem to figure out to turn that into a value.

tobil4sk avatar Apr 20 '23 14:04 tobil4sk