UTF-8 support as XML files' encoding

Open lehrax opened this issue 1 year ago • 3 comments

Is your feature request related to a problem? Please describe.

Pretty much all modern environments default to Unicode (UTF-8) these days, so the "Windows-1251" encoding has to be set explicitly for the XMLs the engine uses, which is annoying (especially when working on a non-Windows OS).

Describe the solution you'd like

Unless the XML starts with <?xml version="1.0" encoding="windows-1251"?> or has another value in its encoding attribute (are any others even supported?), treat the file as encoded in UTF-8.

Describe alternatives you've considered

Working with the files as UTF-8 in the IDE, then using iconv to convert them into the encoding the game engine understands (Windows-1251).
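For reference, the conversion step of this workaround can be sketched in Python instead of iconv. This is only an illustration: the function name `utf8_to_cp1251` is made up for the example, and it assumes the target really is Windows-1251 (which, for some localizations, it may not be).

```python
def utf8_to_cp1251(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as Windows-1251 (hypothetical helper).

    Works on raw bytes so that line endings are left untouched.
    Raises UnicodeEncodeError if a character has no Windows-1251
    representation, which is usually what you want to find out early.
    """
    # "utf-8-sig" also accepts input that starts with a UTF-8 BOM and drops it.
    text = data.decode("utf-8-sig")
    return text.encode("cp1251")
```

A typical use would be something like `Path("out.xml").write_bytes(utf8_to_cp1251(Path("in.xml").read_bytes()))`, with the file names being placeholders.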

Additional context

I am taking part in the localisation patch effort for True Stalker in the community repo and ran into garbled text in-game when the submitted XMLs were originally in UTF-8.

Thanks in advance!


Not sure if this can be considered a duplicate: https://github.com/OpenXRay/xray-16/issues/419 Here emphasis is on gamedata XML files' encoding.

lehrax, Jan 03 '24 10:01

One thing I can add at this point is that encoding="windows-1251" cannot be trusted. XMLs for different localizations use different encodings (e.g. Polish uses Windows-1250), but the encoding attribute in all XML files is always set to windows-1251. Basically, the engine doesn't use this attribute anyway, which is probably why it was never set properly and was just abandoned.

Not sure if this can be considered a duplicate: https://github.com/OpenXRay/xray-16/issues/419

Could be considered as subtask :)

Xottab-DUTY, Jan 03 '24 11:01
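To illustrate why a wrong encoding attribute matters to anything that does trust it (an editor, for example), here is a small Python sketch. The sample sentence and the helper name `decode_as` are made up for the example; it only shows that the same bytes read differently under Windows-1250 and Windows-1251.

```python
def decode_as(raw: bytes, codec: str) -> str:
    """Decode raw bytes with the given single-byte code page (hypothetical helper)."""
    return raw.decode(codec)

# Polish sample text, stored the way a Polish localization file would store it.
polish = "Zażółć gęślą jaźń"
raw = polish.encode("cp1250")

as_cp1250 = decode_as(raw, "cp1250")  # the encoding the bytes were written in
as_cp1251 = decode_as(raw, "cp1251")  # what the declaration claims

print(as_cp1250)
print(as_cp1251)  # same length, but Cyrillic letters where the Polish ones were
```

Both decodes succeed without error, which is why a mislabeled file goes unnoticed until someone looks at the text.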

Unless XML starts with <?xml version="1.0" encoding="windows-1251"?> or has another encoding attribute's value (are any other even supported?) treat file as encoded in UTF-8.

So, given my message above, it's safer to do the reverse: treat the file as UTF-8 only if the encoding attribute is set to utf-8.

Xottab-DUTY, Jan 03 '24 11:01
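The opt-in rule suggested above could look roughly like the following Python sketch. It is not the engine's code: `decode_xml`, the regular expression, and the `legacy_codec` parameter are all assumptions made for illustration, and how the engine actually picks its code page per localization is not stated in this thread.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the start of the data.
_DECL_RE = re.compile(
    rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)

def decode_xml(data: bytes, legacy_codec: str = "cp1251") -> str:
    """Decode as UTF-8 only when the declaration explicitly says utf-8.

    Any other declared value (including the untrustworthy "windows-1251")
    is ignored and the caller-supplied legacy code page is used instead.
    """
    match = _DECL_RE.match(data)
    declared = match.group(1).decode("ascii").lower() if match else ""
    if declared in ("utf-8", "utf8"):
        return data.decode("utf-8")
    return data.decode(legacy_codec)
```

With this rule, existing files keep working unchanged, because none of them declare utf-8, while new files can opt in by fixing their declaration.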

Riiight, now it's suddenly starting to make sense to me (sort of). When I opened the XMLs in VS Code it was 50/50: some opened correctly, some didn't. So 1251 is not the only encoding supported, I see? I tried putting UTF-8 in the declaration tag and saving the file as UTF-8, only to get a bunch of nonsense symbols in the game, so my uneducated guess was that it does not support UTF-8. Now I assume the declaration tag is not used anywhere and these encodings (like 1251 for Russian, 1250 for Polish, etc.) are stored somewhere else, right?

Where can I get that info from?

Or, if you already know what they are, could you just list which locale uses which encoding:

Russian, Ukrainian: 1251
Polish: 1250
English: 1252 (?)
German: ?
Spanish: ?
French: ?
Italian: ?

(And will I ruin everything by editing the declaration lines to have the proper encoding names, or is that line basically a shebang marking XMLs that contain translation data?)

Sorry if that is too much text 😅

lehrax, Jan 03 '24 15:01
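The "nonsense symbols" seen after saving a file as UTF-8 are consistent with UTF-8 bytes being read as a single-byte code page. The Python sketch below reproduces that effect outside the game; the helper name `misread_utf8_as` and the sample word are made up, and whether the engine does exactly this internally is an assumption.

```python
def misread_utf8_as(text: str, codec: str = "cp1251") -> str:
    """Show what UTF-8 encoded text looks like when decoded as `codec`.

    errors="replace" is used because some byte values (0x98 in
    Windows-1251, for instance) have no character assigned.
    """
    return text.encode("utf-8").decode(codec, errors="replace")

original = "Привет"
garbled = misread_utf8_as(original)

print(original)
print(garbled)  # two unrelated characters for every Cyrillic letter
```

Each Cyrillic letter takes two bytes in UTF-8, so the misread string is twice as long as the original, which matches the typical look of this kind of garbling.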