xray-16
UTF-8 support as XML files' encoding
Is your feature request related to a problem? Please describe.
Pretty much all modern environments default to Unicode (UTF-8) these days, so the "Windows-1251" encoding has to be set explicitly for the XMLs the engine uses, which is annoying (especially when working from a non-Windows OS).
Describe the solution you'd like
Unless the XML starts with <?xml version="1.0" encoding="windows-1251"?> or declares another encoding attribute value (are any others even supported?), treat the file as encoded in UTF-8.
Describe alternatives you've considered
Working with the files as UTF-8 in the IDE, then using iconv to convert them into the encoding the game engine understands (Windows-1251).
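For reference, the conversion step in this workaround can be sketched in Python. This is a minimal illustration of roughly what iconv -f UTF-8 -t WINDOWS-1251 does, not part of the engine or its tooling; the function name utf8_to_cp1251 is made up for the example.

```python
from pathlib import Path


def utf8_to_cp1251(src: Path, dst: Path) -> None:
    """Re-encode a UTF-8 text file as Windows-1251 (hypothetical helper)."""
    # "utf-8-sig" also drops a leading BOM if the editor wrote one.
    text = src.read_bytes().decode("utf-8-sig")
    # The default strict error handling raises UnicodeEncodeError when the
    # text contains a character that Windows-1251 cannot represent.
    dst.write_bytes(text.encode("cp1251"))
```

Reading and writing bytes (rather than text mode) keeps line endings exactly as they were in the source file.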
Additional context
I am taking part in the localisation patch effort for True Stalker in the community repo and ran into garbled text in the game when the submitted XMLs were originally in UTF-8.
Thanks in advance!
Not sure if this can be considered a duplicate: https://github.com/OpenXRay/xray-16/issues/419 Here emphasis is on gamedata XML files' encoding.
One thing I can add at this moment is that encoding="windows-1251" cannot be trusted. XMLs for different localizations use different encodings, e.g. Polish uses Windows-1250, but the encoding attribute in all XML files is always set to windows-1251. Basically, this attribute is not used in the engine anyway, so that's probably why it's not set properly and was just abandoned.
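To illustrate why a wrong declaration matters to any tool that does trust it, here is a small Python sketch. It only shows the general effect of decoding Windows-1250 bytes as Windows-1251; the sample sentence is made up, and nothing here reflects how the engine itself reads files.

```python
# A Polish pangram, standing in for the contents of a Polish localization XML.
polish = "Zażółć gęślą jaźń"

# Bytes as they would sit on disk if the file is saved as Windows-1250.
raw = polish.encode("cp1250")

# Decoding per the (wrong) declaration vs. the encoding actually used.
as_declared = raw.decode("cp1251", errors="replace")
as_actual = raw.decode("cp1250")

print(as_declared)  # accented letters come out as unrelated characters
print(as_actual)    # the original text
```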
> Not sure if this can be considered a duplicate: https://github.com/OpenXRay/xray-16/issues/419
Could be considered as subtask :)
> Unless XML starts with <?xml version="1.0" encoding="windows-1251"?> or has another encoding attribute's value (are any other even supported?) treat file as encoded in UTF-8.
So, given my message above, it's safer to do the reverse: treat the file as UTF-8 if the encoding attribute is set to utf-8.
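The rule proposed here could look roughly like the following Python sketch. It is only an illustration of the decision logic, not the engine's code (the engine is not written in Python); choose_encoding and the legacy_codepage parameter are hypothetical names, with legacy_codepage standing in for whatever code page the current localization is assumed to use.

```python
import re

# Matches an XML declaration at the start of the file and captures the
# value of its encoding attribute.
_DECL_RE = re.compile(
    rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']',
    re.IGNORECASE,
)


def choose_encoding(raw: bytes, legacy_codepage: str = "cp1251") -> str:
    """Return "utf-8" only when the file explicitly declares it."""
    head = raw[:200]
    if head.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 BOM, if present
        head = head[3:]
    match = _DECL_RE.match(head)
    if match and match.group(1).decode("ascii").lower() in ("utf-8", "utf8"):
        return "utf-8"
    # Any other declared value is ignored, since existing files declare
    # windows-1251 regardless of their real encoding.
    return legacy_codepage
```

With this rule, existing files keep their current behaviour, and only files that opt in by declaring utf-8 are decoded as UTF-8.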
Riiight, now it suddenly started to make sense to me (sort of). When I opened the XMLs in VS Code it was 50/50: a file either opened correctly or it didn't. So 1251 is not the only encoding supported, I see? I tried putting UTF-8 in the declaration tag and saving the file as UTF-8, only to get a bunch of nonsense symbols in the game, so my uneducated guess was that it does not support UTF-8. Now I assume the declaration tag is not used anywhere, but these encodings (like 1251 for Russian, 1250 for Polish, etc.) are stored somewhere else, right?
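The nonsense symbols are what one would expect if UTF-8 bytes are interpreted as a single-byte code page. A short Python sketch of that effect, assuming (not confirmed here) that the text ends up being read as Windows-1251; the sample word is made up:

```python
text = "Привет"

# Saved as UTF-8, each Cyrillic letter takes two bytes.
utf8_bytes = text.encode("utf-8")

# Read back as Windows-1251, every byte becomes its own character,
# so each letter turns into a pair of unrelated symbols.
garbled = utf8_bytes.decode("cp1251", errors="replace")

print(len(text), len(utf8_bytes), len(garbled))
print(garbled)
```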
Where can I get that info from?
Or, if you already know what they are, could you just list which locale uses which encoding:
Russian, Ukrainian: 1251
Polish: 1250
English: 1252 (?)
German: ?
Spanish: ?
French: ?
Italian: ?
(And will I ruin everything by editing the declaration lines to have proper encoding names, or is that line basically a shebang marking XMLs that contain translation data?)
Sorry if that is too much text 😅