UTF-8 support

Open AdamPlenty opened this issue 1 year ago • 2 comments

Related issue: #2330

The other week, the media were praising KeeperFX on bringing Dungeon Keeper to the modern age, but one area in which even KeeperFX remains stuck in the 90s is character encoding. This is especially problematic for quick messages in level scripts; they are code page-dependent, which can lead to issues:

[three images]

We need a universal code page (such as UTF-8) and font, not just for quick messages, but for the entire game. Otherwise, this'll continue to be an issue. With a universal system, anyone would be able to use any text, and it'll work regardless of language.

AdamPlenty avatar Dec 27 '23 08:12 AdamPlenty

Unicode support is a topic I have looked into for my own interests, in particular the question:

"is it feasible for a program to support all languages with the distributed binaries alone?"

The short answer is "not today". The longer answer is "not without both substantial work and binary size". As far as I am aware it has never been achieved by anyone.

What follows is my musings on approaching an answer to the question:

"if KeeperFX supports Unicode and UTF-8 encoding, does it improve the support of the currently supported languages, and does it make adding new languages easier?"

I have no definitive answer, and I would suggest that it would be quite some work to get one! So either answer the question definitively to decide whether Unicode should be supported, or implement Unicode to find out whether it was worth the effort ;p

To implement Unicode in FX we would need to:

  • use Unicode characters universally for all strings in FX ("bolting on" Unicode will make this an eternally painful feature for FX developers)
  • support UTF-8 encoding for "text files" (i.e. all strings used by FX that are external to the FX codebase)
  • map the glyphs in our existing fonts to Unicode code points (UTF-8 is only the byte encoding of those code points)
  • either drop support for the codepages we do support (cp850, SJIS, etc.) or add functionality to translate those codepages to Unicode.
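
The last option in the list, translating the supported codepages to Unicode, could look roughly like the sketch below. It is written in Python rather than anything from the FX codebase, and the function name `transcode_to_utf8` and the codec names are illustrative assumptions only.

```python
def transcode_to_utf8(raw: bytes, codepage: str) -> bytes:
    """Re-encode text stored in a legacy code page as UTF-8.

    `codepage` is a Python codec name such as "cp850" or "cp932"; these
    stand in for whatever code pages FX actually ships (an assumption).
    """
    # Decode the legacy bytes to Unicode code points first. A strict decode
    # raises UnicodeDecodeError if the bytes do not fit the code page.
    text = raw.decode(codepage)
    # Then serialise those code points as UTF-8.
    return text.encode("utf-8")


if __name__ == "__main__":
    legacy = "Größe".encode("cp850")  # how the text would sit in a cp850 file
    print(transcode_to_utf8(legacy, "cp850"))  # prints the UTF-8 bytes
```

The conversion only works if the caller already knows which code page the bytes were written in; nothing in the bytes themselves says so.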

Then, additional Unicode characters could be added as glyphs in new or existing FX fonts - this is manageable so long as each language is handled one at a time.

Map makers would still need to save their work correctly when working in external programs though (i.e. don't save with the wrong encoding). However, if we switch all shipped files to UTF-8 encoding, then for anyone who copies/edits the shipped files: "all characters supported by FX" will be rendered in game. This presumes that FX will have enough context to determine the language used (see below), which in most cases will be covered by the player's FX language setting and the map maker being "language aware" when working - but there may be edge cases that lead to undesirable results for the player.
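
One way to picture the "enough context" part is a loader that tries UTF-8 first and only falls back to a code page chosen from the player's language setting. This is a hedged Python sketch, not FX behaviour: `decode_script_text` and the `fallback_codepage` parameter are hypothetical names.

```python
def decode_script_text(raw: bytes, fallback_codepage: str) -> tuple:
    """Return (text, encoding_used) for bytes read from a script or text file.

    Strict UTF-8 is tried first. If the bytes are not valid UTF-8, they are
    assumed to be in `fallback_codepage` (hypothetically derived from the
    player's language setting).
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Undecodable bytes become U+FFFD instead of aborting the load.
        return raw.decode(fallback_codepage, errors="replace"), fallback_codepage


if __name__ == "__main__":
    _, used = decode_script_text("Größe".encode("cp850"), "cp850")
    print(used)
```

This is where the edge cases come from: pure ASCII decodes identically either way, but a legacy file whose bytes happen to form valid UTF-8 would be misread, and a legacy file saved in a code page other than the fallback would still be corrupted.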

See https://en.wikipedia.org/wiki/Open-source_Unicode_typefaces for why implementing Unicode and UTF-8 support doesn't magically support all languages. The summary is:

  • There are more characters already in Unicode than can fit in a single font file.
  • Different languages can write the "same character" in different ways.
  • There have been numerous attempts to create a complete "Unicode typeface", but as far as I am aware, none include all 149813 Unicode characters. I recommend looking at "Kurinto Font Folio" as an example of near-complete Unicode support in a collection of fonts.
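
To put a number on the first point: TrueType/OpenType fonts store the glyph count in a 16-bit field, so one font file holds at most 65,535 glyphs. The short Python calculation below uses that limit and the character count quoted above; the helper name `min_font_files` is made up for illustration, and it assumes one glyph per character, which makes the result a lower bound.

```python
import math

UNICODE_CHARACTERS = 149_813  # character count quoted in the list above
MAX_GLYPHS_PER_FONT = 65_535  # 16-bit glyph count in TrueType/OpenType


def min_font_files(characters: int, per_font: int = MAX_GLYPHS_PER_FONT) -> int:
    """Lower bound on font files needed, assuming one glyph per character."""
    return math.ceil(characters / per_font)


if __name__ == "__main__":
    print(min_font_files(UNICODE_CHARACTERS))
```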

Hypothetical example: A map maker editing a file in Notepad++ with UTF-8 encoding will be able to see all the characters supported by the font they are using in Notepad++. A player playing "UTF-8 KFX" will be able to see all the characters supported by our font. In both Notepad++ and KFX unsupported characters will not be rendered as intended. So, given that KeeperFX supports a finite list of languages, it needs to be the case that ALL unsupported characters are from languages that are not in the list of supported languages - I believe that is the current status quo of master (i.e. achieved via codepages).
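
The same hypothetical can be written as a check: given the set of characters a font has glyphs for, list the characters in a message that would not render as intended. This is a Python sketch with made-up names (`unsupported_characters`, `font_glyphs`), and the glyph set in the example is invented, not the real coverage of any FX font.

```python
def unsupported_characters(text: str, font_glyphs: set) -> list:
    """Return the distinct characters in `text` missing from `font_glyphs`,
    in the order they first appear."""
    missing = []
    for ch in text:
        if ch not in font_glyphs and ch not in missing:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    # Invented coverage: printable ASCII plus a few German letters.
    western_font = {chr(c) for c in range(0x20, 0x7F)} | set("äöüÄÖÜß")
    missing = unsupported_characters("Größe 日本", western_font)
    print(len(missing))  # number of characters the font cannot draw
```

Note that the encoding plays no part in this check: once the text is Unicode, rendering depends only on whether the font has the glyph.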

To summarise:

  • Unicode is a lot of work upfront, and further work would be required per language added to FX
  • Map makers will always have to save their work in the correct encoding (but switching to UTF-8 encoding as the "target platform" of FX may help to alleviate that)
  • FX needs a corresponding font glyph for each character that we would intend to support, regardless of the encoding(s) we do support

I'd also add that Unicode is only needed when you want to be able to display different languages on the screen at the same time from the same data source (and the characters for all of the languages you do want to show do not exist inside a single codepage). So another question that arises is:

"in what instances does FX need to use Unicode?"

I do not know very much about the language handling/use in KeeperFX, so I'm unable to answer this question.

eddebaby avatar Dec 27 '23 12:12 eddebaby

Quick messages in level scripts definitely ought to be in UTF-8, because players would want it to work even if the language is set to Japanese or whatever. As it is, we can forget about having quick messages in a language other than English and expecting them to work in all languages, because the game simply won't display the correct characters; for example, if the language is Japanese, everything (including quick messages) is interpreted as code page 932, which can result in text corruption, as seen above. This is because quick messages can only be in one code page, and they are not language-dependent; they are written directly in the script and just are what they are. The same goes for all languages and code pages.
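
The corruption described here can be reproduced outside the game. The Python sketch below encodes a message in one code page and decodes it in another, which is roughly what happens when a script written for a Western code page is read while the language is set to Japanese. The function name `misread` is hypothetical, and Python's `cp850` and `cp932` codecs stand in for the game's own tables.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode `text` in one code page, then decode the bytes in another."""
    raw = text.encode(written_as)
    # errors="replace" turns undecodable bytes into U+FFFD rather than raising.
    return raw.decode(read_as, errors="replace")


if __name__ == "__main__":
    original = "Größe"
    garbled = misread(original, "cp850", "cp932")
    print(garbled == original)                          # False: text is corrupted
    print(misread(original, "utf-8", "utf-8") == original)  # True: one encoding, no corruption
```

Plain ASCII text survives the mismatch, which is why English quick messages work in every language setting while anything with accented or non-Latin characters does not.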

One workaround might be to have level-specific text dat files for each language, like we do with map packs and campaigns, but even that is less than ideal, because the same text would need to be encoded in all code pages (we'd need several dat files for the same text even if it's not translated), and I'm not sure the East Asian font even supports letters with diacritical marks.
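
The encoding half of that concern can be checked directly: some text simply has no representation in a given code page, so no dat file in that code page could carry it. The Python sketch below uses a hypothetical helper `encodable_in` and Python's codec tables as a stand-in for the game's code pages. It says nothing about which glyphs the East Asian font contains, which is a separate question.

```python
def encodable_in(text: str, codepages: list) -> dict:
    """Map each code page name to whether `text` can be encoded in it."""
    result = {}
    for cp in codepages:
        try:
            text.encode(cp)
            result[cp] = True
        except UnicodeEncodeError:
            result[cp] = False
    return result


if __name__ == "__main__":
    print(encodable_in("Café", ["cp850", "cp932", "utf-8"]))
```

In Python's tables, "é" exists in cp850 but not in cp932, so the accented text cannot even be stored in the Japanese code page, whereas UTF-8 can store it alongside Japanese text.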

AdamPlenty avatar Dec 28 '23 02:12 AdamPlenty