SWFv5 CJK text encoding samples (Shift-JIS, EUC-KR, Big5, GB2312)
SWFv5 text is encoded based on the system locale instead of using UTF-8. Ruffle always decodes SWFv5 text as Windows-1252, but this is not accurate behavior for non-English SWFs. When the SWF embeds a font that provides the necessary glyph for a DefineText field, there is still no problem because the glyphs are matched to their respective characters, since both were decoded the same (incorrect) way. But if Ruffle were to support selecting and copying text, this would expose the fact that Ruffle is decoding text incorrectly.
More importantly, when a v5 SWF has a DefineText field without a corresponding font that provides the needed glyphs, Ruffle renders the text using its fake device font, and the garbled text is exposed. Similarly, non-English text in DefineEditText fields displays as garbled mojibake.
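To make the failure mode concrete, here is a minimal Python sketch of what happens when locale-encoded bytes are decoded as Windows-1252. It uses Python's standard codecs, not Ruffle's actual Rust code, and the helper name `as_ruffle_sees_it` and the sample strings are made up for illustration.

```python
# Sketch only: mimics the mis-decoding with Python's standard codecs.
# The helper name and the sample strings are illustrative assumptions.

def as_ruffle_sees_it(text: str, real_encoding: str) -> str:
    """Encode text in a legacy locale encoding, then decode it as Windows-1252."""
    raw = text.encode(real_encoding)
    # errors="replace" because Python's cp1252 codec leaves a few bytes undefined.
    return raw.decode("cp1252", errors="replace")

samples = {
    "shift_jis": "こんにちは",
    "euc_kr": "안녕하세요",
    "big5": "你好",
    "gb2312": "你好",
}

for encoding, text in samples.items():
    print(f"{encoding:10} {text} -> {as_ruffle_sees_it(text, encoding)}")
```

Each line of output pairs the intended text with the garbled form that a Windows-1252 decoder would produce from the same bytes.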

I am well aware that this situation cannot be improved anytime soon. Even if Ruffle did decode the text correctly, its fake device font does not include the needed characters, so nothing would be displayed at all.
Let's also make a note of Adobe Flash Player's behavior. Normally Flash Player decodes all SWFv5 text using the system codepage and displays it accordingly, so on an English system the result is much the same as in Ruffle. The exception is Shift-JIS; Flash Player is able to detect Shift-JIS text and display it properly, even on an English locale.
Here are sample SWFv5 files with text encoded as Shift-JIS, EUC-KR, Big5, or GB2312: SWFv5-CJK-samples.zip. I've also exported text from some of the files to give you something to check against.
Viewing and exporting the text
Unfortunately, JPEXS Flash Decompiler does not correctly decode SWFv5 text, so exporting it the usual way makes the original text unrecoverable. But I did find a way to export text from v5 DefineEditText fields:
- In the normal view, find the text field that you want to export.
- Switch to the "Hex dump" view and find the same DefineEditText tag.
- Expand it and scroll all the way down to the initialText (string) field.
- Right-click it and click "save to file."
Once exported, you can open each file in Notepad++ and select the correct encoding from the menu. Or you can just open the file in a web browser, since browsers autodetect the encoding very accurately.
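If you would rather script the conversion, the following is a hedged Python sketch. It assumes the exported file holds the raw string bytes (possibly with a trailing NUL terminator) and that you already know which encoding to pick. The function name and the file names in the usage comment are hypothetical.

```python
from pathlib import Path

def export_to_utf8(src: str, dst: str, encoding: str) -> str:
    """Decode an exported initialText file with the chosen encoding and save it as UTF-8."""
    # Assumption: a trailing NUL terminator, if present, is not part of the text.
    raw = Path(src).read_bytes().rstrip(b"\x00")
    # Strict decode: raises UnicodeDecodeError if the bytes are not valid in that encoding.
    text = raw.decode(encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return text

# Usage (hypothetical file names):
# export_to_utf8("initialText.bin", "initialText.txt", "shift_jis")
```

Python's codec names for the four sample encodings are `shift_jis`, `euc_kr`, `big5`, and `gb2312`.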
Here's another Shift-JIS sample, provided by olux997 on Discord: handy.zip
JPEXS has added a feature to set the interpreted charset of SWFv5 (and lower) files. Just right-click the file on the left sidebar and choose "Change charset".
Another Shift-JIS sample from #9698: https://github.com/ruffle-rs/ruffle/files/10820434/Para2.zip
> Normally Flash Player decodes all SWFv5 text using the system codepage and displays it accordingly, so on an English system the result is much the same as in Ruffle. The exception is Shift-JIS; Flash Player is able to detect Shift-JIS text and display it properly, even on an English locale.
Maybe Ruffle could use https://crates.io/crates/chardetng to detect the encoding instead of relying on the system codepage. This could work even in non-Shift-JIS scenarios. I think that would be an improvement without breaking existing SWF files.
chardetng is used by Firefox to detect the encoding of old HTML pages that assumed a system codepage without specifying it in the document. That seems similar to the problem Ruffle needs to solve.
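chardetng is a Rust crate, so the following is not its API. It is only a naive Python stand-in, assuming the candidate list is the four encodings from the samples above, to show the shape of the problem: a strict validity check can rule some encodings out, but the same bytes may be valid in more than one legacy encoding, which is where a statistical detector would have to take over.

```python
# Naive stand-in for encoding detection; NOT how chardetng works internally.
CANDIDATES = ("shift_jis", "euc_kr", "big5", "gb2312")

def plausible_encodings(raw: bytes, candidates=CANDIDATES) -> list:
    """Return the candidate encodings under which raw decodes without error."""
    plausible = []
    for encoding in candidates:
        try:
            raw.decode(encoding)  # strict decode; invalid sequences raise
        except UnicodeDecodeError:
            continue
        plausible.append(encoding)
    return plausible

raw = "早上好".encode("gb2312")
print(plausible_encodings(raw))  # may list more than one encoding
```

When more than one candidate survives, validity alone cannot decide between them.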
It seems UTF-8 should also work in SWFv5 (sample):
```actionscript
trace("âäéèÔ");
trace("-----");
trace("早上好");
trace("-----");
trace("สวัสดีตอนเช้า");
```
In Flash Player, these strings are displayed as you'd expect, but in Ruffle they're all broken:
âäéèÔ
-----
早上好
-----
สวัสดีตà¸à¸™à¹€à¸Šà¹‰à¸²
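A minimal Python sketch of how this kind of breakage can arise, assuming the strings are stored in the SWF as UTF-8 bytes and then decoded as Windows-1252. This uses Python's standard codecs rather than Ruffle's code, and the helper name is made up.

```python
def utf8_read_as_cp1252(text: str) -> str:
    """Decode the UTF-8 bytes of text as if they were Windows-1252."""
    return text.encode("utf-8").decode("cp1252", errors="replace")

for s in ("âäéèÔ", "早上好", "สวัสดีตอนเช้า"):
    # Each multi-byte UTF-8 character turns into two or three Latin characters.
    print(utf8_read_as_cp1252(s))
```

As long as no byte had to be replaced, the damage is reversible by encoding the garbled string back to Windows-1252 and decoding it as UTF-8.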
Mike mentioned a TODO in #2636:
> Add option for specifying encoding for SWFv5 files. (Currently defaults to WINDOWS-1252).
Huh, that's surprising. I can confirm it works even in the Flash 5 authoring software. The SWF outputs the text correctly in Flash Player 32 (though not in the player that comes with Flash 5, as seen in the Output window).

On the other hand, I couldn't find any way to make a text field in Flash 5 that displays UTF-8 characters correctly in Flash Player 32, even if the data in the text field is correct (it's still interpreted with the system codepage when played).
Circling this back to Dora DORA Janken, I recently did some experimentation where I modified master so that the encoding_for_version function just unilaterally returned UTF-8. I also changed the source fonts and edited the tags in the SWF to use a Japanese-supporting font, but to no avail. The attached version of the SWF is unaltered: djk_0329.zip
Here is another example of the error: https://oldswf.com/game/71121 The text is rendered incorrectly and is invisible, but it can still be copied out. It should be decoded as GB2312 (which is included in GBK), not Windows-1252:
ÄãÐÑÀ²£¡Äã½ñÔç´Óº£ÉÏÆ¯µ½Õâ¸ö
I tried changing every Windows-1252 to GBK in the project, and it works fine for my games. Maybe Ruffle needs to provide a config parameter so the user can choose the encoding?
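As an illustration, here is a hedged Python sketch (not Ruffle's Rust code). `decode_swf_text` and its `legacy_encoding` parameter are hypothetical names for the kind of user-configurable option suggested here. The version split assumes UTF-8 from SWF 6 onward and a legacy codepage before that. The second helper undoes the Windows-1252 mis-decoding of the copied-out text above.

```python
def decode_swf_text(raw: bytes, swf_version: int, legacy_encoding: str = "cp1252") -> str:
    """Decode SWF string bytes; legacy_encoding is a hypothetical user-supplied override."""
    encoding = "utf-8" if swf_version >= 6 else legacy_encoding
    return raw.decode(encoding, errors="replace")

def repair_mojibake(garbled: str, real_encoding: str) -> str:
    """Recover text that was wrongly decoded as Windows-1252."""
    return garbled.encode("cp1252").decode(real_encoding)

copied = "ÄãÐÑÀ²£¡Äã½ñÔç´Óº£ÉÏÆ¯µ½Õâ¸ö"
print(repair_mojibake(copied, "gbk"))  # readable Chinese text

raw = copied.encode("cp1252")          # the bytes as stored in the SWF
print(decode_swf_text(raw, 5))         # default: garbled, as today
print(decode_swf_text(raw, 5, "gbk"))  # with the override: readable
```

Keeping Windows-1252 as the default means existing English SWFs decode as before, while a per-file override covers cases like this one.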
Another affected game: https://web.archive.org/web/20210113055439oe_/http://www.purple.dti.ne.jp/earth/flash/tower2.swf