SWFv5 CJK text encoding samples (Shift-JIS, EUC-KR, Big5, GB2312)

Open n0samu opened this issue 3 years ago • 7 comments

SWFv5 text is encoded based on the system locale instead of using UTF-8. Ruffle always decodes SWFv5 text as Windows-1252, but this is not accurate behavior for non-English SWFs. When the SWF embeds a font that provides the necessary glyph for a DefineText field, there is still no problem because the glyphs are matched to their respective characters, since both were decoded the same (incorrect) way. But if Ruffle were to support selecting and copying text, this would expose the fact that Ruffle is decoding text incorrectly.

More importantly, when a v5 SWF has a DefineText field without a corresponding font that provides the needed glyphs, Ruffle renders the text using its fake device font, and the garbled text is exposed. Similarly, non-English text in DefineEditText fields displays as garbled mojibake.

I am well aware that this situation cannot be improved anytime soon. Even if Ruffle did decode the text correctly, its fake device font does not include the needed characters, so nothing would be displayed at all.

Let's also make a note of Adobe Flash Player's behavior. Normally Flash Player decodes all SWFv5 text using the system codepage and displays it accordingly, so on an English system the result is much the same as in Ruffle. The exception is Shift-JIS; Flash Player is able to detect Shift-JIS text and display it properly, even on an English locale.

Here are sample SWFv5 files with text encoded as Shift-JIS, EUC-KR, Big5, or GB2312: SWFv5-CJK-samples.zip. I've also exported text from some of the files to give you something to check against.

Viewing and exporting the text

Unfortunately, JPEXS Flash Decompiler does not correctly decode SWFv5 text, so exporting it the usual way makes the original text unrecoverable. But I did find a way to export text from v5 DefineEditText fields:

  1. In the normal view, find the text field that you want to export.
  2. Switch to the "Hex dump" view and find the same DefineEditText tag.
  3. Expand it and scroll all the way down to the initialText (string) field.
  4. Right-click it and click "save to file."

Once exported, you can open each file in Notepad++ and select the correct encoding from the menu. Or you can just open the file in your web browser, since browsers autodetect encoding very accurately.
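
The manual check above can also be scripted. Here is a minimal Python sketch, assuming the initialText bytes were exported as described in the steps above. The function name `decode_candidates` is made up for illustration, and the demo bytes are a stand-in for a real export. It decodes the same bytes under each of the four encodings named in this issue so the results can be compared by eye.

```python
# Candidate legacy encodings named in this issue, as Python codec names.
CANDIDATES = ("shift_jis", "euc_kr", "big5", "gb2312")


def decode_candidates(data: bytes) -> dict:
    """Decode the same bytes under every candidate encoding.

    Undecodable bytes become U+FFFD, which makes a wrong guess easier to spot.
    """
    return {name: data.decode(name, errors="replace") for name in CANDIDATES}


# Stand-in for exported bytes: the two characters of "ni hao" in GB2312.
# With a real export, read the file instead: open(path, "rb").read()
sample = bytes.fromhex("c4e3bac3")
for name, text in decode_candidates(sample).items():
    print(f"{name:>10}: {text}")
```

Several candidates may decode without any replacement characters, so the readable result still has to be picked by someone who knows the language.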

n0samu avatar Oct 26 '22 01:10 n0samu

Here's another Shift-JIS sample, provided by olux997 on Discord: handy.zip

n0samu avatar Dec 06 '22 20:12 n0samu

JPEXS has added a feature to set the interpreted charset of SWFv5 (and lower) files. Just right-click the file on the left sidebar and choose "Change charset".

n0samu avatar Feb 24 '23 09:02 n0samu

Another Shift-JIS sample from #9698: https://github.com/ruffle-rs/ruffle/files/10820434/Para2.zip

n0samu avatar Feb 24 '23 18:02 n0samu

Normally Flash Player decodes all SWFv5 text using the system codepage and displays it accordingly, so on an English system the result is much the same as in Ruffle. The exception is Shift-JIS; Flash Player is able to detect Shift-JIS text and display it properly, even on an English locale.

Maybe Ruffle could use https://crates.io/crates/chardetng to detect the encoding. This could work even in non-Shift-JIS scenarios, instead of relying on the system codepage. I think that would be an improvement without breaking existing SWF files.

chardetng is used by Firefox to detect the encoding of old HTML pages that assumed a system codepage without specifying it in the document. That seems similar to the problem Ruffle needs to solve.
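
To make the idea of detection with a fallback concrete, here is a rough Python sketch. It is not chardetng and does not reproduce how chardetng or Flash Player detect encodings; it is a much cruder stand-in that tries strict decoding in a fixed order. The function name `guess_decode`, the candidate order, and the fallback are all assumptions for illustration.

```python
def guess_decode(data, candidates=("utf-8", "shift_jis"), fallback="cp1252"):
    """Return (encoding_name, text) for the first candidate that decodes strictly.

    If no candidate accepts the bytes, decode with the fallback codepage,
    replacing any byte that the fallback codec leaves undefined.
    """
    for name in candidates:
        try:
            return name, data.decode(name)  # strict: raises on invalid bytes
        except UnicodeDecodeError:
            continue
    return fallback, data.decode(fallback, errors="replace")


print(guess_decode("早上好".encode("utf-8")))
print(guess_decode("こんにちは".encode("shift_jis")))
print(guess_decode("café".encode("cp1252")))
```

Strict decoding alone is a weak signal: byte sequences that are valid in one legacy encoding frequently also decode without error in another, so the candidate order decides the outcome. That weakness is the reason a dedicated detector would be needed in practice.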

mathewhodson avatar Mar 14 '23 03:03 mathewhodson

It seems UTF-8 should also work in SWFv5 (sample):

trace("âäéèÔ");
trace("-----");
trace("早上好");
trace("-----");
trace("สวัสดีตอนเช้า");

In Flash Player, these strings are displayed as you'd expect, but in Ruffle they're all broken:

âäéèÔ
-----
早上好
-----
สวัสดีตอนเช้า
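
For reference, the kind of breakage this would produce can be reproduced outside Ruffle. The Python sketch below assumes the cause is the one this issue describes, UTF-8 bytes being read as Windows-1252; the helper name `misdecode_as_cp1252` is made up for illustration.

```python
def misdecode_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Windows-1252."""
    # errors="replace" covers the five byte values that Python's cp1252
    # codec leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    return text.encode("utf-8").decode("cp1252", errors="replace")


for s in ("âäéèÔ", "早上好"):
    print(s, "->", misdecode_as_cp1252(s))
```

Each non-ASCII character turns into two or three Windows-1252 characters, one per UTF-8 byte, while plain ASCII passes through unchanged.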

Mike mentioned a TODO in #2636:

Add option for specifying encoding for SWFv5 files. (Currently defaults to WINDOWS-1252).

Toad06 avatar Apr 21 '23 13:04 Toad06

Huh, that's surprising. I can confirm it works even in the Flash 5 authoring software. The SWF outputs the text correctly in Flash Player 32 (though not in the player that comes with Flash 5, as seen in the Output window).

On the other hand, I couldn't find any way to make a text field in Flash 5 that displays UTF-8 characters correctly in Flash Player 32, even if the data in the text field is correct (it's still interpreted with the system codepage when played).

n0samu avatar Apr 21 '23 23:04 n0samu

Circling this back to Dora DORA Janken, I recently did some experimentation where I modified master so that the encoding_for_version function just unilaterally returned UTF-8. I also changed the source fonts and edited the tags in the SWF to use a Japanese-supporting font, but to no avail. The attached version of the SWF is unaltered: djk_0329.zip

sombraguerrero avatar May 24 '24 06:05 sombraguerrero

Here is another example of the error (https://oldswf.com/game/71121): the text is incorrect and invisible, but it can still be copied out. It should be decoded as GB2312 (a subset of GBK), not Windows-1252:

ÄãÐÑÀ²£¡Äã½ñÔç´Óº£ÉÏÆ¯µ½Õâ¸ö
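
The copied text above can be turned back into readable Chinese by reversing the wrong decode. This is a small Python sketch, assuming the text was GB2312 bytes decoded as Windows-1252 as described; the helper name `recover` is hypothetical.

```python
def recover(mojibake: str, wrong: str = "cp1252", right: str = "gb2312") -> str:
    """Undo a wrong decode: re-encode with the wrong codec, decode with the right one."""
    return mojibake.encode(wrong).decode(right)


garbled = "ÄãÐÑÀ²£¡Äã½ñÔç´Óº£ÉÏÆ¯µ½Õâ¸ö"
print(recover(garbled))
```

The round trip only works when every original byte survived the wrong decode. Text containing byte values that the Windows-1252 codec leaves undefined would not round-trip this way.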


chenxuuu avatar Oct 12 '24 12:10 chenxuuu

I tried changing every Windows-1252 in the project to GBK, and it works fine for my games. Maybe this needs a config parameter that the user can set?
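
As an illustration of what such a parameter could look like, here is a hedged Python sketch. It is not Ruffle's code: the names `encoding_for_swf`, `decode_swf_text`, and `user_override` are hypothetical. It assumes SWF 6 and later use UTF-8 and that earlier versions default to Windows-1252 unless the user says otherwise.

```python
from typing import Optional

# Windows-1252, the current default mentioned in this issue.
DEFAULT_LEGACY_ENCODING = "cp1252"


def encoding_for_swf(swf_version: int, user_override: Optional[str] = None) -> str:
    """Pick a text encoding: UTF-8 for SWF 6+, else the user's choice or the default."""
    if swf_version >= 6:
        return "utf-8"
    return user_override or DEFAULT_LEGACY_ENCODING


def decode_swf_text(data: bytes, swf_version: int,
                    user_override: Optional[str] = None) -> str:
    """Decode raw SWF string bytes, replacing anything the codec cannot map."""
    return data.decode(encoding_for_swf(swf_version, user_override), errors="replace")


raw = bytes.fromhex("c4e3bac3")  # two GB2312-encoded characters
print(decode_swf_text(raw, 5))         # default: read as Windows-1252
print(decode_swf_text(raw, 5, "gbk"))  # user override: read as GBK
```

Keeping the override out of the SWF 6+ path means that files which already decode correctly are unaffected by the setting.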

chenxuuu avatar Oct 16 '24 03:10 chenxuuu

Another affected game: https://web.archive.org/web/20210113055439oe_/http://www.purple.dti.ne.jp/earth/flash/tower2.swf

Randomno avatar Apr 30 '25 04:04 Randomno