
Mixed character encodings caused by SAUCE

Open bengarrett opened this issue 3 years ago • 2 comments

I don't know if this is intentional, but ANS files created in Moebius can contain mixed character encodings, which technically could be considered corrupt.

The drawing canvas uses the legacy character encoding determined by the font choice; with IBM VGA, the text encoding is CP-437. The SAUCE metadata, however, accepts Unicode input and embeds it as-is into the CP-437 ANS file.

Should the SAUCE metadata (title, author, group, comments fields) use the same character encoding as the rest of the ANS document?

Here, I drew full block characters on the canvas and then pasted UTF-8 full blocks copied from the web into the SAUCE fields.

[Screenshot: input]

When the file is viewed in a UTF-8 terminal, the CP437 blocks show up as unknown characters.

[Screenshot from 2020-10-29 12-37-32]

When the file is decoded as CP437, the SAUCE blocks are malformed.

[Screenshot from 2020-10-29 12-38-49]
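Both screenshots can be reproduced in a few lines of Python: store one full block (U+2588) as CP437 for the canvas and as UTF-8 for a SAUCE field, then decode the combined bytes each way. This is a simplified model for illustration only; it concatenates the two byte strings and does not use the real SAUCE record layout.

```python
FULL_BLOCK = "\u2588"  # █

canvas = FULL_BLOCK.encode("cp437")       # b'\xdb': one byte, as drawn on the canvas
sauce_field = FULL_BLOCK.encode("utf-8")  # b'\xe2\x96\x88': three bytes, as pasted
mixed = canvas + sauce_field

# UTF-8 view: 0xDB starts a two-byte sequence that never completes, so it
# becomes U+FFFD; the SAUCE block decodes correctly.
as_utf8 = mixed.decode("utf-8", errors="replace")

# CP437 view: the canvas block is correct, but the three UTF-8 bytes become
# three unrelated glyphs.
as_cp437 = mixed.decode("cp437")

print(ascii(as_utf8))  # '\ufffd\u2588'
print(as_cp437)        # █Γûê
```

Neither decoding gets both blocks right, because a single byte stream can only be read with one encoding at a time.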

A real-world example is on the last line. The left quotation mark is the ASCII-compatible decimal 34, while the right is U+201D, RIGHT DOUBLE QUOTATION MARK, even though the rest of the document is in CP-437. In a CP-437 viewer it renders as: Charles Martin "Terminal ColloquyΓÇ¥

[Screenshot from 2020-10-29 12-44-03]
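The ΓÇ¥ sequence is exactly what the three UTF-8 bytes of U+201D look like when read as CP437, which can be confirmed with Python's built-in codecs. The variable names are illustrative.

```python
right_quote = "\u201d"             # RIGHT DOUBLE QUOTATION MARK
raw = right_quote.encode("utf-8")  # b'\xe2\x80\x9d', the bytes stored in the SAUCE field
shown = raw.decode("cp437")        # how a CP437 viewer renders those three bytes
print(shown)                       # ΓÇ¥

# The damage is reversible: re-encoding as CP437 recovers the original UTF-8 bytes.
assert shown.encode("cp437").decode("utf-8") == right_quote
```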

bengarrett, Oct 29 '20 02:10


We have the SAUCE spec as a reference here, although it isn't always 100% clear. Note 3 of the record layout says that before revision 00.5, Character fields were expected to be in CP437, but that other code pages were also used along the way, for both the file and the SAUCE record. So one could assume the SAUCE fields need to, or at least should, use the same encoding as the file.
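If the fields should share the file's encoding, one way to guarantee it is to encode every Character field with the canvas code page when the record is written, so a stray U+201D fails loudly instead of slipping in as UTF-8. The sketch below assumes the 128-byte layout as I read it from the spec (ID and version, Title 35, Author 20, Group 20, Date 8, FileSize, DataType, FileType, TInfo1 to TInfo4, Comments, TFlags, TInfoS 22), preceded by the 0x1A EOF marker. `pack_sauce`, `_char_field` and their parameters are hypothetical, not Moebius code, and the offsets are worth double-checking against the spec.

```python
import struct


def _char_field(text: str, size: int, encoding: str) -> bytes:
    """Encode a SAUCE Character field and space-pad it to its fixed width."""
    data = text.encode(encoding)  # raises UnicodeEncodeError if text doesn't fit the code page
    # Truncating bytes is safe for single-byte code pages such as CP437.
    return data[:size].ljust(size, b" ")


def pack_sauce(title: str, author: str, group: str, date: str, file_size: int,
               width: int = 80, lines: int = 0, font: str = "IBM VGA",
               encoding: str = "cp437") -> bytes:
    """Build a 128-byte SAUCE record whose text fields all use one encoding."""
    record = b"".join([
        b"SAUCE00",                                    # ID "SAUCE" + Version "00"
        _char_field(title, 35, encoding),
        _char_field(author, 20, encoding),
        _char_field(group, 20, encoding),
        _char_field(date, 8, encoding),                # CCYYMMDD
        struct.pack("<I", file_size),                  # FileSize, little-endian
        bytes([1, 1]),                                 # DataType 1 = Character, FileType 1 = ANSi
        struct.pack("<4H", width, lines, 0, 0),        # TInfo1 = width, TInfo2 = lines
        bytes([0, 0]),                                 # Comments (line count), TFlags
        font.encode("ascii")[:22].ljust(22, b"\x00"),  # TInfoS: font name, NUL-padded
    ])
    if len(record) != 128:
        raise ValueError("SAUCE record must be exactly 128 bytes")
    return record


body = "\u2588\u2588\u2588".encode("cp437")  # canvas content: three CP437 full blocks
record = pack_sauce("Blocks \u2588", "bengarrett", "", "20201029", len(body))
ans_file = body + b"\x1a" + record  # EOF marker (0x1A) precedes the SAUCE record
```

With this approach, `pack_sauce("Terminal Colloquy\u201d", ...)` raises `UnicodeEncodeError` at save time, which is the point where a UI could step in.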

I don't think any tool before Moebius allowed UTF-8 (or any other non-IBM code page) in the SAUCE fields. Our options would be either to prevent non-ASCII characters in those fields or to show a warning whenever a user enters them.
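The warning option could start with a check like the one below: a hypothetical helper, `non_ascii_chars`, that reports which non-ASCII characters each SAUCE field contains so the UI could list them before saving. It uses the strict ASCII test from the suggestion above; the field names and values are illustrative, modelled on the real-world example earlier in the thread.

```python
def non_ascii_chars(fields: dict[str, str]) -> dict[str, list[str]]:
    """Map each field name to the sorted, de-duplicated non-ASCII characters it contains."""
    problems: dict[str, list[str]] = {}
    for name, value in fields.items():
        bad = sorted({ch for ch in value if ord(ch) > 0x7F})
        if bad:
            problems[name] = bad
    return problems


fields = {
    "title": "Terminal Colloquy\u201d",  # pasted right double quotation mark
    "author": "Charles Martin",
    "group": "",
}
for name, chars in non_ascii_chars(fields).items():
    codes = ", ".join(f"U+{ord(ch):04X}" for ch in chars)
    print(f"Warning: SAUCE {name} contains non-ASCII characters: {codes}")
```

This prints one warning for the title field, naming U+201D.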

bart-d, Oct 29 '20 13:10

Moebius already features the Export As UTF-8 option. Maybe that could be the way to save files whose SAUCE fields contain characters that only exist in UTF-8? The warning could also be brought up then, alerting the user, explaining why the file can only be saved as UTF-8, and noting the downsides of doing so.
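That decision could be made with a check against the canvas code page rather than plain ASCII. The sketch below is a hypothetical `choose_export_encoding` helper; it does not call into Moebius, and it only returns which path (legacy code page or UTF-8) a save dialog would take, plus the characters that forced the choice.

```python
def choose_export_encoding(fields: dict[str, str],
                           legacy: str = "cp437") -> tuple[str, list[str]]:
    """Return (encoding, offending_chars): the legacy code page if every field fits, else UTF-8."""
    offending: list[str] = []
    for value in fields.values():
        for ch in value:
            try:
                ch.encode(legacy)
            except UnicodeEncodeError:
                if ch not in offending:
                    offending.append(ch)
    if offending:
        return "utf-8", offending  # caller would warn and route to the UTF-8 export
    return legacy, []


print(choose_export_encoding({"title": "Blocks \u2588", "author": "Jos\u00e9"}))
print(choose_export_encoding({"title": "Terminal Colloquy\u201d"}))
```

Testing against CP437 instead of ASCII lets accented Latin letters and block glyphs stay in a legacy file, since CP437 has bytes for them; only characters with no CP437 byte, such as U+201D, force the UTF-8 path.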

I would guess SAUCE assumed you would not mix code pages, because technically you couldn't. The legacy 256-character encodings all share the same 256 code points, so they had no room to mix in characters from outside their own set. I think that's why the web eventually moved on from ASCII and ISO-8859-x to UTF-8: it was impossible to mix languages and, say, display 日本人, ไทย or русский on the same page.
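The limitation can be checked with Python's built-in codecs: no single DOS code page holds all three scripts from that example, while UTF-8 does. Picking cp866 (Cyrillic) and cp874 (Thai) as the comparison code pages is my choice; Japanese has no 256-character code page at all and needs a multi-byte encoding such as Shift_JIS. The `fits` helper is hypothetical.

```python
def fits(text: str, encoding: str) -> bool:
    """True if every character of text can be encoded in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


mixed = "日本人 ไทย русский"
for enc in ("cp437", "cp866", "cp874", "utf-8"):
    print(f"{enc:<6} {fits(mixed, enc)}")  # only utf-8 prints True

# Each script on its own does fit its matching DOS code page.
print(fits("русский", "cp866"), fits("ไทย", "cp874"))  # True True
```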

bengarrett, Oct 30 '20 22:10