PKHeX
GSC Box names not decoded correctly
Describe the bug
In Gold, Silver, and Crystal, there is a wider selection of characters that can be entered in Box names than in Trainer names and Pokémon nicknames. In English, the additional characters include é and lowercase letters preceded by an apostrophe.
PKHeX does not correctly render Box names for Gold, Silver, and Crystal if they use characters that cannot be entered in nicknames, because it uses the Generation 1 encoding for both Generation 1 and 2.
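A minimal sketch of the fix this implies: select the character table by generation instead of always using the Generation 1 table. The layout used here is an assumption for illustration. A–Z at 0x80–0x99, a–z at 0xA0–0xB9, space at 0x7F and 0x50 as the terminator follow the commonly documented English Gen 1/2 layout, but the é codes (0xBA for Gen 1, 0xEA for Gen 2) are illustrative only and should be checked against the reference charmaps below. `GEN1_EN`, `GEN2_EN`, `table_for`, `encode` and `decode` are hypothetical names, not PKHeX's API.

```python
TERMINATOR = 0x50  # assumed string terminator
SPACE = 0x7F       # assumed space character


def _base_table() -> dict[int, str]:
    """Characters assumed to share code points in English Gen 1 and Gen 2."""
    table = {SPACE: " "}
    for i in range(26):
        table[0x80 + i] = chr(ord("A") + i)
        table[0xA0 + i] = chr(ord("a") + i)
    return table


# Illustrative é positions only: the point is that they differ per generation.
GEN1_EN = {**_base_table(), 0xBA: "é"}
GEN2_EN = {**_base_table(), 0xEA: "é"}


def table_for(generation: int) -> dict[int, str]:
    """Pick the table by generation rather than reusing Gen 1 for both."""
    return GEN1_EN if generation == 1 else GEN2_EN


def encode(text: str, table: dict[int, str]) -> bytes:
    reverse = {ch: code for code, ch in table.items()}
    return bytes(reverse[ch] for ch in text) + bytes([TERMINATOR])


def decode(data: bytes, table: dict[int, str]) -> str:
    """Decode until the terminator, stopping at the first unmapped byte."""
    out = []
    for b in data:
        if b == TERMINATOR or b not in table:
            break
        out.append(table[b])
    return "".join(out)


box = encode("Pokémon", GEN2_EN)   # a Box name written by a Gen 2 game
print(decode(box, GEN1_EN))        # Gen 1 table: stops at é
print(decode(box, table_for(2)))   # Gen 2 table: full name
```

With the Gen 1 table the Gen 2 é byte is unmapped, so decoding stops after "Pok", which matches Example 1 below.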
To Reproduce
Example 1:
- When entering a Box name in Gold, Silver, and Crystal, switch to lowercase to access the additional characters.
- Name a box "Pokémon"
- In PKHeX, this Box's name displays as "Pok"
Example 2:
- When entering a Box name in Gold, Silver, and Crystal, switch to lowercase to access the additional characters.
- Name a box "I'm you're" (using the special combined apostrophe + letter characters)
- In PKHeX, this Box's name displays as "Iñm youòe"
Expected behavior
For é, this character should display as é and should not cut off the rest of the Box name. (Personally, I would suggest that invalid characters display as � without preventing the rest of the string from being decoded, so that even with the current encoding map it would display as "Pok�mon". That said, I can understand there may be good reasons to stop decoding at the first invalid character.)
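The suggested behaviour can be sketched as a lenient decoder: any byte missing from the table becomes U+FFFD (�) and decoding continues up to the terminator. This is a sketch under assumptions, not PKHeX code. The letters-only table, the 0x50 terminator, the byte 0xEA standing in for "a byte the table does not map", and the names `build_letters` and `decode_lenient` are all illustrative.

```python
REPLACEMENT = "\uFFFD"  # Unicode replacement character, displayed as �
TERMINATOR = 0x50       # assumed string terminator


def build_letters() -> dict[int, str]:
    """Assumed English layout: A-Z at 0x80-0x99, a-z at 0xA0-0xB9."""
    table = {}
    for i in range(26):
        table[0x80 + i] = chr(ord("A") + i)
        table[0xA0 + i] = chr(ord("a") + i)
    return table


def decode_lenient(data: bytes, table: dict[int, str]) -> str:
    """Decode up to the terminator, substituting U+FFFD for unmapped bytes."""
    out = []
    for b in data:
        if b == TERMINATOR:
            break
        out.append(table.get(b, REPLACEMENT))
    return "".join(out)


# "Pok", one byte the table does not know, "mon", terminator
name = bytes([0x8F, 0xAE, 0xAA, 0xEA, 0xAC, 0xAE, 0xAD, TERMINATOR])
print(decode_lenient(name, build_letters()))
```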
For the apostrophe + letter characters, while they definitely shouldn't render as the wrong characters, I have no suggestion for how to represent each of them as a single character, especially since a standalone apostrophe followed by a lowercase letter is just as valid in a Box name as the combined character. (FWIW, Bank renders the English ones as the letter without the apostrophe, and for the French ones it doesn't even try and just prints a space.)
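The difficulty with a two-character representation can be shown with a small round trip. In this sketch the combined 'm glyph is decoded as the two characters `'m`, so it becomes indistinguishable from a standalone apostrophe followed by m, and an encoder has to pick one of the two byte sequences. The byte values (0xE0 for the apostrophe, 0xD2 for the combined 'm, 0xAC for m) and all function names are hypothetical placeholders, not confirmed encodings.

```python
TERMINATOR = 0x50  # assumed string terminator
APOSTROPHE = 0xE0  # placeholder: standalone apostrophe
COMBINED_M = 0xD2  # placeholder: combined 'm glyph
M = 0xAC           # placeholder: lowercase m

TABLE = {APOSTROPHE: "'", COMBINED_M: "'m", M: "m"}


def decode(data: bytes) -> str:
    out = []
    for b in data:
        if b == TERMINATOR:
            break
        out.append(TABLE[b])
    return "".join(out)


def encode(text: str) -> bytes:
    """Greedy encoder: prefers the combined glyph whenever "'m" appears."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith("'m", i):
            out.append(COMBINED_M)
            i += 2
        elif text[i] == "'":
            out.append(APOSTROPHE)
            i += 1
        elif text[i] == "m":
            out.append(M)
            i += 1
        else:
            raise ValueError(f"unmapped character {text[i]!r}")
    out.append(TERMINATOR)
    return bytes(out)


combined = bytes([COMBINED_M, TERMINATOR])
separate = bytes([APOSTROPHE, M, TERMINATOR])
print(decode(combined) == decode(separate))  # both decode to "'m"
```

Re-encoding the decoded text of `separate` produces `combined`, so the original bytes are lost. A lossless option would need a distinct code point per combined glyph (for instance one from Unicode's Private Use Area), at the cost of the text not being readable on its own outside the editor.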
Additional context
The encoding of characters that cannot appear in nicknames or Trainer names differs between Generation 1 and 2 (and between languages). PKHeX currently assumes it can use the Generation 1 encoding for all Western Generation 1 and 2 games, but this assumption does not hold for Box names or Mail, because they allow additional characters.
In other Western languages, you can type ß and various accented characters in Box names (the available characters depend on the language). For many of these characters, the German & French and Spanish & Italian encodings are incompatible both with each other and with their corresponding Generation 1 variants. Even é is located at different code points in Gen 1 and Gen 2, as the issue above shows.
This almost certainly affects Mail too, since Mail has an even bigger keyboard than Box names, but I didn't specifically check it. Mail also has a different character set depending on the language of the game that wrote it, with overlapping code points. For example, French games can type 'ç' in Mail, which is stored at the same code point as 'Á' in Italian and Spanish, yet all Western games except English Gold/Silver can correctly read 'ç' in Mail written in a French game.
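Taken together, these points suggest keying the table on more than the generation. In the sketch below, Box names use a key of (generation, language group), with the groups taken from this report (German & French, Spanish & Italian); applying the same grouping to Gen 1 is an assumption. Mail bytes are decoded with the language of the game that wrote the Mail. The report does not give the shared ç/Á code point, so `OVERLAP` is a placeholder, and `GROUP`, `table_key`, `MAIL_EXTRAS` and `decode_mail_byte` are hypothetical names.

```python
# Assumption: grouping from this report, applied to both generations here.
GROUP = {"en": "en", "de": "de-fr", "fr": "de-fr", "it": "it-es", "es": "it-es"}


def table_key(generation: int, language: str) -> tuple[int, str]:
    """Key for selecting a Box-name table; generations never share one."""
    return (generation, GROUP[language])


# PLACEHOLDER byte: the report says French 'ç' and Italian/Spanish 'Á' share a
# Mail code point but does not state its value.
OVERLAP = 0xC9

# Only the overlap described in the report is modelled.
MAIL_EXTRAS = {
    "fr": {OVERLAP: "ç"},
    "it": {OVERLAP: "Á"},
    "es": {OVERLAP: "Á"},
}


def decode_mail_byte(code: int, writer_language: str) -> str:
    """Decode one Mail byte using the language of the game that wrote it."""
    return MAIL_EXTRAS.get(writer_language, {}).get(code, "\uFFFD")


print(decode_mail_byte(OVERLAP, "fr"), decode_mail_byte(OVERLAP, "it"))
```

The practical consequence is that the writer's language has to be known (or stored) alongside the Mail bytes, because the bytes alone cannot tell ç from Á.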
Reference: Gold Name & Box Mail CharMap
Reference: Crystal Name & Box Mail CharMap
Note: Does not indicate languages other than English