encoding_rs
Make encoding lists public
I recently migrated Stringsext, a GNU strings alternative with multi-byte encoding support, from rust-encoding to encoding_rs.
The Stringsext tool prints a list of supported encoding names. As the lists in encoding_rs
are not public, I had to copy them into my source code, which is error-prone code duplication that I would like to avoid.
I don't want to commit to making the internal representation of these tables public. If there are compelling use cases, maybe encoding_rs could provide an iterator over the known labels. However, I haven't provided that kind of API so far, because I haven't been aware of a proper use case.
Dumping the list as a matter of documentation as opposed to something that an application actually operates on is somewhat of a different case. I'm not particularly keen on doing that either, in order not to suggest non-preferred labels to users.
I'll think about this a bit.
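For illustration only, the iterator idea mentioned above could look roughly like the sketch below. Nothing here is part of encoding_rs: the module `labels`, the table `LABELS_SORTED`, and the function `known_labels` are hypothetical names, and the four labels are a tiny excerpt chosen for the example. The point is that the table stays private, so its representation can change freely, while callers only ever see an opaque iterator.

```rust
// Hypothetical sketch: none of these names exist in encoding_rs.
mod labels {
    // Private table. Its layout (array, sorted slice, generated code, ...)
    // is an implementation detail that callers cannot depend on.
    static LABELS_SORTED: [&str; 4] = ["cp866", "latin1", "utf-8", "utf8"];

    /// Public surface: an opaque iterator over the known labels.
    pub fn known_labels() -> impl Iterator<Item = &'static str> {
        LABELS_SORTED.iter().copied()
    }
}
```

A caller would write something like `for label in labels::known_labels() { println!("{}", label); }` without ever naming the underlying table type.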
Maybe just expose a constant with all implemented encodings?
pub static ALL_ENCODINGS: [&'static Encoding; 40] = [
    BIG5,
    EUC_JP,
    EUC_KR,
    GB18030,
    GBK,
    IBM866,
    ISO_2022_JP,
    ISO_8859_2,
    ISO_8859_3,
    ISO_8859_4,
    ISO_8859_5,
    ISO_8859_6,
    ISO_8859_7,
    ISO_8859_8,
    ISO_8859_8_I,
    ISO_8859_10,
    ISO_8859_13,
    ISO_8859_14,
    ISO_8859_15,
    ISO_8859_16,
    KOI8_R,
    KOI8_U,
    MACINTOSH,
    REPLACEMENT,
    SHIFT_JIS,
    UTF_8,
    UTF_16BE,
    UTF_16LE,
    WINDOWS_874,
    WINDOWS_1250,
    WINDOWS_1251,
    WINDOWS_1252,
    WINDOWS_1253,
    WINDOWS_1254,
    WINDOWS_1255,
    WINDOWS_1256,
    WINDOWS_1257,
    WINDOWS_1258,
    X_MAC_CYRILLIC,
    X_USER_DEFINED,
];
> However, I haven't provided that kind of API so far, because I haven't been aware of a proper use case.
One use case: legacy zip files use an OEM encoding and store no charset information. To handle such files, we need to let users choose an encoding, so we need all the labels to populate a pop-up list.
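A rough sketch of what the picker side could look like, assuming a list like the one proposed earlier in this thread were exposed. To keep it compilable on its own, `Encoding` below is a stand-in struct rather than the real `encoding_rs::Encoding`, the three statics and the shortened `ALL_ENCODINGS` are stand-ins too, and `picker_entries` and `encoding_for_choice` are hypothetical helper names. The only thing borrowed from the real crate is the idea of a `name()` accessor.

```rust
// Stand-in for encoding_rs::Encoding so the sketch compiles without the crate.
pub struct Encoding {
    name: &'static str,
}

impl Encoding {
    pub fn name(&self) -> &'static str {
        self.name
    }
}

// Stand-ins for three of the crate's encoding statics.
pub static IBM866: Encoding = Encoding { name: "IBM866" };
pub static SHIFT_JIS: Encoding = Encoding { name: "Shift_JIS" };
pub static WINDOWS_1252: Encoding = Encoding { name: "windows-1252" };

// Shortened stand-in for the proposed (currently nonexistent) constant.
pub static ALL_ENCODINGS: [&Encoding; 3] = [&IBM866, &SHIFT_JIS, &WINDOWS_1252];

/// Strings to show in the pop-up list, in list order.
pub fn picker_entries() -> Vec<&'static str> {
    ALL_ENCODINGS.iter().map(|e| e.name()).collect()
}

/// Maps the index the user selected back to an encoding,
/// or `None` if the index is out of range.
pub fn encoding_for_choice(index: usize) -> Option<&'static Encoding> {
    ALL_ENCODINGS.get(index).copied()
}
```

The application would then decode the zip entry names with whatever encoding `encoding_for_choice` returns. Showing one name per encoding, rather than every label, would also avoid advertising non-preferred labels to users.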