encoding_rs

Make encoding lists public

Open · getreu opened this issue 5 years ago · 3 comments

I recently migrated Stringsext, a GNU Strings Alternative with Multi-Byte-Encoding Support, from rust-encoding to encoding_rs.

The Stringsext tool prints a list of supported encoding names. Because the lists in encoding_rs are not public, I had to copy them into my source code, which is an error-prone duplication of code I would like to avoid.

getreu, Jan 13 '20

I don't want to commit to making the internal representation of these tables public. If there are compelling use cases, maybe encoding_rs could provide an iterator over the known labels. However, I haven't provided that kind of API so far, because I haven't been aware of a proper use case.

Dumping the list as a matter of documentation as opposed to something that an application actually operates on is somewhat of a different case. I'm not particularly keen on doing that either, in order not to suggest non-preferred labels to users.

I'll think about this a bit.

hsivonen, Jan 14 '20

Maybe just expose a constant listing all the implemented encodings?

```rust
// One entry per encoding in the WHATWG Encoding Standard (40 in total).
pub static ALL_ENCODINGS: [&'static Encoding; 40] = [
    BIG5,
    EUC_JP,
    EUC_KR,
    GB18030,
    GBK,
    IBM866,
    ISO_2022_JP,
    ISO_8859_2,
    ISO_8859_3,
    ISO_8859_4,
    ISO_8859_5,
    ISO_8859_6,
    ISO_8859_7,
    ISO_8859_8,
    ISO_8859_8_I,
    ISO_8859_10,
    ISO_8859_13,
    ISO_8859_14,
    ISO_8859_15,
    ISO_8859_16,
    KOI8_R,
    KOI8_U,
    MACINTOSH,
    REPLACEMENT,
    SHIFT_JIS,
    UTF_8,
    UTF_16BE,
    UTF_16LE,
    WINDOWS_874,
    WINDOWS_1250,
    WINDOWS_1251,
    WINDOWS_1252,
    WINDOWS_1253,
    WINDOWS_1254,
    WINDOWS_1255,
    WINDOWS_1256,
    WINDOWS_1257,
    WINDOWS_1258,
    X_MAC_CYRILLIC,
    X_USER_DEFINED,
];
```

Mingun, Aug 20 '22

> However, I haven't provided that kind of API so far, because I haven't been aware of a proper use case.

A use case: legacy ZIP files store file names in an OEM encoding without recording which one. To handle such files, we need to let users choose an encoding, so we need the full list of labels to populate a pop-up list.

ArcticLampyrid, Aug 11 '23