cChardet
LookupError: unknown encoding: EUC-TW
This seems similar in nature to https://github.com/PyYoshi/cChardet/issues/8, but unfortunately, I do not know what to recommend as an alternative to `EUC-TW`.
One can see that there is nothing that quite matches in Python's list of standard encodings.
I also thought that I should look through the other encodings mentioned in the readme, and found that there are a number of other codecs that did not come up in the list:
- ~~`HZ-GB-2312` (although there are other aliases for `gb2312`)~~ Oops, I see it in the list now.
- `ISO-2022-CN`
- `TIS-620` (appears to be the same as `ISO-8859-11` except for the no-break space character at 0xA0, which is unassigned in `TIS-620`)
- `X-ISO-10646-UCS-4-2143`
- `X-ISO-10646-UCS-4-3412`
Do you have any recommendations for how I could decode strings that are detected as these types in Python?
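For anyone wanting to reproduce the check above, here is a minimal sketch that asks Python's codec registry which of the detected names it recognizes. The helper name `python_codec_for` and the `DETECTED_NAMES` list are hypothetical, introduced only for illustration; the only library call used is the standard `codecs.lookup`. Results can vary between Python versions, so no expected output is shown.

```python
import codecs

# Encoding names taken from this issue; the list itself is just for illustration.
DETECTED_NAMES = [
    "EUC-TW",
    "HZ-GB-2312",
    "ISO-2022-CN",
    "TIS-620",
    "X-ISO-10646-UCS-4-2143",
    "X-ISO-10646-UCS-4-3412",
]


def python_codec_for(name):
    """Return Python's canonical codec name for `name`, or None if unknown.

    Hypothetical helper: it only wraps codecs.lookup and hides the LookupError.
    """
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


if __name__ == "__main__":
    for name in DETECTED_NAMES:
        codec = python_codec_for(name)
        status = codec if codec is not None else "no matching Python codec"
        print(f"{name}: {status}")
```

Guarding a `bytes.decode` call with a lookup like this at least turns the `LookupError` into something the caller can handle explicitly, for example by skipping the input or trying another codec.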
Further investigation has revealed that Python won't fix the missing `EUC-TW` and `ISO-2022-CN` encodings.
Just ran into this myself.
For `X-ISO-10646-UCS-4-2143` and `X-ISO-10646-UCS-4-3412`, see https://stackoverflow.com/q/18518730
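A rough sketch of one way to handle the two UCS-4 variants, not necessarily the approach taken in the linked question: assuming the names refer to the "unusual octet orders" 2143 and 3412 described in the XML specification's encoding autodetection appendix (where 1234 is big-endian), the bytes of each 4-byte unit can be rearranged into big-endian order and then decoded with Python's built-in `utf-32-be` codec. The function name `decode_ucs4_unusual` and its `order` parameter are hypothetical.

```python
# For each supported order, the source index of each big-endian byte.
# Assumption: "2143" means the octets arrive as (2nd, 1st, 4th, 3rd) of the
# big-endian unit, and "3412" as (3rd, 4th, 1st, 2nd).
_PERMUTATIONS = {
    "2143": (1, 0, 3, 2),
    "3412": (2, 3, 0, 1),
}


def decode_ucs4_unusual(data, order):
    """Decode UCS-4 bytes stored in octet order '2143' or '3412' (hypothetical helper)."""
    if order not in _PERMUTATIONS:
        raise ValueError(f"unsupported octet order: {order!r}")
    if len(data) % 4 != 0:
        raise ValueError("input length must be a multiple of 4 bytes")

    # Rearrange every 4-byte unit into big-endian order using strided slices.
    reordered = bytearray(len(data))
    for dest, src in enumerate(_PERMUTATIONS[order]):
        reordered[dest::4] = data[src::4]

    text = reordered.decode("utf-32-be")
    # The utf-32-be codec keeps a leading byte order mark as U+FEFF; drop it.
    return text[1:] if text.startswith("\ufeff") else text
```

For example, `decode_ucs4_unusual(raw, "2143")` could be called when the detector reports `X-ISO-10646-UCS-4-2143`, and `decode_ucs4_unusual(raw, "3412")` for `X-ISO-10646-UCS-4-3412`, where `raw` is the detected byte string.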