yaccety_sax
Add Extended-ASCII transcoders for 8-bit scripts
Even though this project only "officially" parses UTF-8, ys_utils:trancoding_file_continuation/2
can check for different encodings and react to them by converting the input stream to UTF-8.
It does so by looking only at the byte-order mark (BOM) or, when there is none, at the byte pattern of the first 4 bytes.
Extended-ASCII encodings may arrive with no BOM and can then be recognized only by the encoding value of the startDocument event.
To handle these encodings, a transcoding wrapper function should be available that wraps the continuation function and converts the stream to UTF-8.
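A minimal sketch of such a wrapper for the simplest case, ISO-8859-1. The module name `latin1_sketch`, the function names, and the continuation shape `fun(State) -> {Bytes, NewState}` are assumptions made for illustration, not the project's actual API; only `unicode:characters_to_binary/3` is a real OTP call.

```erlang
-module(latin1_sketch).
-export([wrap_latin1/1, to_utf8/1]).

%% Convert one chunk of ISO-8859-1 bytes to UTF-8.
%% Every ISO-8859-1 byte is a complete code point, so a chunk can
%% never end in the middle of a character.
to_utf8(Bytes) when is_binary(Bytes) ->
    unicode:characters_to_binary(Bytes, latin1, utf8).

%% Wrap a continuation fun so that each chunk it returns is transcoded.
%% ASSUMPTION: the continuation has the shape
%%   fun(State) -> {Bytes, NewState}
%% which may differ from what yaccety_sax really uses.
wrap_latin1(Cont) when is_function(Cont, 1) ->
    fun(State) ->
        {Bytes, NewState} = Cont(State),
        {to_utf8(Bytes), NewState}
    end.
```

Because all of the listed encodings are single-byte, a wrapper of this kind needs no buffering between chunks. The encodings other than ISO-8859-1 would need their own byte-to-code-point mapping in place of the `latin1` option.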
Encodings that could be handled:
- ISO-8859-1
- ISO-8859-2
- ISO-8859-3
- ISO-8859-4
- ISO-8859-5
- ISO-8859-6
- ISO-8859-7
- ISO-8859-8
- ISO-8859-9
- ISO-8859-10
- ISO-8859-11
- ISO-8859-13
- ISO-8859-14
- ISO-8859-15
- ISO-8859-16
- WINDOWS-1250
- WINDOWS-1251
- WINDOWS-1252
- WINDOWS-1253
- WINDOWS-1254
- WINDOWS-1255
- WINDOWS-1256
- WINDOWS-1257
- WINDOWS-1258
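For the encodings that are not a straight subset of Unicode's first 256 code points, a per-byte mapping is needed. As a rough sketch, here is ISO-8859-15, which differs from ISO-8859-1 in eight positions. The module name `latin9_sketch` and its functions are hypothetical, and a real implementation would probably generate such tables rather than write them by hand.

```erlang
-module(latin9_sketch).
-export([to_utf8/1]).

%% Convert ISO-8859-15 bytes to UTF-8, one byte per code point.
to_utf8(Bytes) when is_binary(Bytes) ->
    << <<(map(B))/utf8>> || <<B>> <= Bytes >>.

%% The eight positions where ISO-8859-15 differs from ISO-8859-1.
map(16#A4) -> 16#20AC; % EURO SIGN
map(16#A6) -> 16#0160; % LATIN CAPITAL LETTER S WITH CARON
map(16#A8) -> 16#0161; % LATIN SMALL LETTER S WITH CARON
map(16#B4) -> 16#017D; % LATIN CAPITAL LETTER Z WITH CARON
map(16#B8) -> 16#017E; % LATIN SMALL LETTER Z WITH CARON
map(16#BC) -> 16#0152; % LATIN CAPITAL LIGATURE OE
map(16#BD) -> 16#0153; % LATIN SMALL LIGATURE OE
map(16#BE) -> 16#0178; % LATIN CAPITAL LETTER Y WITH DIAERESIS
%% Every other byte maps to the code point with the same value.
map(B) -> B.
```

The same pattern, a `map/1` clause per differing byte, would extend to the other ISO-8859 parts and the Windows code pages, with larger tables.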