yaccety_sax

Add Extended-ASCII transcoders for 8-bit scripts

Open zadean opened this issue 2 years ago • 0 comments

Even though this project only "officially" parses UTF-8, ys_utils:trancoding_file_continuation/2 can detect other encodings and react to them by converting the input stream to UTF-8. It does so only by looking at the byte order mark (BOM) or at the encoding implied by the first 4 bytes.
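For illustration, BOM-based detection of this kind can be sketched with OTP's `unicode:bom_to_encoding/1`. This is not the project's implementation: the module name `bom_sketch` and the function `sniff/1` are hypothetical, and the sketch covers only the BOM case, not the check of the first 4 bytes.

```erlang
-module(bom_sketch).
-export([sniff/1]).

%% Returns {Encoding, Rest}, where Rest is the input with any BOM removed.
%% unicode:bom_to_encoding/1 reports {latin1, 0} when it finds no BOM;
%% that case is reported here as 'unknown', because a BOM-less stream
%% could be UTF-8 or any of the 8-bit encodings.
sniff(Bin) when is_binary(Bin) ->
    case unicode:bom_to_encoding(Bin) of
        {latin1, 0} ->
            {unknown, Bin};
        {Encoding, BomLength} ->
            <<_Bom:BomLength/binary, Rest/binary>> = Bin,
            {Encoding, Rest}
    end.
```

A stream that returns `unknown` here is exactly the case this issue is about: the encoding only becomes known once the XML declaration has been read.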

Extended-ASCII encodings may arrive with no BOM and can then only be recognized from the encoding value of the startDocument event.

To handle these scripts, there should be a transcoding wrapper function that wraps the continuation function and converts the stream to UTF-8.
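A minimal sketch of such a wrapper for ISO-8859-1, under a stated assumption: the continuation is taken to be a `fun/1` that receives a state and returns `{Bytes, NewState}`, with `<<>>` at end of input. That shape is an assumption for illustration and may not match the project's actual continuation contract; the module and function names are hypothetical as well.

```erlang
-module(transcode_sketch).
-export([wrap_latin1/1, demo/0]).

%% Assumed continuation shape (hypothetical, not confirmed by the project):
%%   Cont :: fun((State) -> {binary(), State})
%% Wraps Cont so that every chunk it yields is converted to UTF-8.
wrap_latin1(Cont) when is_function(Cont, 1) ->
    fun(State0) ->
        {Bytes, State1} = Cont(State0),
        %% Each ISO-8859-1 byte maps to the code point of the same value,
        %% so chunks can be converted independently of each other.
        {unicode:characters_to_binary(Bytes, latin1, utf8), State1}
    end.

%% A toy continuation that hands out a list of chunks one at a time.
demo() ->
    Cont = fun([Chunk | Rest]) -> {Chunk, Rest};
              ([]) -> {<<>>, []}
           end,
    Wrapped = wrap_latin1(Cont),
    {Out1, State1} = Wrapped([<<"caf", 16#E9>>, <<"!">>]),
    {Out2, _State2} = Wrapped(State1),
    {Out1, Out2}.
```

Because every 8-bit encoding in question uses exactly one byte per character, a chunk boundary can never split a character, which is what keeps the wrapper stateless.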

Encodings that could be handled:

  • ISO-8859-1
  • ISO-8859-2
  • ISO-8859-3
  • ISO-8859-4
  • ISO-8859-5
  • ISO-8859-6
  • ISO-8859-7
  • ISO-8859-8
  • ISO-8859-9
  • ISO-8859-10
  • ISO-8859-11
  • ISO-8859-13
  • ISO-8859-14
  • ISO-8859-15
  • ISO-8859-16
  • WINDOWS-1250
  • WINDOWS-1251
  • WINDOWS-1252
  • WINDOWS-1253
  • WINDOWS-1254
  • WINDOWS-1255
  • WINDOWS-1256
  • WINDOWS-1257
  • WINDOWS-1258
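Of these, only ISO-8859-1 is covered directly by OTP's `unicode` module (as `latin1`); the others would each need a byte-to-code-point table. As a sketch of what one such table could look like, here is ISO-8859-15, which differs from ISO-8859-1 in eight positions. The module name `latin9_sketch` and its functions are hypothetical.

```erlang
-module(latin9_sketch).
-export([to_utf8/1]).

%% Converts an ISO-8859-15 binary to UTF-8, one byte at a time.
to_utf8(Bin) when is_binary(Bin) ->
    << <<(code_point(Byte))/utf8>> || <<Byte>> <= Bin >>.

%% The eight positions where ISO-8859-15 differs from ISO-8859-1.
code_point(16#A4) -> 16#20AC; % EURO SIGN
code_point(16#A6) -> 16#0160; % LATIN CAPITAL LETTER S WITH CARON
code_point(16#A8) -> 16#0161; % LATIN SMALL LETTER S WITH CARON
code_point(16#B4) -> 16#017D; % LATIN CAPITAL LETTER Z WITH CARON
code_point(16#B8) -> 16#017E; % LATIN SMALL LETTER Z WITH CARON
code_point(16#BC) -> 16#0152; % LATIN CAPITAL LIGATURE OE
code_point(16#BD) -> 16#0153; % LATIN SMALL LIGATURE OE
code_point(16#BE) -> 16#0178; % LATIN CAPITAL LETTER Y WITH DIAERESIS
%% Every other byte maps to the code point of the same value.
code_point(Byte) -> Byte.
```

The remaining encodings differ from ISO-8859-1 in many more positions, so a generated lookup table per encoding would probably be more practical than hand-written clauses.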

zadean, Mar 12 '22 18:03