perl6-lwp-simple

Allow for moar control over character encoding.

Open • Rotwang opened this issue 10 years ago • 1 comment

As of now, Rakudo supports only a limited set of encodings. However, some websites still use encodings outside that set, for example:

$ curl -s -I http://www.google.pl/ | grep charset=
Content-Type: text/html; charset=ISO-8859-2

Excerpt from https://github.com/rakudo/rakudo/blob/nom/src/core/Rakudo/Internals.pm:

my $encodings := nqp::hash(
  # fast mapping for identicals
  'utf8',            'utf8',
  'utf16',           'utf16',
  'utf32',           'utf32',
  'ascii',           'ascii',
  'iso-8859-1',      'iso-8859-1',
  'windows-1252',    'windows-1252',
  # with dash
  'utf-8',           'utf8',
  'utf-16',          'utf16',
  'utf-32',          'utf32',
  # according to http://de.wikipedia.org/wiki/ISO-8859-1
  'iso_8859-1:1987', 'iso-8859-1',
  'iso_8859-1',      'iso-8859-1',
  'iso-ir-100',      'iso-8859-1',
  'latin1',          'iso-8859-1',
  'latin-1',         'iso-8859-1',
  'csisolatin1',     'iso-8859-1',
  'l1',              'iso-8859-1',
  'ibm819',          'iso-8859-1',
  'cp819',           'iso-8859-1',
);

It may be useful to:

  • tell LWP::Simple not to tamper with the encoding (e.g. if you want to pipe the output to another process or print the response body to a terminal)
  • force an encoding (if you want to further process the response body as a string in your p6 script), for example when the encoding isn't set in the response and you know which one it is.
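
To make the two requested behaviours concrete, here is a minimal sketch using only core types. The helper name `body-as` and its `:encoding` parameter are hypothetical and are not part of LWP::Simple; the bytes stand in for a response body.

```raku
# Hypothetical helper, NOT part of LWP::Simple: illustrates the two modes only.
sub body-as(Blob $raw, Str :$encoding) {
    # No encoding requested: hand the bytes back untouched,
    # e.g. for piping to another process or writing to a terminal.
    return $raw without $encoding;

    # Encoding forced by the caller: decode with it,
    # regardless of what the response headers claimed.
    $raw.decode($encoding);
}

# Stand-in for a response body: "zów" encoded as ISO-8859-1.
my $raw = Buf.new(0x7A, 0xF3, 0x77);

say body-as($raw).elems;                     # 3 (still raw bytes)
say body-as($raw, :encoding<iso-8859-1>);    # zów
```

The forced mode can only work for encodings that Rakudo itself can decode, so a charset such as ISO-8859-2 would still need the raw mode plus an external converter.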

Rotwang, Dec 27 '15 23:12

It looks like the intent of this was added in commit eb98a2c1.

jonathanstowe, Feb 20 '21 10:02