-ascii and transliteration
The "-ascii" parameter does not perform transliteration. Here is an example:
echo atenção | html2text -ascii
aten????o
Hmmm
% echo atenção | html2text -from_encoding UTF-8 -ascii
atenc~ao
Not sure whether that's expected, but is your environment using Latin-1 as its encoding (as opposed to UTF-8)?
It's really confusing indeed:
% echo atenção | html2text -from_encoding UTF-8 -to_encoding ascii
aten????o
% echo atenção | html2text -from_encoding UTF-8 -to_encoding ascii//translit
atenc~ao
so on my system (macOS) it seems like -ascii is doing the translit as advertised.
You're right. I forgot that the standard input is ISO-8859-1.
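For anyone wondering where the four question marks in the very first example come from, here is a minimal sketch in Python. It assumes, as stated above, that the input is treated as ISO-8859-1 when no -from_encoding is given; the function name is made up for illustration and this is not how html2text is implemented internally.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Hypothetical helper: mimic decoding UTF-8 bytes as ISO-8859-1,
    then writing plain ASCII without any transliteration."""
    raw = text.encode("utf-8")                  # "ç" and "ã" take two bytes each
    wrongly_decoded = raw.decode("iso-8859-1")  # every byte becomes one character
    # Characters outside ASCII are replaced by "?", one per character.
    return wrongly_decoded.encode("ascii", errors="replace").decode("ascii")


print(misread_utf8_as_latin1("atenção"))  # prints: aten????o
```

The two accented letters occupy four bytes in UTF-8, so a Latin-1 reading sees four separate non-ASCII characters and each one turns into its own "?".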
Here are the tests and their results again:
$ echo atenção | html2text -from_encoding UTF-8 -ascii
aten??o
$ echo atenção | html2text -from_encoding UTF-8 -to_encoding ascii
aten????o
$ echo atenção | html2text -from_encoding UTF-8 -to_encoding ascii//translit
aten??o
To explain better, here is the result of iconv when using ascii//translit:
$ echo atenção | iconv -f UTF-8 -t ascii//translit
atencao
In my view, this is the expected result.
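To make the expected result concrete, here is a rough Python approximation of that kind of transliteration. This is only a sketch: it uses Unicode NFKD decomposition and drops the combining marks, which is not the same algorithm as iconv's //translit tables and not what html2text does. The function name is hypothetical.

```python
import unicodedata


def strip_accents_to_ascii(text: str) -> str:
    """Hypothetical helper: approximate transliteration by decomposing
    accented letters and discarding everything that is not ASCII."""
    # NFKD splits "ç" into "c" + combining cedilla, "ã" into "a" + combining tilde.
    decomposed = unicodedata.normalize("NFKD", text)
    # The combining marks are not ASCII, so errors="ignore" drops them.
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


print(strip_accents_to_ascii("atenção"))  # prints: atencao
```

Note that this approach silently drops characters that have no decomposition to an ASCII base letter, whereas a real transliteration table would map them to a substitute.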
-- System Information:
Debian Release: bullseye/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.0-12-amd64 (SMP w/4 CPU threads)
Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to C.UTF-8), LANGUAGE=C.UTF-8
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect

Versions of packages html2text depends on:
ii  libc6       2.31-4
ii  libgcc-s1   10.2.0-15
ii  libstdc++6  10.2.0-15
Are you sure html2text is linked against the same libiconv as the iconv utility you use? I guess you use glibc, so it should be using the same libc. Can you try this with 2.0.0 or latest git?
need info
In version 2.2.3 the behavior persists.
:(