html2text
Add back support for "-utf8" flag for backwards compatibility
We historically used -utf8 with older versions of html2text, but the new version uses UTF-8 by default and no longer accepts -utf8 as a command-line argument.
https://github.com/thp/urlwatch/issues/718
Version 2.1.1 help output:
% html2text -help
This is html2text, version 2.1.1
Usage:
html2text -help
html2text -version
html2text [ -check ] [ -debug-scanner ] [ -debug-parser ] \
[ -rcfile <file> ] [ -width <w> ] [ -nobs ] [ -links ]\
[ -from_encoding ] [ -to_encoding ] [ -ascii ]\
[ -o <file> ] [ <input-file> ] ...
Formats HTML document(s) read from <input-file> or STDIN and generates ASCII
text.
-help Print this text and exit
-version Print program version and copyright notice
-check Do syntax checking only
-debug-scanner Report parsed tokens on STDERR (debugging)
-debug-parser Report parser activity on STDERR (debugging)
-rcfile <file> Read <file> instead of "$HOME/.html2textrc"
-width <w> Optimize for screen widths other than 79
-nobs Do not render boldface and underlining (using backspaces)
-links Generate reference list with link targets
-from_encoding Treat input encoded as given encoding
-to_encoding Output using given encoding
-ascii Use plain ASCII for output instead of UTF-8
alias for: -to_encoding ASCII//TRANSLIT
-o <file> Redirect output into <file>
Old version (1.3.2a) help output:
$ html2text -help
This is html2text, version 1.3.2a
Usage:
html2text -help
html2text -version
html2text [ -unparse | -check ] [ -debug-scanner ] [ -debug-parser ] \
[ -rcfile <file> ] [ -style ( compact | pretty ) ] [ -width <w> ] \
[ -o <file> ] [ -nobs ] [ -ascii | -utf8 ] [ <input-url> ] ...
Formats HTML document(s) read from <input-url> or STDIN and generates ASCII
text.
-help Print this text and exit
-version Print program version and copyright notice
-unparse Generate HTML instead of ASCII output
-check Do syntax checking only
-debug-scanner Report parsed tokens on STDERR (debugging)
-debug-parser Report parser activity on STDERR (debugging)
-rcfile <file> Read <file> instead of "$HOME/.html2textrc"
-style compact Create a "compact" output format (default)
-style pretty Insert some vertical space for nicer output
-width <w> Optimize for screen widths other than 79
-o <file> Redirect output into <file>
-nobs Do not use backspaces for boldface and underlining
-ascii Use plain ASCII for output instead of ISO-8859-1
-utf8 Assume both terminal and input stream are in UTF-8 mode
-nometa Don't try to recode input using 'meta' tag
It would be nice to keep accepting -utf8 as a no-op, perhaps unlisted in the -help output, since the default is already UTF-8. That way, existing scripts that call html2text would work with both versions.
For now, I worked around this by checking the -help output for -utf8 and passing the flag only if it is listed.
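A minimal sketch of that kind of feature check as a POSIX shell wrapper, assuming html2text is on PATH. The HTML2TEXT variable and the html2text_compat function are hypothetical names used only for illustration. The check greps the combined stdout/stderr of -help for "-utf8", which appears in the 1.3.2a help text above but not in the 2.1.1 one (where only "UTF-8" appears, and grep is case-sensitive).

```shell
#!/bin/sh
# Sketch: pass -utf8 only if this html2text build lists it in -help.
# HTML2TEXT is a hypothetical override for the binary name/path.
HTML2TEXT="${HTML2TEXT:-html2text}"

html2text_compat() {
    # Capture both stdout and stderr of -help in case the help text goes to
    # stderr; only grep's result decides, not html2text's exit status.
    if "$HTML2TEXT" -help 2>&1 | grep -q -- '-utf8'; then
        # Old 1.3.x style: -utf8 is a documented option, so add it.
        "$HTML2TEXT" -utf8 "$@"
    else
        # 2.x style: UTF-8 is the default and -utf8 would be rejected.
        "$HTML2TEXT" "$@"
    fi
}
```

Scripts can then call `html2text_compat -nobs page.html` wherever they previously called `html2text -utf8 -nobs page.html`. This runs -help on every call; caching the check result in a variable once per script would avoid the extra process.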