Add back support for "-utf8" flag for backwards compatibility

Open thp opened this issue 3 years ago • 0 comments

We historically used -utf8 with older versions of html2text, but the new version defaults to UTF-8 and no longer accepts -utf8 as a command-line argument.

https://github.com/thp/urlwatch/issues/718

Version 2.1.1 help output:

% html2text -help
This is html2text, version 2.1.1

Usage:
  html2text -help
  html2text -version
  html2text [ -check ] [ -debug-scanner ] [ -debug-parser ] \
     [ -rcfile <file> ] [ -width <w> ] [ -nobs ] [ -links ]\
     [ -from_encoding ] [ -to_encoding ] [ -ascii ]\
     [ -o <file> ] [ <input-file> ] ...
Formats HTML document(s) read from <input-file> or STDIN and generates ASCII
text.
  -help          Print this text and exit
  -version       Print program version and copyright notice
  -check         Do syntax checking only
  -debug-scanner Report parsed tokens on STDERR (debugging)
  -debug-parser  Report parser activity on STDERR (debugging)
  -rcfile <file> Read <file> instead of "$HOME/.html2textrc"
  -width <w>     Optimize for screen widths other than 79
  -nobs          Do not render boldface and underlining (using backspaces)
  -links         Generate reference list with link targets
  -from_encoding Treat input encoded as given encoding
  -to_encoding   Output using given encoding
  -ascii         Use plain ASCII for output instead of UTF-8
                 alias for: -to_encoding ASCII//TRANSLIT 
  -o <file>      Redirect output into <file>

Old version help:

$ html2text -help
This is html2text, version 1.3.2a

Usage:
  html2text -help
  html2text -version
  html2text [ -unparse | -check ] [ -debug-scanner ] [ -debug-parser ] \
     [ -rcfile <file> ] [ -style ( compact | pretty ) ] [ -width <w> ] \
     [ -o <file> ] [ -nobs ] [ -ascii | -utf8 ] [ <input-url> ] ...
Formats HTML document(s) read from <input-url> or STDIN and generates ASCII
text.
  -help          Print this text and exit
  -version       Print program version and copyright notice
  -unparse       Generate HTML instead of ASCII output
  -check         Do syntax checking only
  -debug-scanner Report parsed tokens on STDERR (debugging)
  -debug-parser  Report parser activity on STDERR (debugging)
  -rcfile <file> Read <file> instead of "$HOME/.html2textrc"
  -style compact Create a "compact" output format (default)
  -style pretty  Insert some vertical space for nicer output
  -width <w>     Optimize for screen widths other than 79
  -o <file>      Redirect output into <file>
  -nobs          Do not use backspaces for boldface and underlining
  -ascii         Use plain ASCII for output instead of ISO-8859-1
  -utf8          Assume both terminal and input stream are in UTF-8 mode
  -nometa        Don't try to recode input using 'meta' tag

It would be nice to keep supporting -utf8 (perhaps even unlisted in the -help output) as a no-op, since the default is now UTF-8, so that existing scripts using html2text work with both versions.

For now, I worked around this by first feature-checking -utf8 via -help's output and then either adding it or leaving it out.
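
For anyone hitting the same problem, here is a minimal sketch of that feature check in Python. It is not the actual urlwatch code: the function names (`supports_utf8_flag`, `read_help_text`, `build_command`) are hypothetical, and it assumes the help text may arrive on either stdout or stderr with any exit status.

```python
import re
import subprocess
from typing import List

# Matches "-utf8" as a standalone option token, so that "UTF-8" in the
# description of another option does not count as a match.
_UTF8_FLAG = re.compile(r"(?<![\w-])-utf8\b")


def supports_utf8_flag(help_text: str) -> bool:
    """Return True if the given -help output lists the -utf8 option."""
    return _UTF8_FLAG.search(help_text) is not None


def read_help_text(binary: str = "html2text") -> str:
    """Run `<binary> -help` and return everything it printed."""
    # Assumption: the help text may go to stdout or stderr and the exit
    # status may be non-zero, so both streams are captured and the status
    # is ignored.
    result = subprocess.run(
        [binary, "-help"], capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr


def build_command(help_text: str, binary: str = "html2text") -> List[str]:
    """Build the argument list, adding -utf8 only where it is supported."""
    command = [binary]
    if supports_utf8_flag(help_text):
        command.append("-utf8")
    return command
```

Typical use would be `build_command(read_help_text())`, with the result passed to `subprocess.run` along with the HTML on stdin. Keeping the check separate from the subprocess call makes it possible to test against saved help output from both versions.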

thp, Aug 29 '22 08:08