html2text
processing of <pre> element results in double-spaced text
When running this example script:
#!/usr/bin/python
import html2text
inStr = """
<pre class="wiki">"addnoresponse": {
"name": "NoRespns",
"position": "topRow",
"commentIdx": "noResponseString",
"status": "CLOSED",
"resolution": "INSUFFICIENT_DATA"
},
</pre>
"""
print(html2text.html2text(inStr))
I get this:
bradford:~ $ python test-PRE-bug.py
"addnoresponse": {
"name": "NoRespns",
"position": "topRow",
"commentIdx": "noResponseString",
"status": "CLOSED",
"resolution": "INSUFFICIENT_DATA"
},
bradford:~ $
I mean this is pretty awful. I understand that you want to make this into Markdown, but shouldn’t html2text produce something at least a bit readable? Or could we get some parameter to html2text (prettyParse=true), which would avoid this?
I assume it's just a bug. That's not even the right Markdown.
Glad to hear it is not intentional. Thanks.
Looks like it's a bug in the line-wrapping. If you turn that off, it should work.
How should I do it? Lack of any reasonable documentation for html2text is another bug (or maybe I am stupid, and I just haven't found it).
@mcepl you can turn off line wrapping like this: https://github.com/fmarier/blogger2ikiwiki/commit/d352a4655185640642fd8550bcd6c9740f915540
You can turn it off by setting body_width to zero, e.g. by passing -b 0 on the command line.
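For the script in the report, the same setting can probably be applied from Python rather than from the command line. This is only a minimal sketch: it assumes a release of html2text that provides the HTML2Text class with a body_width attribute and a handle() method, and older releases may expose the setting differently. The helper name to_markdown_unwrapped is made up for illustration and is not part of the library.

```python
try:
    import html2text  # third-party package: pip install html2text
except ImportError:
    # Keep the sketch importable even when the package is missing.
    html2text = None


def to_markdown_unwrapped(html):
    """Convert HTML to Markdown with line wrapping disabled.

    Hypothetical helper for illustration; assumes html2text exposes
    the HTML2Text class with a body_width attribute.
    """
    if html2text is None:
        raise RuntimeError("html2text is not installed")
    converter = html2text.HTML2Text()
    # body_width = 0 turns off wrapping, the step suspected of
    # mangling the contents of <pre> blocks in this report.
    converter.body_width = 0
    return converter.handle(html)
```

Calling to_markdown_unwrapped(inStr) in place of html2text.html2text(inStr) in the test script should then show whether the extra blank lines come from the wrapping step.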
This seems to be a little better in the latest version, but it is still confused by the first line.