processing of <pre> element results in double-spaced text

Open mcepl opened this issue 14 years ago • 7 comments

When running this example script:

#!/usr/bin/python

import html2text

inStr = """
<pre class="wiki">"addnoresponse": {
    "name": "NoRespns",
    "position": "topRow",
    "commentIdx": "noResponseString",
    "status": "CLOSED",
    "resolution": "INSUFFICIENT_DATA"
},
</pre>
"""
print(html2text.html2text(inStr))

I get this:

bradford:~ $ python test-PRE-bug.py 
"addnoresponse": {

        "name": "NoRespns",

        "position": "topRow",

        "commentIdx": "noResponseString",

        "status": "CLOSED",

        "resolution": "INSUFFICIENT_DATA"

    },



bradford:~ $ 

I mean, this is pretty awful. I understand that you want to turn this into Markdown, but shouldn't html2text produce something at least a bit readable? Or could html2text get some parameter (say, prettyParse=true) that would avoid this?

mcepl, Mar 09 '11 22:03

I assume it's just a bug. That's not even the right Markdown.

aaronsw, Mar 10 '11 16:03

Glad to hear it is not intentional. Thanks.

mcepl, Mar 10 '11 16:03

Looks like it's a bug in the line-wrapping. If you turn that off, it should work.

aaronsw, Mar 10 '11 18:03

How do I turn it off? The lack of any reasonable documentation for html2text is another bug (or maybe I am stupid and just haven't found it).

mcepl, Mar 10 '11 21:03

@mcepl you can turn off line wrapping like this: https://github.com/fmarier/blogger2ikiwiki/commit/d352a4655185640642fd8550bcd6c9740f915540

fmarier, Jul 22 '12 11:07

You can turn it off by setting body_width to zero, e.g. by -b 0.

aaronsw, Sep 20 '12 15:09
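
For anyone calling the library from Python rather than the command line, the workaround might look like the sketch below. It assumes a release of html2text that exposes the HTML2Text class with a body_width attribute and a handle() method; the 2012-era module discussed in this thread may have exposed the setting differently. The names SAMPLE_HTML and html_to_markdown_unwrapped are made up for illustration.

```python
# Hypothetical sample input, shortened from the report at the top of the thread.
SAMPLE_HTML = """
<pre class="wiki">"addnoresponse": {
    "name": "NoRespns",
    "position": "topRow"
},
</pre>
"""


def html_to_markdown_unwrapped(html):
    """Convert HTML to Markdown with line wrapping disabled."""
    # Imported lazily so this sketch can be loaded without html2text installed.
    import html2text

    converter = html2text.HTML2Text()
    # Assumption: body_width = 0 turns wrapping off, matching "-b 0" on the CLI.
    converter.body_width = 0
    return converter.handle(html)
```

Calling print(html_to_markdown_unwrapped(SAMPLE_HTML)) should then keep the lines of the <pre> block single-spaced, though the exact output depends on the installed version.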

This seems to be a little better in the latest version, but it is still confused by the first line.

aaronsw, Sep 20 '12 15:09