html2text
Long table rows cause incorrect table conversions
- Version by html2text.__version__: (2016, 9, 19)
- Test script
import html2text
conv = html2text.HTML2Text()
conv.pad_tables = True
print conv.handle('<table><tr><td>1</td><td>2</td></tr><tr><td>juju</td><td><a href="http://example.com/there/is/some/very/long/path">huhu</a></td></tr></table>')
# 1 | 2
# -----|---------------------------------------------------------
# juju | [huhu](http://example.com/there/is/some/very/long/path)
conv.body_width = 40
print conv.handle('<table><tr><td>1</td><td>2</td></tr><tr><td>juju</td><td><a href="http://example.com/there/is/some/very/long/path">huhu</a></td></tr></table>')
# 1
# --------------------------------------------------------
# juju
# [huhu](http://example.com/there/is/some/very/long/path)
- Python version
python --version: 2.7.13
I have this problem too. It looks like html2text wraps lines at 78 or 80 characters, which breaks the Markdown table layout. It also has a problem when a row spans multiple columns (colspan). E.g.:
Input:
<table>
<tr class="lightgrey">
<td class="td_row"><b>Row#</td>
<td class="td_verification"><b>Web Service</td>
<td class="td_endpoint"><b>Description</td>
<td class="td_call_status"><b>Call Status</td>
<td class="td_verification"><b>Passed Verification</td>
<td class="td_verification"><b>Failed Verification</td></tr>
<tr class="comment">
<td colspan="6"><H1>Q0_WS01 RedemptionServices</H1></td></tr>
<tr class="comment">
<td colspan="6"><H2>0. SetUp</H2></td></tr>
<tr class="lightgreen">
<td class="td_row_num">5</td>
<td class="td_nowrap">createAPMember1.0 <a href="RedemptionServices.xls-Z5H-20171207-123235-Row-5-Request.xml">Request</a> <a href="RedemptionServices.xls-Z5H-20171207-123235-Row-5-Response.xml">Response</a></td>
<td>Setup step: Create test AirPoints member</td>
<td>PASS</td>
<td>6</td>
<td>0</td></tr>
Result: The table layout is broken up, and long rows have been wrapped with hard line breaks at column 78.
**Row# | **Web Service | **Description | **Call Status | **Passed Verification
| **Failed Verification
---|---|---|---|---|---
# Q0_WS01 RedemptionServices
## 0\. SetUp
5 | createAPMember1.0
[Request](RedemptionServices.xls-Z5H-20171207-123235-Row-5-Request.xml)
[Response](RedemptionServices.xls-Z5H-20171207-123235-Row-5-Response.xml) |
Setup step: Create test AirPoints member | PASS | 6 | 0
Fix:
- don't pad tables
- do use the following options (or their equivalent methods):
$ html2text -b0 --no-wrap-links AnnoyingTabulatedReport.html > NicerReport.md
(Rows that span multiple columns are still not handled, but I suppose that's a separate issue.)