html2docx

margin handler cannot handle percentages

Open roablep opened this issue 1 year ago • 0 comments

It looks like add_styles_to_paragraph does not handle styles expressed as percentages: the regex that strips the unit only matches lowercase letters such as px, so the % sign is left in the string that gets passed to float().

Sample HTML:

<P STYLE="margin-top:6px;margin-bottom:0px; margin-left:4%; text-indent:4%"><FONT STYLE="font-family:Times New Roman" SIZE="2">test text</FONT></P>

Result:

ValueError: could not convert string to float: '4%'

Full traceback:

File ~/home/.venv/lib/python3.9/site-packages/htmldocx/h2d.py:609, in HtmlToDocx.parse_html_file(self, filename_html, filename_docx)
    607     html = infile.read()
    608 self.set_initial_attrs()
--> 609 self.run_process(html)
    610 if not filename_docx:
    611     path, filename = os.path.split(filename_html)

File ~/home/.venv/lib/python3.9/site-packages/htmldocx/h2d.py:583, in HtmlToDocx.run_process(self, html)
    581 if self.include_tables:
    582     self.get_tables()
--> 583 self.feed(html)

File ~/opt/anaconda3/lib/python3.9/html/parser.py:110, in HTMLParser.feed(self, data)
    104 r"""Feed data to the parser.
    105
    106 Call this as often as you want, with as little or as much text
    107 as you want (may include '\n').
    108 """
    109 self.rawdata = self.rawdata + data
--> 110 self.goahead(0)

File ~/opt/anaconda3/lib/python3.9/html/parser.py:170, in HTMLParser.goahead(self, end)
    168 if startswith('<', i):
    169     if starttagopen.match(rawdata, i): # < + letter
--> 170         k = self.parse_starttag(i)
    171     elif startswith("</", i):
    172         k = self.parse_endtag(i)

File ~/opt/anaconda3/lib/python3.9/html/parser.py:344, in HTMLParser.parse_starttag(self, i)
    342     self.handle_startendtag(tag, attrs)
    343 else:
--> 344     self.handle_starttag(tag, attrs)
    345     if tag in self.CDATA_CONTENT_ELEMENTS:
    346         self.set_cdata_mode(tag)

File ~/home/.venv/lib/python3.9/site-packages/htmldocx/h2d.py:465, in HtmlToDocx.handle_starttag(self, tag, attrs)
    463 if 'style' in current_attrs and self.paragraph:
    464     style = self.parse_dict_string(current_attrs['style'])
--> 465     self.add_styles_to_paragraph(style)

File ~/home/.venv/lib/python3.9/site-packages/htmldocx/h2d.py:218, in HtmlToDocx.add_styles_to_paragraph(self, style)
    216 margin = style['margin-left']
    217 units = re.sub(r'[0-9]+', '', margin)
--> 218 margin = int(float(re.sub(r'[a-z]+', '', margin)))
    219 if units == 'px':
    220     self.paragraph.paragraph_format.left_indent = Inches(min(margin // 10 * INDENT, MAX_INDENT))
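A possible direction for a fix, as a minimal sketch rather than the library's actual code: split the value into a number and a unit with a single regex that also accepts `%`, and return `None` for anything it cannot parse instead of raising. The helper name `split_css_length` and the pattern are my own assumptions and are not part of htmldocx. How a percentage should then map to an indent (it needs a reference width) is left to the caller.

```python
import re

# Hypothetical helper, not part of htmldocx.
# Accepts an optional sign, a decimal number, and an optional unit
# made of letters or a percent sign (e.g. '6px', '4%', '1.5em', '0').
_LENGTH_RE = re.compile(r'^\s*(-?\d*\.?\d+)\s*([a-z%]*)\s*$', re.IGNORECASE)


def split_css_length(value):
    """Return (number, unit) for a CSS length, or None if it cannot be parsed."""
    match = _LENGTH_RE.match(value)
    if match is None:
        return None
    number, unit = match.groups()
    return float(number), unit.lower()


if __name__ == "__main__":
    for sample in ("6px", "4%", "auto"):
        print(sample, "->", split_css_length(sample))
```

With something like this, the `units == 'px'` branch could keep its current behaviour, while `'%'` and unknown units could be handled or skipped explicitly instead of crashing the whole conversion.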

roablep, Oct 06 '22 12:10