pydocx
Hyperlink imports as strong tag instead of anchor tag
I'm unsure of the exact cause, but the attached .docx has a hyperlink around the text http://translate.google.com/#, yet when it is run through pydocx, the text in the resulting HTML is wrapped in strong tags instead of an anchor tag.
Interestingly, if I open the file in OpenOffice and save it again, the internal structure changes, and running the resulting file through pydocx produces the correct output.
hyperlink_did_not_translate.docx
Ugh. This is because the instrText is spread out over several nodes. I assumed this would not happen, because it's silly:
<w:r>
<w:instrText xml:space="preserve"> HYPERLINK "</w:instrText>
</w:r>
<w:r w:rsidRPr="00710528">
<w:instrText>http://translate.google.com/#</w:instrText>
</w:r>
<w:r>
<w:instrText xml:space="preserve">" </w:instrText>
</w:r>
PyDocX only handles the HYPERLINK instrText if it is contained in a single node, like this:
<w:r>
<w:instrText xml:space="preserve"> HYPERLINK "http://translate.google.com/#"</w:instrText>
</w:r>
I suspect it's happening because of the # in the URL. Maybe Word sees this as a special character.
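Whatever makes Word split the instruction, one way to tolerate it would be to join the text of every instrText node before parsing. The snippet below is only a rough sketch of that idea using the standard library, not PyDocX's actual code: the function name extract_hyperlink_target and the regular expression are made up for illustration, and it assumes the paragraph contains a single field.

```python
import re
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# The split form from this issue, wrapped in a paragraph so it parses on its own.
FRAGMENT = f"""
<w:p xmlns:w="{W_NS}">
  <w:r>
    <w:instrText xml:space="preserve"> HYPERLINK "</w:instrText>
  </w:r>
  <w:r w:rsidRPr="00710528">
    <w:instrText>http://translate.google.com/#</w:instrText>
  </w:r>
  <w:r>
    <w:instrText xml:space="preserve">" </w:instrText>
  </w:r>
</w:p>
"""

# Hypothetical pattern: the HYPERLINK keyword followed by a quoted target.
HYPERLINK_RE = re.compile(r'^\s*HYPERLINK\s+"([^"]*)"')


def extract_hyperlink_target(paragraph):
    """Join all instrText fragments in document order, then parse the result.

    Assumes the paragraph holds exactly one field; a real implementation
    would need to track field boundaries as well.
    """
    parts = [node.text or "" for node in paragraph.iter(f"{{{W_NS}}}instrText")]
    match = HYPERLINK_RE.match("".join(parts))
    return match.group(1) if match else None


paragraph = ET.fromstring(FRAGMENT)
target = extract_hyperlink_target(paragraph)
print(target)
```

Because the fragments are concatenated first, the single-node form and the three-node form end up as the same instruction string, so the # in the URL needs no special handling.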