pdf2docx
Conversion very irregular and out of format
Please see the attached .pdf file and the resulting .docx file. The formatting gets split up and looks very odd. Attachments: generated.docx, out.pdf
@HassanRaza1313 I also ran into this issue. Have you managed to resolve it with any Python module or API?
@dothinking Can you please comment on this issue?
I worked around this issue by adding something like the following after conversion. It seemed like the paragraph space before / space after was the culprit.
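# Adjust Paragraph Space Before / Space After
# Requires python-docx; `document` is assumed to be the converted file opened with docx.Document(...)
from docx.enum.text import WD_LINE_SPACING
from docx.shared import Pt

for paragraph in document.paragraphs:
    paragraph.paragraph_format.line_spacing_rule = WD_LINE_SPACING.SINGLE
    # Cap oversized spacing at 12 pt; None means the value is inherited from the style
    space_before = paragraph.paragraph_format.space_before
    if space_before and space_before.pt > 12:
        paragraph.paragraph_format.space_before = Pt(12)
    space_after = paragraph.paragraph_format.space_after
    if space_after and space_after.pt > 12:
        paragraph.paragraph_format.space_after = Pt(12)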