pdf2docx icon indicating copy to clipboard operation
pdf2docx copied to clipboard

After converting a PDF file to Word, some images in the Word document are rotated

Open yyyang91 opened this issue 6 months ago • 1 comments

Description of the bug

This is the PDF file that triggered the issue. 彼得与狼.pdf.

The display effect in Adobe Reader:

Image

The display effect in Office Word:

Image

How to reproduce the bug

The following code will convert the PDF file to a Word file.

from pdf2docx import Converter

def pdf_to_word(pdf_path, word_path):
    cv = Converter(pdf_path)
    cv.convert(word_path)
    cv.close()
    print(f"success")


if __name__ == "__main__":
    pdf_to_word("彼得与狼.pdf","彼得与狼.docx")

I found that in the function ImagesExtractor.extract_images, for non-intersected normal images, only the rotation of the page seems to be considered, not the rotation of the image itself.
[ImagesExtractor.py](https://github.com/ArtifexSoftware/pdf2docx/blob/master/pdf2docx/image/ImagesExtractor.py)

### pdf2docx version

0.5.8

### Operating system

Windows

### Python version

3.8

yyyang91 avatar May 13 '25 03:05 yyyang91