pdf2docx
After converting a PDF file to Word, some images in the Word document are rotated
### Description of the bug
The PDF file that triggers the issue is 彼得与狼.pdf.
Rendering in Adobe Reader:
Rendering in Microsoft Word:
### How to reproduce the bug
The following code converts the PDF file to a Word file:
from pdf2docx import Converter


def pdf_to_word(pdf_path, word_path):
    # Convert every page of the PDF into a single .docx file.
    cv = Converter(pdf_path)
    cv.convert(word_path)
    cv.close()
    print("success")


if __name__ == "__main__":
    pdf_to_word("彼得与狼.pdf", "彼得与狼.docx")
Looking at the function ImagesExtractor.extract_images, it appears that for normal, non-intersecting images only the rotation of the page is taken into account, not the rotation of the image itself.
[ImagesExtractor.py](https://github.com/ArtifexSoftware/pdf2docx/blob/master/pdf2docx/image/ImagesExtractor.py)
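
To check whether an image carries its own rotation independent of the page, something like the following rough sketch might help. It is not pdf2docx code: `rotation_from_matrix` and `report_image_rotations` are hypothetical helper names, and the sketch assumes PyMuPDF's `Page.get_image_info()` returns a `transform` entry of the form `(a, b, c, d, e, f)` for each image. It also ignores mirrored images, and the sign of the angle depends on the coordinate system in use, so treat the direction as something to verify against the actual file.

```python
import math


def rotation_from_matrix(matrix):
    """Estimate the rotation (degrees, 0-359) encoded in a 2D matrix (a, b, c, d, e, f).

    Assumption: the matrix is a rotation combined with positive scaling only
    (no mirroring or shear), so the angle can be read from its first column.
    """
    a, b = matrix[0], matrix[1]
    return round(math.degrees(math.atan2(b, a))) % 360


def report_image_rotations(pdf_path):
    """Print page rotation next to each image's own estimated rotation."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    with fitz.open(pdf_path) as doc:
        for page in doc:
            for info in page.get_image_info(xrefs=True):
                angle = rotation_from_matrix(info["transform"])
                print(
                    f"page {page.number}: xref={info['xref']} "
                    f"page_rotation={page.rotation} image_rotation={angle}"
                )
```

Calling `report_image_rotations("彼得与狼.pdf")` should list, for each image, the page rotation alongside the angle estimated from the image's own matrix. Any image with a non-zero estimated angle on a page whose rotation is 0 would be a candidate for the behaviour described above.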
### pdf2docx version
0.5.8
### Operating system
Windows
### Python version
3.8