pdf2docx

Open source Python library for converting PDF to DOCX.

Results 122 pdf2docx issues

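Hi, thanks to the author for such a great library! While using it recently, I had a file where I expected the paragraphs to be split like this: ![image](https://user-images.githubusercontent.com/1823364/184281868-91a26a63-acb8-48c9-a008-c3532852e6b0.png) But the split seems to go wrong: in the second paragraph, because the spacing between words is wider, every word is treated as its own paragraph. ![image](https://user-images.githubusercontent.com/1823364/184281970-12b825af-650e-4410-9acd-5192be287a01.png) The original file is attached; the problem is on page 1. [1.pdf](https://github.com/dothinking/pdf2docx/files/9313570/1.pdf)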

bug
enhancement

Hello! I have a summary in a PDF that looks like this: ![image](https://user-images.githubusercontent.com/67562521/182177455-ed31a3e2-35dc-4a85-8427-483a5df8dce7.png) The problem: when I convert the PDF to DOCX, I get: ![image](https://user-images.githubusercontent.com/67562521/182177795-0c14087f-3f12-4d6b-8692-14bbb7285159.png) The PDF file :...

Environment: Python 3.7, pip 22.1.2, pdf2docx 0.5.4, PyMuPDF 1.20.0, python-docx 0.8.11. Code used: ![image](https://user-images.githubusercontent.com/87413355/175495224-6c612050-9486-4af5-b5b0-922dd136eb54.png) Error output: ![image](https://user-images.githubusercontent.com/87413355/175495789-2d82cd3b-ab65-4356-9472-7c75a638b14f.png)

bug
upstream

index error: list index out of range File "D:\Anaconda\lib\site-packages\pdf2docx\text\Textspan.py", line 130 self.chars[0].origin, # the bottom left point of the first character ps: when I modified the code, the program stopped...

input needed

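After conversion, the Chinese text often contains many unexpected Unicode characters, such as Kangxi radicals, in place of the ordinary characters.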
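As a possible workaround for the symptom described above, here is a minimal sketch that post-processes a string. It is not part of pdf2docx, and the helper name `normalize_kangxi` is hypothetical. It relies on the fact that code points in the Unicode Kangxi Radicals block (U+2F00 to U+2FD5) carry compatibility mappings to ordinary CJK ideographs, so NFKC normalization replaces them. How to hook this into a conversion is left open.

```python
import unicodedata

# Kangxi Radicals block; every code point here has a compatibility
# mapping to a regular CJK unified ideograph.
KANGXI_FIRST = 0x2F00
KANGXI_LAST = 0x2FD5


def normalize_kangxi(text: str) -> str:
    """Replace Kangxi radical code points with their ordinary ideographs.

    Hypothetical helper: only characters in the Kangxi Radicals block are
    normalized, so other compatibility characters are left untouched.
    """
    out = []
    for ch in text:
        if KANGXI_FIRST <= ord(ch) <= KANGXI_LAST:
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)
```

For example, `normalize_kangxi("\u2f08")` returns `"\u4eba"` (人). Restricting the normalization to this one block is a deliberate choice: applying NFKC to the whole string would also rewrite ligatures, full-width forms, and similar characters that may be intentional.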

input needed

![image](https://user-images.githubusercontent.com/30330903/139228227-64a52e2d-f874-43fb-850e-6560a43a904e.png) The problem is shown in the image. File (246 KB), link: https://pan.baidu.com/s/1zYVu1UrAc2CyVpd6eT_LDg Extraction code: i481

bug
information required

[Test pdf](https://github.com/dothinking/pdf2docx/files/8559585/5CE48DAAB7DB616A.pdf) [docx](https://github.com/dothinking/pdf2docx/files/8559613/5.docx) convert log ``` [INFO] Start to convert g://pdf/5CE48DAAB7DB616A.pdf [INFO] [1/4] Opening document... [INFO] [2/4] Analyzing document... [INFO] [3/4] Parsing pages... [INFO] (1/29) Page 1 [INFO] (2/29) Page...

bug
upstream

Some documents can't be processed page by page due to an index error. As a result, pages are blank. This small fix handles the exception as pages are being extracted...
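The general idea of tolerating a failure on one page while continuing with the rest could be sketched as follows. This is not the code from the pull request, which is not shown here. The names `parse_pages_tolerantly` and `parse_page` are hypothetical, and the parser is passed in as a plain callable so the sketch makes no assumptions about pdf2docx internals.

```python
import logging
from typing import Callable, Iterable, List, Optional, TypeVar

P = TypeVar("P")  # page type (whatever the caller uses)
R = TypeVar("R")  # result type returned by the page parser

logger = logging.getLogger(__name__)


def parse_pages_tolerantly(
    pages: Iterable[P],
    parse_page: Callable[[P], R],
) -> List[Optional[R]]:
    """Parse each page independently; a failing page yields None.

    Hypothetical helper: only IndexError is caught, matching the error
    described above. Other exceptions still propagate.
    """
    results: List[Optional[R]] = []
    for number, page in enumerate(pages, start=1):
        try:
            results.append(parse_page(page))
        except IndexError as exc:
            # Record which page failed instead of aborting the whole run.
            logger.warning("Skipping page %d: %s", number, exc)
            results.append(None)
    return results
```

Returning `None` for a failed page keeps the result list aligned with the page order, so the caller can decide whether to leave that page blank, retry it, or report it.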

Hey author, it supports Hebrew and Arabic letters, but it writes them in reversed order. Where in the code does the conversion happen and where are the letters read? Can you give me...

bug

[1804.10371.pdf](https://github.com/dothinking/pdf2docx/files/8858068/1804.10371.pdf) [1804.10371.docx](https://github.com/dothinking/pdf2docx/files/8858069/1804.10371.docx)

enhancement
question