pdf2docx
pdf2docx copied to clipboard
Open source Python library for converting PDF to DOCX.
段落划分有点问题
hi, 感谢作者有这么好的库!!! 最近在使用的时候,有个文件期望段落能够这么分出来  但是好像分的有点问题,第二个段落,由于单词之间的间距变大了,每个单词都被划分为段落  原始文件如下,改文件的第1页 [1.pdf](https://github.com/dothinking/pdf2docx/files/9313570/1.pdf)
Hello ! I have a summary on a PDF that looks like this:  Problem, when I convert the PDF to docx, I get :  The PDF file :...
环境:python 3.7,pip 22.1.2,pdf2docx 0.5.4,PyMuPDF 1.20.0,python-docx 0.8.11 步骤代码:  报错情况: 
index error: list index out of range File "D:\Anaconda\lib\site-packages\pdf2docx\text\Textspan.py", line 130 self.chars[0].origin, # the bottom left point of the first character ps: when I modified the code, the program stopped...
 问题如图所示。 文件链接: 246KB,链接:https://pan.baidu.com/s/1zYVu1UrAc2CyVpd6eT_LDg 提取码:i481
[Test pdf](https://github.com/dothinking/pdf2docx/files/8559585/5CE48DAAB7DB616A.pdf) [docx](https://github.com/dothinking/pdf2docx/files/8559613/5.docx) convert log ``` [INFO] Start to convert g://pdf/5CE48DAAB7DB616A.pdf [INFO] [1/4] Opening document... [INFO] [2/4] Analyzing document... [INFO] [3/4] Parsing pages... [INFO] (1/29) Page 1 [INFO] (2/29) Page...
Some documents can't be processed page by page due to an index error. As a result pages are blank. This small fix handles the exception are pages are being extracted...
Hey Author, It support the hebrew and arabic letters but it write it in Inverted letters where the code do the convert and get the letter? can you give me...
[1804.10371.pdf](https://github.com/dothinking/pdf2docx/files/8858068/1804.10371.pdf) [1804.10371.docx](https://github.com/dothinking/pdf2docx/files/8858069/1804.10371.docx)