pdf2docx

Open source Python library for converting PDF to DOCX.
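For orientation, here is a minimal sketch of a basic conversion, assuming the `Converter` class with its `convert` and `close` methods as described in the pdf2docx documentation. The helper names `default_docx_path` and `convert_pdf` are hypothetical and introduced only for illustration.

```python
from pathlib import Path


def default_docx_path(pdf_path):
    """Return the output path: same folder and stem as the PDF, with a .docx suffix."""
    return Path(pdf_path).with_suffix(".docx")


def convert_pdf(pdf_path, docx_path=None):
    """Convert a PDF to DOCX with pdf2docx and return the output path.

    Hypothetical wrapper: only Converter, convert and close come from pdf2docx.
    """
    # Imported here so the path helper above works without pdf2docx installed.
    from pdf2docx import Converter

    out = Path(docx_path) if docx_path else default_docx_path(pdf_path)
    cv = Converter(str(pdf_path))
    try:
        cv.convert(str(out))  # converts all pages when no page range is given
    finally:
        cv.close()  # release the underlying PDF file even if conversion fails
    return out
```

Calling `convert_pdf("sample.pdf")` (a placeholder file name) would write `sample.docx` next to the input. The `try`/`finally` is a design choice so the file is closed even when conversion raises, which several of the issues below suggest can happen.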

Results 122 pdf2docx issues

Hello, when converting a PDF to a docx file, I get the message: mupdf: expected object number [INFO] Start to convert 581529124_40000_5783.pdf [INFO] [1/4] Opening document... [INFO] [2/4] Analyzing document... [WARNING] Ignore hidden text checking due to UnicodeDecodeError in upstream library. [WARNING]...

bug

Original source: [pdf2docx-test-doc.tex](https://people.freebsd.org/~yuri/pdf2docx-test-doc.tex) PDF: [pdf2docx-test-doc.pdf](https://people.freebsd.org/~yuri/pdf2docx-test-doc.pdf) DOCX: [pdf2docx-test-doc.docx](https://people.freebsd.org/~yuri/pdf2docx-test-doc.docx) Problems in DOCX: * The text is shown in 2 columns instead of 1 column on page#3 * Page#4 is left empty for...

Could you share some ideas on how to approach this? If I am able to, I will submit a PR.

Hi there! Thanks a lot, your tool is really awesome! There is one question: I am faced with the fact that not all but some of the paragraphs for some...

Source PDF content ![image](https://user-images.githubusercontent.com/16497860/231669605-3dc1ee31-802b-4722-9a3e-e388865f3acb.png) Parsed content ![image](https://user-images.githubusercontent.com/16497860/231669660-a6e33b3d-3ec0-462f-b434-d213e446ef77.png) Source PDF content ![image](https://user-images.githubusercontent.com/16497860/231669736-0218dff5-90c3-479e-8a65-15380df0d07e.png) Parsed content ![image](https://user-images.githubusercontent.com/16497860/231669817-15e57877-d1d9-41af-90a7-9255b20a0681.png) [972fa50687484dd6.pdf](https://github.com/dothinking/pdf2docx/files/11218727/972fa50687484dd6.pdf)

First of all, thank you for providing such a great tool! When using the extract table method, some of the extracted data is missing, as shown in the image: ![image](https://github.com/dothinking/pdf2docx/assets/10828528/5fc9174c-69e7-443a-9fe5-64ce8cf024cc) Test file: [test1.pdf](https://github.com/dothinking/pdf2docx/files/11493902/test1.pdf)

Hello. After applying the function `extract_tables`, I get values `` in some lists. Is it possible to extract data from ``? If necessary, I can give an example pdf file,...

[WARNING] Ignore Line "" due to overlap: after this warning appears, the conversion makes no further progress.

information required

Hi All, I have a PDF file that has lines with breaks/whitespace in between. The following lines are underlined and again followed by line breaks. While converting to DOCX, we see that...

Thank you for providing such a great tool! I found a problem while using it: when parsing tables, some lines are lost. The original PDF file, the converted docx file, and the missing content (marked with a red box) are attached. ![Missing content](https://user-images.githubusercontent.com/68527951/226574670-ea15510e-2bab-4654-8c84-287ce7090097.png) [page3.docx](https://github.com/dothinking/pdf2docx/files/11027018/page3.docx) [page3.pdf](https://github.com/dothinking/pdf2docx/files/11027020/page3.pdf)