pdf2docx
Open source Python library for converting PDF to DOCX.
Hello, when I run a PDF-to-DOCX conversion, it reports `mupdf: expected object number`: [INFO] Start to convert 581529124_40000_5783.pdf [INFO] [1/4] Opening document... [INFO] [2/4] Analyzing document... [WARNING] Ignore hidden text checking due to UnicodeDecodeError in upstream library. [WARNING]...
Original source: [pdf2docx-test-doc.tex](https://people.freebsd.org/~yuri/pdf2docx-test-doc.tex) PDF: [pdf2docx-test-doc.pdf](https://people.freebsd.org/~yuri/pdf2docx-test-doc.pdf) DOCX: [pdf2docx-test-doc.docx](https://people.freebsd.org/~yuri/pdf2docx-test-doc.docx) Problems in DOCX: * The text is shown in 2 columns instead of 1 column on page#3 * Page#4 is left empty for...
Could you share some ideas on how to approach this? If I'm able to, I'll submit a PR.
Hi there! Thanks a lot, your tool is really awesome! I have one question: I've noticed that some, but not all, of the paragraphs for some...
Source PDF content / parsed content, source PDF content / parsed content (comparison screenshots). Test file: [972fa50687484dd6.pdf](https://github.com/dothinking/pdf2docx/files/11218727/972fa50687484dd6.pdf)
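First of all, thanks for providing such a great tool! When I use the extract table method, some data is missing from the extracted output, as shown in the screenshot. Test file: [test1.pdf](https://github.com/dothinking/pdf2docx/files/11493902/test1.pdf)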
Hello. After applying the function `extract_tables`, some of the lists contain the value ``. Is it possible to extract the data behind ``? If necessary, I can provide an example PDF file,...
[WARNING] Ignore Line "" due to overlap. After this warning message is shown, the conversion makes no further progress.
Hi all, I have a PDF file that has lines with breaks/whitespace in between. The following lines are underlined and again followed by line breaks. While converting to DOCX, we see that...
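Thanks for providing such a great tool! I found a problem while using it: when parsing tables, some of the lines are lost. The original PDF file, the converted DOCX file, and the missing content (marked with red boxes) are attached: [page3.docx](https://github.com/dothinking/pdf2docx/files/11027018/page3.docx) [page3.pdf](https://github.com/dothinking/pdf2docx/files/11027020/page3.pdf)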