pdf2docx icon indicating copy to clipboard operation
pdf2docx copied to clipboard

Open source Python library for converting PDF to DOCX.

Results 122 pdf2docx issues
Sort by recently updated
recently updated
newest added

大佬好 看起来_hide_page_text是想在分割pdf版面前隐藏所有文字,但是它只能隐藏图片中的文字,我不清楚这是否是出于设计,这会有什么可能的问题吗? ![image](https://github.com/dothinking/pdf2docx/assets/37922390/5679b6cb-01c7-4861-9d9c-52da9255b4e4)

question

一个夸页的大表格转为.docx后被分成了不同页上多个表格,希望转成docx后还是一个完整表格,这个怎么解决?谢谢!

feature

pdf转word时候,原pdf中目录页虚线丢了,点击跳转也丢了 原pdf样式 ![image](https://github.com/dothinking/pdf2docx/assets/2055618/93272caa-ac6c-4b7e-bbca-7bfde55b68a8) 转后的word样式 ![image](https://github.com/dothinking/pdf2docx/assets/2055618/814d3688-e283-44ed-af97-a3fe6996de31)

feature

Hello, thank you for using these nice projects but I got an issue. When I convert pdf to docx, It can not show me lines which are drawn by border-bottom...

input needed

Hi, I would like to develop it for RTL languages. Is it possible? Can you tell me from where I should start?

enhancement
question

Thanks for making this library, it's a life saver. I am getting this error message when attempting to use the Multi-Processing flag for the Python runtime: `An exception occurred: [Errno...

这个开发包的转换质量不错,期待着更新新功能,可以识别页眉、页脚、段落。

feature

Hello, I receive this error when I try to download pdf2docx using pip to pdf2docx. Any ideas? "AttributeError: 'Distribution' object has no attribute 'convert_2to3_doctests'"

information required

我之前实习时做了pdf转txt的工作,其中pdf转word使用的该库(pdf2docx),然后word转txt是手写的。也在很大程度上实现了去除页眉页脚,但仅仅能满足于输出端是txt(不提取多列的表格)。在我实习期间处理了500w+本的pdf转txt,并在公司内部上线了部署服务。我走后接手这个工作的实习生又进行了优化,具体改进我没问。 我想看看大家对这个需求大不大,我可以选择新建一个开源库或者在pdf2docx提一个pr。希望有需求的可以在下方留言

question

我处理的一个的pdf文件中在67页: ![image](https://github.com/dothinking/pdf2docx/assets/83513889/cb846fbe-542c-41cc-b61c-cb1f194c6e17) RuntimeError: pixmap must be grayscale or rgb to write as png,这个图片既不是灰度图片也不是一个rgb。代码和报错如下: ![image](https://github.com/dothinking/pdf2docx/assets/83513889/94fbc6f2-d30c-45fe-879c-cd8e351fd552) 我希望的是能修改这个错误或者抛出这个异常,从而不会导致这个程序崩溃 ![image](https://github.com/dothinking/pdf2docx/assets/83513889/92a8638f-fcf7-443b-8e58-5da0ed65fe8f)