pdf2docx issues

_hide_page_text这个函数的并没有隐藏全部的文字

2

大佬好看起来_hide_page_text是想在分割pdf版面前隐藏所有文字，但是它只能隐藏图片中的文字，我不清楚这是否是出于设计，这会有什么可能的问题吗？ ![image](https://github.com/dothinking/pdf2docx/assets/37922390/5679b6cb-01c7-4861-9d9c-52da9255b4e4)

alexw994

question

一个跨页的大表格被分成了多个表格

3

一个夸页的大表格转为.docx后被分成了不同页上多个表格，希望转成docx后还是一个完整表格，这个怎么解决？谢谢!

lsx66

feature

pdf转word时候，原pdf中目录页虚线丢了，点击跳转也丢了

1

pdf转word时候，原pdf中目录页虚线丢了，点击跳转也丢了原pdf样式 ![image](https://github.com/dothinking/pdf2docx/assets/2055618/93272caa-ac6c-4b7e-bbca-7bfde55b68a8) 转后的word样式 ![image](https://github.com/dothinking/pdf2docx/assets/2055618/814d3688-e283-44ed-af97-a3fe6996de31)

jackey35

feature

It can't parse border lines.

1

Hello, thank you for using these nice projects but I got an issue. When I convert pdf to docx, It can not show me lines which are drawn by border-bottom...

jjikkobro

input needed

I want to develop it for RTL languages

1

Hi, I would like to develop it for RTL languages. Is it possible? Can you tell me from where I should start?

typeoo

enhancement

question

Lambda Runtime - Function not implemented

1

Thanks for making this library, it's a life saver. I am getting this error message when attempting to use the Multi-Processing flag for the Python runtime: `An exception occurred: [Errno...

robin-garnham

什么时候可以识别页眉页脚？

2

这个开发包的转换质量不错，期待着更新新功能，可以识别页眉、页脚、段落。

yufuping

feature

Download Error no attribute 'convert_2to3_doctests'

1

Hello, I receive this error when I try to download pdf2docx using pip to pdf2docx. Any ideas? "AttributeError: 'Distribution' object has no attribute 'convert_2to3_doctests'"

walzjc

information required

去除页眉页脚的工作

4

我之前实习时做了pdf转txt的工作，其中pdf转word使用的该库（pdf2docx），然后word转txt是手写的。也在很大程度上实现了去除页眉页脚，但仅仅能满足于输出端是txt（不提取多列的表格）。在我实习期间处理了500w+本的pdf转txt，并在公司内部上线了部署服务。我走后接手这个工作的实习生又进行了优化，具体改进我没问。我想看看大家对这个需求大不大，我可以选择新建一个开源库或者在pdf2docx提一个pr。希望有需求的可以在下方留言

lbboier

question

遇到无法处理的图片会导致整个程序崩溃

7

我处理的一个的pdf文件中在67页： ![image](https://github.com/dothinking/pdf2docx/assets/83513889/cb846fbe-542c-41cc-b61c-cb1f194c6e17) RuntimeError: pixmap must be grayscale or rgb to write as png，这个图片既不是灰度图片也不是一个rgb。代码和报错如下： ![image](https://github.com/dothinking/pdf2docx/assets/83513889/94fbc6f2-d30c-45fe-879c-cd8e351fd552) 我希望的是能修改这个错误或者抛出这个异常，从而不会导致这个程序崩溃 ![image](https://github.com/dothinking/pdf2docx/assets/83513889/92a8638f-fcf7-443b-8e58-5da0ed65fe8f)

lbboier

pdf2docx
pdf2docx copied to clipboard

Metadata

_hide_page_text这个函数的并没有隐藏全部的文字

一个跨页的大表格被分成了多个表格

pdf转word时候，原pdf中目录页虚线丢了，点击跳转也丢了

It can't parse border lines.

I want to develop it for RTL languages

Lambda Runtime - Function not implemented

什么时候可以识别页眉页脚？

Download Error no attribute 'convert_2to3_doctests'

去除页眉页脚的工作

遇到无法处理的图片会导致整个程序崩溃

← Metadata

Owner

Metadata

pdf2docx pdf2docx copied to clipboard

Metadata

← Metadata

Owner

Metadata

pdf2docx
pdf2docx copied to clipboard