pdf2docx

Open source Python library for converting PDF to DOCX.

Results 122 pdf2docx issues

For example, support automatic recognition of body text, level-1 headings, and level-2 headings.

feature

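When a PDF contains a flowchart, one of the following happens:

1. The flowchart's frame is converted to an image, and the text ends up behind the image.
2. The whole flowchart is converted to a single image, but the same text as in the image is overlaid behind it.

As shown:

![image](https://github.com/ArtifexSoftware/pdf2docx/assets/37822176/8a85abf0-7f8b-4409-a073-1693c2988901)

The right side is the original conversion result; the left side is the image after dragging it out.

![image](https://github.com/ArtifexSoftware/pdf2docx/assets/37822176/1187e004-ca93-4020-bffb-0ac08b67ff27)

The two originally overlapped; I dragged them apart to show the effect.

I have tried adjusting the following parameters:

```
zh.convert(docx_file, start=0, end=None, pages=None, float_image_ignorable_gap=10, connected_border_tolerance=2, min_svg_gap_dx=30, min_svg_gap_dy=10, parse_stream_table=True)
```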

question

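I converted several English PDFs and the output was garbled. The cause is that Windows is missing the English fonts used in the PDF, so Word displays garbled text. Is there a way to automatically add the fonts from the PDF to the system? (The fonts in the PDF can be recognized; the font types are visible in editing software.)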

Hello, I have a PDF and I want to purposely ignore any images, charts, and graphics during conversion. Is this feasible with extra parameters?

enhancement
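One possible workaround for the image-ignoring request above is to post-process the converted file rather than rely on a converter parameter. The following is a minimal sketch under stated assumptions, not a pdf2docx feature: `strip_drawings` is a hypothetical helper, it assumes images and rendered graphics appear as `<w:drawing>` elements in `word/document.xml`, and it assumes those elements are not nested inside one another. It uses only the Python standard library.

```python
import re
import zipfile

# Assumption: each picture is a non-nested <w:drawing>...</w:drawing> element.
DRAWING_RE = re.compile(r"<w:drawing>.*?</w:drawing>", re.DOTALL)


def strip_drawings(src_docx, dst_docx):
    """Copy a DOCX archive, dropping <w:drawing> elements from the main body.

    src_docx and dst_docx may be file paths or binary file-like objects.
    Returns the number of drawing elements removed.
    """
    removed = 0
    with zipfile.ZipFile(src_docx) as src, \
            zipfile.ZipFile(dst_docx, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                xml = data.decode("utf-8")
                xml, removed = DRAWING_RE.subn("", xml)
                data = xml.encode("utf-8")
            # Every other part (styles, media, relationships) is copied as-is.
            dst.writestr(item, data)
    return removed
```

A call such as `strip_drawings("converted.docx", "converted_no_images.docx")` (hypothetical file names) would run after the conversion. The media files stay in the archive unreferenced, so the output is not smaller; only the body no longer displays the pictures.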

As shown in the screenshot, once the PDF pages have been parsed, the whole program hangs. Linux version: Linux version 3.10.0-1160.95.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Mon Jul 24 13:59:37 UTC 2023

File: [demo.pdf](https://github.com/ArtifexSoftware/pdf2docx/files/14061009/demo.pdf) After conversion, both the "整定项目" (setting item) column and the "更改前" (before change) column are converted incorrectly:

Hello, I have noticed that when converting pdf files to docx using the pdf2docx library, the resulting docx file is missing the separators. Specifically, the lines that separate different sections...

feature
question

Dear developer, please add more features for table formatting in the document. I am struggling to find a solution for it (Issue #238). Thanks.

Add more support for converting (Word) equations in PDF to DOCX, such as LaTeX.

enhancement