pdf2docx issues

Will heading styles be preserved when saving to Word in the future?

4

For example, automatic recognition of body text, level-1 headings, and level-2 headings.

WangxuP

feature
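One way to picture the heading recognition requested above is a font-size heuristic. This is a minimal sketch, not pdf2docx's behavior: the function name `classify_by_font_size` and the input shape (a list of `(text, font_size)` pairs) are assumptions made for illustration. It treats the size that covers the most characters as body text and maps the two largest sizes above it to the Word style names "Heading 1" and "Heading 2".

```python
from collections import Counter


def classify_by_font_size(spans):
    """Assign a Word style name to each (text, font_size) pair.

    Hypothetical helper: the most common size (weighted by character
    count) is taken to be body text; larger sizes become headings.
    """
    if not spans:
        return []

    # Weight each size by how many characters use it, so long body
    # paragraphs outweigh short headings.
    weights = Counter()
    for text, size in spans:
        weights[round(size, 1)] += len(text)
    body_size = weights.most_common(1)[0][0]

    # Sizes larger than body text, largest first: level 1, then level 2.
    # Any further sizes are clamped to level 2.
    larger = sorted((s for s in weights if s > body_size), reverse=True)
    level_of = {size: min(i + 1, 2) for i, size in enumerate(larger)}

    result = []
    for text, size in spans:
        level = level_of.get(round(size, 1))
        style = f"Heading {level}" if level else "Normal"
        result.append((text, style))
    return result
```

Real documents also signal headings through bold weight, numbering, and spacing, so font size alone is only a starting point.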

Problems converting flowcharts in a PDF to Word

6

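When a PDF contains a flowchart, one of the following happens:

1. The flowchart's frame is converted to an image, and the text is placed behind the image.
2. The whole flowchart is converted to a single image, but the same text as in the image is overlaid behind it, as shown:

![image](https://github.com/ArtifexSoftware/pdf2docx/assets/37822176/8a85abf0-7f8b-4409-a073-1693c2988901)

The right side is the original conversion result; the left side is the image after being dragged out.

![image](https://github.com/ArtifexSoftware/pdf2docx/assets/37822176/1187e004-ca93-4020-bffb-0ac08b67ff27)

The two originally overlapped; I dragged the image aside to show the effect. I have tried adjusting the following parameters:

```
zh.convert(docx_file, start=0, end=None, pages=None,
           float_image_ignorable_gap=10,
           connected_border_tolerance=2,
           min_svg_gap_dx=30, min_svg_gap_dy=10,
           parse_stream_table=True)
```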

UchihaArk

question

Can the fonts in a PDF be added to the system automatically to prevent garbled text after conversion?

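I converted several English PDFs and got garbled text. The cause is that Windows lacks the English fonts used in the PDF, so Word displays the text as garbage. Is there a way to add the PDF's fonts to the system automatically? (The fonts in the PDF can be identified; the font names are visible in editing software.)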

gkngkngkn
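For the font question above, a first step is finding out which fonts a PDF embeds. The following is a minimal sketch using PyMuPDF's `Page.get_fonts()` and `Document.extract_font()`; the helper names `strip_subset_prefix` and `dump_embedded_fonts` are hypothetical, and this is not something pdf2docx does on its own. It writes each extractable embedded font to a directory and does not install anything into the system.

```python
import os


def strip_subset_prefix(basefont):
    """Drop a PDF subset tag such as 'ABCDEF+' from a font name."""
    prefix, sep, rest = basefont.partition("+")
    # A subset tag is exactly six uppercase letters followed by '+'.
    if sep and len(prefix) == 6 and prefix.isalpha() and prefix.isupper():
        return rest
    return basefont


def dump_embedded_fonts(pdf_path, out_dir):
    """Write every extractable embedded font to out_dir (hypothetical helper)."""
    import fitz  # PyMuPDF; imported lazily so the name helper works without it

    os.makedirs(out_dir, exist_ok=True)
    saved, seen = [], set()
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for font in page.get_fonts():
                xref = font[0]  # first tuple item is the font's xref
                if xref in seen:
                    continue
                seen.add(xref)
                name, ext, _type, content = doc.extract_font(xref)
                if ext == "n/a" or not content:
                    continue  # font is not embedded or cannot be extracted
                path = os.path.join(out_dir, f"{strip_subset_prefix(name)}.{ext}")
                with open(path, "wb") as fh:
                    fh.write(content)
                saved.append(path)
    return saved
```

Fonts whose names carry a subset tag contain only the glyphs the PDF actually uses, so installing them is unlikely to give Word a complete typeface; font licensing may also restrict reuse.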

Documentation: adds a header button to pymupdf project.

jamie-lemon

Ignore charts and images during conversion

1

Hello, I have a PDF and I want to deliberately ignore any images, charts, and graphics during conversion. Is this feasible with extra parameters?

kallelUST

enhancement

Program hangs on Linux when multi_processing=True is set

1

As shown in the screenshot, once the PDF pages have been parsed, the whole program hangs. Linux version: Linux version 3.10.0-1160.95.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Mon Jul 24 13:59:37 UTC 2023

sunny6chen

Incorrect text when converting a table inside a cell

File: [demo.pdf](https://github.com/ArtifexSoftware/pdf2docx/files/14061009/demo.pdf) After conversion, the 整定项目 ("setting item") and 更改前 ("before change") columns are both converted incorrectly:

qktechies

Missing separators when converting pdf to docx

2

Hello, I have noticed that when converting pdf files to docx using the pdf2docx library, the resulting docx file is missing the separators. Specifically, the lines that separate different sections...

wwaguai