pdf2docx
Open source Python library for converting PDF to DOCX.
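When a PDF contains a flowchart, one of the following happens: 1. The frame of the flowchart is converted to an image, and the text ends up behind the image. 2. The whole flowchart is converted to a single image, but text identical to that in the image is overlaid behind it. As shown in the figure: the right side is the original conversion result, and the left side is the image after being dragged aside. The two were originally overlapping; the image was dragged to show the effect. I have tried adjusting the following parameters:
```
zh.convert(docx_file, start=0, end=None, pages=None, float_image_ignorable_gap=10, connected_border_tolerance=2, min_svg_gap_dx=30, min_svg_gap_dy=10, parse_stream_table=True)
```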
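For anyone trying to reproduce this report, the quoted call can be wrapped in a self-contained script. This is only a minimal sketch: it assumes `zh` in the report is a `pdf2docx.Converter` instance, and the names `REPORTED_SETTINGS`, `merged_settings`, and `convert_with_reported_settings` are hypothetical helpers introduced here, not part of the library. The keyword values are copied from the report as-is.

```python
# Keyword settings copied verbatim from the call quoted in the report.
REPORTED_SETTINGS = {
    "float_image_ignorable_gap": 10,
    "connected_border_tolerance": 2,
    "min_svg_gap_dx": 30,
    "min_svg_gap_dy": 10,
    "parse_stream_table": True,
}


def merged_settings(**overrides):
    """Return the reported settings with any per-run overrides applied."""
    return {**REPORTED_SETTINGS, **overrides}


def convert_with_reported_settings(pdf_file, docx_file, **overrides):
    """Convert pdf_file to docx_file using the settings from the report.

    Assumption: the `zh` object in the report is a pdf2docx Converter.
    """
    # Imported lazily so the settings can be inspected without pdf2docx installed.
    from pdf2docx import Converter

    settings = merged_settings(**overrides)
    cv = Converter(pdf_file)
    try:
        # start/end/pages match the values given in the report (whole document).
        cv.convert(docx_file, start=0, end=None, pages=None, **settings)
    finally:
        cv.close()  # release the underlying PDF handle even if conversion fails
    return settings
```

A run with one value changed, for example `convert_with_reported_settings("flowchart.pdf", "flowchart.docx", min_svg_gap_dx=15)` (file names are placeholders), makes it easier to compare how each setting affects the flowchart output.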
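I converted several English PDFs and the text came out garbled. The cause is that Windows is missing the English fonts used in the PDF, so Word displays garbled text. Is there a way to automatically add the fonts from the PDF to the system? (The fonts in the PDF can be identified; the font types are visible in editing software.)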
Hello, I have a PDF and I want to deliberately ignore any images, charts, and graphics during conversion. Is this feasible with extra parameters?
As shown in the figure, after the PDF pages have been parsed, the whole program hangs. Linux version: Linux version 3.10.0-1160.95.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Mon Jul 24 13:59:37 UTC 2023
File: [demo.pdf](https://github.com/ArtifexSoftware/pdf2docx/files/14061009/demo.pdf) After converting to pdf, both the 整定项目 ("setting item") and 更改前 ("before change") columns are converted incorrectly:
Hello, I have noticed that when converting pdf files to docx using the pdf2docx library, the resulting docx file is missing the separators. Specifically, the lines that separate different sections...
Dear developer, please add more features for table formatting in the document. I am struggling to find a solution for it (Issue #238). Thanks.
Add more support for converting (Word) equations in PDF to DOCX, such as LaTeX.