pdf2docx

Open source Python library for converting PDF to DOCX.

Results 122 pdf2docx issues

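As the title says: when a font name is in Chinese (for example “宋体”), conversion fails with the error "bytes must be in range[0 to 255]". The error occurs at https://github.com/ArtifexSoftware/pdf2docx/blame/master/pdf2docx/common/share.py#L128. When the font name is Chinese, ord(c) is greater than 255, so converting it to bytes raises an error.

```python
def decode(s:str):
    '''Try to decode a unicode string.'''
    b = bytes(ord(c) for c in s)  ### the error occurs here
    for...
```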
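The failure described in this report can be reproduced without a PDF: `bytes()` accepts only integers from 0 to 255, so any character whose code point is larger raises `ValueError`. The sketch below shows one possible guard, assuming the intent of `decode` is to repair strings whose bytes were wrongly read one character per byte. The names `to_single_bytes` and `decode_or_keep` and the list of candidate encodings are hypothetical and not part of pdf2docx; the rest of the original function is truncated in the report and is not reconstructed here.

```python
from typing import Optional, Sequence


def to_single_bytes(s: str) -> Optional[bytes]:
    """Return one byte per character, or None if any code point exceeds 255."""
    try:
        return bytes(ord(c) for c in s)
    except ValueError:
        # Raised by bytes() when a value is outside 0..255,
        # e.g. for a font name such as "宋体".
        return None


def decode_or_keep(s: str, encodings: Sequence[str] = ("utf-8", "gbk")) -> str:
    """Hypothetical guard: try to repair a mis-decoded string, else keep it."""
    raw = to_single_bytes(s)
    if raw is None:
        # The string already contains real Unicode text; nothing to repair.
        return s
    for enc in encodings:  # candidate encodings are an assumption
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return s


if __name__ == "__main__":
    print(decode_or_keep("宋体"))
    print(decode_or_keep("宋体".encode("gbk").decode("latin-1")))
```

With this guard, a font name that is already proper Unicode passes through unchanged, while a name stored as raw GBK or UTF-8 bytes is still decoded.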

input needed

Is there any support for Android? If so, how can I import this library? I would appreciate any suggestions or documentation, if available. Thanks, guys!

```
  File "/usr/local/lib/python3.7/site-packages/pdf2docx/page/RawPage.py", line 67, in restore
    raw_dict = self.extract_raw_dict(**settings)
  File "/usr/local/lib/python3.7/site-packages/pdf2docx/page/RawPageFitz.py", line 33, in extract_raw_dict
    image_blocks = self._preprocess_images(**settings)
  File "/usr/local/lib/python3.7/site-packages/pdf2docx/page/RawPageFitz.py", line 118, in _preprocess_images
    return ImagesExtractor(self.page_engine).extract_images(settings['clip_image_res_ratio'])
  File "/usr/local/lib/python3.7/site-packages/pdf2docx/image/ImagesExtractor.py", line...
```

I color the lines in the PDF across the entire size of the sheet, and everything is colored in the PDF. I need to convert it to DOCX format, but when converting, the...

enhancement

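I tried several PDFs, but all of them have the problem of content overflowing the page. For example: ![Original style](https://github.com/ArtifexSoftware/pdf2docx/assets/154854888/a812d62c-6e61-428a-b8a9-54ae39bfcc50) ![Page overflow](https://github.com/ArtifexSoftware/pdf2docx/assets/154854888/dd3ac2dd-084f-4a17-b230-0cf0138684ca) How can this be solved? Would it be possible to add a check so that, if the text exceeds its bbox, the font size is reduced? The text would resize automatically to fit its box, sacrificing the text style to preserve the overall layout. This is just an idea of mine, and I don't know whether it is feasible.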
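The shrink-to-fit idea in this report can be sketched independently of pdf2docx. The example assumes the rendered width of a text line grows linearly with its font size, which holds for a single font without wrapping. The function name `fit_font_size`, its parameters, and the 6 pt lower bound are hypothetical choices for illustration, not an existing pdf2docx option.

```python
def fit_font_size(text_width: float, box_width: float,
                  font_size: float, min_size: float = 6.0) -> float:
    """Return a font size at which the text should fit inside the box.

    text_width: measured width of the text at `font_size`
    box_width:  width of the bounding box the text must stay within
    min_size:   hypothetical floor so text never becomes unreadable
    """
    if text_width <= 0 or text_width <= box_width:
        # Nothing to measure, or the text already fits: keep the size.
        return font_size
    # Width is assumed proportional to font size, so scale by the ratio.
    scaled = font_size * box_width / text_width
    # Never shrink below the floor, and never grow above the original size.
    return min(font_size, max(scaled, min_size))


if __name__ == "__main__":
    # Text measured at 150 pt wide must fit a 100 pt box at 12 pt.
    print(fit_font_size(150.0, 100.0, 12.0))
```

The trade-off matches the one named in the report: the layout is preserved at the cost of the original text size, and the floor means very long text can still overflow.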

bug

I've encountered an issue with paragraph splitting in some documents, where certain pages separate sentences in the same paragraph into different text blocks while others do not. Upon investigation, I...

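I found two problems when recognizing a PDF: 1. A partially displayed line segment of a hidden table in the PDF cannot be restored in the docx file; for example, the red line in the sample is one border line of a table. 2. First-line indentation of text paragraphs is not reproduced. The sample is shown below: ![image](https://github.com/ArtifexSoftware/pdf2docx/assets/35327931/48b6e97c-2d70-4c6f-a211-6bbd904418cc) [zf1.pdf](https://github.com/ArtifexSoftware/pdf2docx/files/14913074/zf1.pdf)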

wontfix
upstream bug

I'm getting output that isn't accurate. Some images aren't in the same position as in the original PDF. Here is a sample: Before: ![image](https://github.com/ArtifexSoftware/pdf2docx/assets/89592598/ad05390c-3142-4046-9328-24d2b4955646) After: ![image](https://github.com/ArtifexSoftware/pdf2docx/assets/89592598/88675742-ebef-496a-88ad-5404d6ac7d4b) There image multiplied by...

The log is as follows:
[INFO] [1/4] Opening document...
[INFO] [2/4] Analyzing document...
unsupported colorspace for '{output}'