pdf2docx: error during conversion when a font name is in Chinese (e.g. "宋体")

Open hlhtddx opened this issue 1 year ago • 2 comments

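As the title says, conversion fails with the error "bytes must be in range[0 to 255]" when a font name is in Chinese (e.g. "宋体"). The error is raised at https://github.com/ArtifexSoftware/pdf2docx/blame/master/pdf2docx/common/share.py#L128 (quoted below). When the font name contains Chinese characters, ord(c) is greater than 255, so converting to bytes raises an error.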

def decode(s:str):
    '''Try to decode a unicode string.'''
    b = bytes(ord(c) for c in s)  # fails here
    for encoding in ['utf-8', 'gbk', 'gb2312', 'iso-8859-1']:
        try:
            res = b.decode(encoding)
            break
        except:
            continue
    return res
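
One possible workaround, as a minimal sketch only: fall back to the original string when it cannot be packed into single bytes. The name `decode_safe` is hypothetical and this is not the maintainers' fix. It assumes that a name containing a code point above 255 is already correctly decoded text.

```python
def decode_safe(s: str) -> str:
    '''Try to recover a font name that was mis-decoded one byte per character.

    Hypothetical helper for illustration; not part of pdf2docx.
    '''
    try:
        # Only possible when every character has a code point <= 255.
        b = bytes(ord(c) for c in s)
    except ValueError:
        # A code point above 255 (e.g. in "宋体") means the string is
        # assumed to be proper text already, so return it unchanged.
        return s
    for encoding in ['utf-8', 'gbk', 'gb2312', 'iso-8859-1']:
        try:
            return b.decode(encoding)
        except UnicodeDecodeError:
            continue
    return s


# A name that is already Chinese passes through unchanged.
print(decode_safe('宋体'))
# A name whose GBK bytes were read as iso-8859-1 is recovered.
print(decode_safe('宋体'.encode('gbk').decode('iso-8859-1')))
```

Catching `ValueError` and `UnicodeDecodeError` explicitly, rather than using a bare `except`, keeps unrelated errors visible.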

May 01 '24 10:05 hlhtddx

One thing I left out: the problem only occurs with multiprocessing=True. Single-process mode works fine.

May 01 '24 10:05 hlhtddx

I ran into this problem too.

Sep 27 '24 03:09 RobertHoffman
