pdf2docx

Conversion fails when a font name is in Chinese (e.g. “宋体”)

Open hlhtddx opened this issue 1 year ago • 2 comments

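As the title says: during conversion, when a font name is in Chinese (e.g. “宋体”), the error "bytes must be in range[0 to 255]" is raised. The failure is at https://github.com/ArtifexSoftware/pdf2docx/blame/master/pdf2docx/common/share.py#L128. When the font name is Chinese, ord(c) is greater than 255 for its characters, so building a bytes object from those values fails.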

def decode(s:str):
    '''Try to decode a unicode string.'''
    b = bytes(ord(c) for c in s)  # the error is raised here
    for encoding in ['utf-8', 'gbk', 'gb2312', 'iso-8859-1']:
        try:
            res = b.decode(encoding)
            break
        except:
            continue
    return res
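
One possible way to guard against this, offered only as a sketch and not as the maintainers' fix: treat a string that contains code points above 255 as already being proper Unicode and return it unchanged, and only attempt re-decoding when every character fits in a single byte. The function name `decode_safe` is hypothetical and is not part of pdf2docx.

```python
def decode_safe(s: str) -> str:
    '''Try to recover a mis-decoded string; leave real Unicode text untouched.

    Hypothetical helper for illustration, not part of pdf2docx.
    '''
    try:
        # Equivalent to bytes(ord(c) for c in s) when all code points are <= 255.
        b = s.encode('latin-1')
    except UnicodeEncodeError:
        # Some character is above 255 (e.g. '宋体'), so the string is
        # assumed to be correctly decoded already.
        return s

    for encoding in ('utf-8', 'gbk', 'gb2312', 'iso-8859-1'):
        try:
            return b.decode(encoding)
        except UnicodeDecodeError:
            continue
    return s
```

With this guard, `decode_safe('宋体')` returns the name as-is, while a name whose GBK bytes were misread as Latin-1 is still recovered by the loop.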

hlhtddx avatar May 01 '24 10:05 hlhtddx

One thing I left out: the problem only occurs when multiprocessing=True is selected. Single-process mode does not have this problem.

hlhtddx avatar May 01 '24 10:05 hlhtddx

I ran into this problem too.

RobertHoffman avatar Sep 27 '24 03:09 RobertHoffman