pdf2docx

PDF-to-Word conversion raises ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

Open gospider001 opened this issue 3 years ago • 0 comments

from pdf2docx import Converter

pdfPath = "C:/迅雷下载/ex.pdf"    # input PDF
docxPath = "C:/迅雷下载/ex.docx"  # output DOCX
cli = Converter(pdfPath)
cli.convert(docxPath)  # fails with the ValueError shown in the traceback

    cli.convert(docxPath)
  File "C:\Users\bai\Miniconda3\lib\site-packages\pdf2docx\converter.py", line 329, in convert
    self.parse(start, end, pages, **settings).make_docx(docx_filename, **settings)
  File "C:\Users\bai\Miniconda3\lib\site-packages\pdf2docx\converter.py", line 213, in make_docx
    docx_file.save(filename)
  File "C:\Users\bai\Miniconda3\lib\site-packages\docx\document.py", line 135, in save
    self._part.save(path_or_stream)
  File "C:\Users\bai\Miniconda3\lib\site-packages\docx\parts\document.py", line 111, in save
    self.package.save(path_or_stream)
  File "C:\Users\bai\Miniconda3\lib\site-packages\docx\opc\package.py", line 172, in save
    PackageWriter.write(pkg_file, self.rels, self.parts)
  File "C:\Users\bai\Miniconda3\lib\site-packages\docx\opc\pkgwriter.py", line 35, in write
    PackageWriter._write_parts(phys_writer, parts)
  File "C:\Users\bai\Miniconda3\lib\site-packages\docx\opc\pkgwriter.py", line 56, in _write_parts
    phys_writer.write(part.partname.rels_uri, part._rels.xml)
  File "C:\Users\bai\Miniconda3\lib\site-packages\docx\opc\rel.py", line 82, in xml
    rels_elm.add_rel(
  File "C:\Users\bai\Miniconda3\lib\site-packages\docx\opc\oxml.py", line 218, in add_rel
    relationship = CT_Relationship.new(rId, reltype, target, target_mode)
  File "C:\Users\bai\Miniconda3\lib\site-packages\docx\opc\oxml.py", line 169, in new
    relationship.set('Target', target)
  File "src\lxml\etree.pyx", line 816, in lxml.etree._Element.set
  File "src\lxml\apihelpers.pxi", line 593, in lxml.etree._setAttributeValue
  File "src\lxml\apihelpers.pxi", line 1540, in lxml.etree._utf8
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
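
The last python-docx frame in the traceback is `relationship.set('Target', target)`, which suggests (but does not confirm) that the offending string is a relationship target, for example a hyperlink URI read from the PDF, containing a NULL byte or control character. A minimal sketch of one possible workaround is to strip such characters from a string before it is written to XML. The helper names `strip_xml_incompatible` and `is_xml_compatible` are hypothetical and not part of pdf2docx or python-docx; where to apply them inside the conversion is not shown here and would be an assumption.

```python
import re

# Matches any character outside the XML 1.0 "Char" production:
# tab, LF, CR, U+0020..U+D7FF, U+E000..U+FFFD, U+10000..U+10FFFF.
# Everything else (NULL bytes, other control characters, lone surrogates)
# cannot be stored in an XML attribute or text node.
_XML_ILLEGAL = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_xml_compatible(text: str) -> bool:
    """Return True if every character in text is allowed in XML 1.0."""
    return _XML_ILLEGAL.search(text) is None


def strip_xml_incompatible(text: str) -> str:
    """Return text with all XML-incompatible characters removed."""
    return _XML_ILLEGAL.sub("", text)


if __name__ == "__main__":
    # Hypothetical link target with an embedded NULL byte and vertical tab.
    target = "http://example.com/\x00page\x0b"
    print(is_xml_compatible(target))
    print(repr(strip_xml_incompatible(target)))
```

Non-ASCII text such as the Chinese characters in the file paths above is valid XML and passes through unchanged, so the paths themselves are unlikely to be the cause.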

pdf2docx 0.5.6 #126 ex.pdf

gospider001, Dec 09 '22 02:12