pdf2docx unsupported colorspace for '{output}'

Open itDjango opened this issue 1 year ago • 0 comments

Description of the bug

I'm encountering an error during parsing: "unsupported colorspace for '{output}'".

My requirement is that I cannot modify the original PDF file, so I need to address this issue within the parsing script itself.

I've noticed others have raised this issue as well, but it hasn't been resolved. How can I tackle this problem without altering the source code?

How to reproduce the bug

000.pdf Problem file

import logging
import time

from pdf2docx import Converter

logger = logging.getLogger(__name__)

def to_docx(file_path):
    # Write the .docx next to the source PDF (assumes a '.pdf' suffix).
    docx_file = file_path[:-4] + '.docx'
    start_time = time.time()
    try:
        cv = Converter(file_path)
        try:
            cv.convert(docx_file, start=0, end=None)
        finally:
            cv.close()  # release the PDF even if conversion fails
        logger.info(time.time() - start_time)  # elapsed seconds
        return True
    except Exception as e:
        logger.error(f': {e}')
        return False

‘ERROR’ unsupported colorspace for '{output}'
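
A possible way to narrow this down from within the script, without touching the PDF or the library source, is to attempt the conversion one page at a time and record which pages raise. The sketch below is only an illustration: `find_failing_pages` and its `convert_page` callable are hypothetical names, not part of pdf2docx, and it assumes the failure surfaces as an exception for the affected page.

```python
from typing import Callable, Dict, List, Tuple


def find_failing_pages(
    convert_page: Callable[[int], None], page_count: int
) -> Tuple[List[int], Dict[int, str]]:
    """Call convert_page(i) for each page index.

    Returns (indices that converted, {failed index: error text}).
    convert_page is a hypothetical callable supplied by the caller.
    """
    converted: List[int] = []
    failed: Dict[int, str] = {}
    for index in range(page_count):
        try:
            convert_page(index)
            converted.append(index)
        except Exception as exc:  # keep going so every bad page is reported
            failed[index] = str(exc)
    return converted, failed
```

With pdf2docx, `convert_page` could be a small wrapper that calls `cv.convert(...)` with `start=i` and `end=i + 1`, the same parameters used in `to_docx` above. Whether the library raises for a single bad page or logs and skips it has not been confirmed here, so treat the result as a diagnostic aid rather than a fix.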

pdf2docx version

0.5.8

Operating system

Linux

Python version

3.12

Dec 03 '24 08:12 itDjango
