pdf_to_docx
ocr, pdf to docx
In the file `pdf_recovery_doc.py`, the functions `merge_docx_v1` and `merge_docx_v2` order the file list with a plain sort:

```python
docx_files_list = sorted(docx_files_list)
```

I don't think this achieves the expected result; the correct way may...
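One possible reading of the concern, offered as a sketch rather than the fix the issue author had in mind: a plain `sorted()` compares file names as strings, so a name containing `10` sorts before one containing `2`. The file names below and the helper `natural_key` are hypothetical and only illustrate a natural (numeric-aware) sort key.

```python
import re


def natural_key(name):
    """Build a sort key that compares digit runs as integers.

    re.split with a capturing group keeps the digit runs, so the key
    alternates between text (even positions) and numbers (odd positions).
    """
    return [
        int(part) if part.isdigit() else part.lower()
        for part in re.split(r"(\d+)", name)
    ]


# Hypothetical per-page output names; the real naming scheme may differ.
docx_files_list = ["page_10.docx", "page_2.docx", "page_1.docx"]

# Plain string sort places page_10 before page_2.
plain_order = sorted(docx_files_list)

# Numeric-aware sort keeps the pages in reading order.
natural_order = sorted(docx_files_list, key=natural_key)

print(plain_order)
print(natural_order)
```

If the page files follow a different naming pattern, the same idea applies as long as the page number can be extracted from the name.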
File "C:\Users\Lenovo\AppData\Roaming\Python\Python312\site-packages\paddleocr\ppstructure\table\predict_table.py", line 118, in _structure
    structure_res, elapse = self.table_structurer(copy.deepcopy(img))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lenovo\AppData\Roaming\Python\Python312\site-packages\paddleocr\ppstructure\table\predict_structure.py", line 140, in __call__
    self.predictor.run()
ValueError: In user code:
InvalidArgumentError: The shape of input[0] and input[1] is...
AttributeError: 'Document' object has no attribute 'pageCount'
Traceback (most recent call last):
  File "C:\Users\Chenlz\Desktop\pdf_to_docx\src\pdf_doc\pdf_recovery_doc.py", line 121, in <module>
    pdf2doc(pdf_path)
  File "C:\Users\Chenlz\Desktop\pdf_to_docx\src\pdf_doc\pdf_recovery_doc.py", line 35, in pdf2doc
    pdf_image(pdf_file, image_path)
  File "C:\Users\Chenlz\Desktop\pdf_to_docx\src\util\image_process.py", line 15,...
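A possible workaround, assuming the `Document` in this traceback comes from PyMuPDF (the traceback itself does not say so): newer PyMuPDF releases expose the page count as `page_count`, and the older camelCase `pageCount` is no longer available. The helper `get_page_count` below is hypothetical and not part of the repository; it tries the newer name first and falls back to the older one.

```python
def get_page_count(doc):
    """Return the number of pages of a document-like object.

    Tries the snake_case attribute first, then the legacy camelCase
    name, and finally falls back to len(doc).
    """
    for attr in ("page_count", "pageCount"):
        if hasattr(doc, attr):
            return getattr(doc, attr)
    return len(doc)


# Stand-in classes so the sketch runs without PyMuPDF installed.
class NewStyleDocument:
    page_count = 3


class OldStyleDocument:
    pageCount = 5


new_count = get_page_count(NewStyleDocument())
old_count = get_page_count(OldStyleDocument())
list_count = get_page_count(["p1", "p2"])
print(new_count, old_count, list_count)
```

In `image_process.py`, a call such as `get_page_count(pdf_doc)` could then stand in for the direct `pageCount` access, where `pdf_doc` is whatever name the code uses for the opened document.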