pdf_to_docx
DOCX files are merged in the wrong order
In the file "pdf_recovery_doc.py", the functions "merge_docx_v1" and "merge_docx_v2" sort the file list with a plain sorted() call, like
docx_files_list = sorted(docx_files_list)
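I don't think this gives the expected order: sorting strings is lexicographic, so a file numbered 10 comes before a file numbered 2. Sorting on the number in the file name may be the correct way: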
import re
docx_files_list = sorted(
    docx_files_list,
    key=lambda x: int(re.search(r'_(\d+)_', x).group(1))
)
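To show the difference, here is a minimal sketch. The file names are hypothetical (I am assuming the page number sits between two underscores, as the regex above expects), and the helper name page_number is mine, not from the project. It also guards against names that do not match the pattern, which would otherwise raise an AttributeError on .group(1).

```python
import re

# Hypothetical file names; the real naming scheme in pdf_recovery_doc.py may differ.
docx_files_list = ["page_10_.docx", "page_2_.docx", "page_1_.docx"]


def page_number(name):
    """Return the integer found between underscores, or infinity if there is none."""
    match = re.search(r'_(\d+)_', name)
    # Names without a number sort last instead of raising AttributeError.
    return int(match.group(1)) if match else float("inf")


# Plain string sort: "page_10_.docx" ends up before "page_2_.docx".
lexicographic = sorted(docx_files_list)

# Numeric sort on the extracted page number: 1, 2, 10.
numeric = sorted(docx_files_list, key=page_number)

print(lexicographic)
print(numeric)
```

Sending unmatched names to the end is only one possible choice; raising a clear error instead might suit the project better.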
If that is OK, I can submit a pull request later.
ok