pdf2docx
Accuracy checker
Is your feature request related to a problem? Please describe.
I need to convert PDF files to Word documents without relying on external APIs, because of cost constraints. I want to use Python libraries instead, but I need a way to verify the accuracy of the conversion.
Describe the solution you'd like
I would like to develop an automated system that evaluates the accuracy of different PDF to Word conversion methods using Python libraries. The system should identify and use the most accurate method.
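To make the idea concrete, here is a minimal sketch of how the scoring step could work, assuming "accuracy" means text similarity between a reference text and the text recovered from each converted file. The names `normalize`, `similarity`, and `pick_best` are hypothetical, and the sketch only compares plain text: it does not measure layout or formatting fidelity, and it assumes the text has already been extracted from the PDF and from each converter's output.

```python
from difflib import SequenceMatcher
from typing import Dict, Tuple


def normalize(text: str) -> str:
    """Collapse all runs of whitespace so line wrapping does not affect the score."""
    return " ".join(text.split())


def similarity(reference: str, candidate: str) -> float:
    """Return a 0..1 similarity ratio between two texts (1.0 means identical)."""
    matcher = SequenceMatcher(
        None, normalize(reference), normalize(candidate), autojunk=False
    )
    return matcher.ratio()


def pick_best(reference: str, outputs: Dict[str, str]) -> Tuple[str, float]:
    """Score every method's output text and return (method_name, score) of the best.

    `outputs` maps a method name (hypothetical labels chosen by the caller)
    to the text recovered from that method's converted document.
    """
    if not outputs:
        raise ValueError("no conversion outputs to compare")
    scores = {name: similarity(reference, text) for name, text in outputs.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    reference_text = "Invoice 2024\nTotal: 100 EUR"
    candidates = {
        "method_a": "Invoice 2024 Total: 100 EUR",
        "method_b": "Invo ice 2O24 Tota1: 1OO EUR",
    }
    print(pick_best(reference_text, candidates))
```

A character-level ratio is only one possible metric; a real checker would probably also need to compare tables, images, and paragraph structure.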
Describe alternatives you've considered
- Using various Python libraries such as pdf2image, pdfplumber, PyMuPDF, and camelot.
- Manually comparing the output of different methods to determine accuracy.
- Exploring other open-source tools that might offer better accuracy.
Additional context
- Screenshots of the current conversion results.
- Examples of PDFs and their expected Word outputs.
- Any specific requirements for maintaining layout and formatting.
- RTL (right-to-left) languages are not supported, so I need to detect during preprocessing whether a document contains RTL text.
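For the RTL check, one possible approach is to look at the Unicode bidirectional class of each character in the extracted text. The sketch below uses only the standard library; `rtl_ratio` and `contains_rtl` are hypothetical names, and it assumes the page text has already been extracted from the PDF as a Python string.

```python
import unicodedata

# Unicode bidirectional classes for strongly right-to-left characters:
# "R" covers scripts such as Hebrew, "AL" covers Arabic-script letters.
RTL_CLASSES = {"R", "AL"}
STRONG_CLASSES = {"L", "R", "AL"}


def rtl_ratio(text: str) -> float:
    """Return the share of strongly directional characters that are RTL."""
    classes = [unicodedata.bidirectional(ch) for ch in text]
    strong = [c for c in classes if c in STRONG_CLASSES]
    if not strong:
        return 0.0  # no letters at all (empty text, digits, punctuation only)
    return sum(c in RTL_CLASSES for c in strong) / len(strong)


def contains_rtl(text: str, threshold: float = 0.0) -> bool:
    """Flag the text as RTL when its RTL share exceeds `threshold`.

    The default of 0.0 flags any RTL letter; a higher value (an assumption
    to tune per corpus) ignores documents with only a stray RTL word.
    """
    return rtl_ratio(text) > threshold


if __name__ == "__main__":
    print(contains_rtl("Plain English text"))
    print(contains_rtl("مرحبا بالعالم"))
```

Documents flagged this way could then be skipped or routed to a different conversion path before the accuracy comparison runs.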