OTR
How to restore a table?
How can I restore a table with python-docx based on this image?
Sorry for taking so long to respond.
OTR (in its current state) can only recognize tables and their cells and assign coordinates to them; it does not do anything beyond that.
Here's what you want to do:
- Do OCR on the cells (e.g. using Tesseract)
- Find some way of representing the table structure in python-docx. I don't know enough about python-docx to tell you any details right now.
- Insert the OCR result into the cells
- Merge cells if applicable
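The steps above could be sketched roughly as follows. This is only a sketch under assumptions: the `cells` dictionary mapping (column, row) to already-OCRed text is hypothetical (it is not OTR's actual output format), the coordinates are assumed to be 1-based, and merging is left out. It relies on python-docx's `Document.add_table` and `Table.cell`.

```python
def grid_size(cells):
    """Return (n_rows, n_cols) needed to hold all 1-based (column, row) keys."""
    n_cols = max(col for col, _ in cells)
    n_rows = max(row for _, row in cells)
    return n_rows, n_cols


def build_docx_table(cells, out_path):
    """Write a .docx containing one table filled from {(column, row): text}."""
    from docx import Document  # python-docx, imported only when needed

    n_rows, n_cols = grid_size(cells)
    doc = Document()
    table = doc.add_table(rows=n_rows, cols=n_cols)
    for (col, row), text in cells.items():
        # python-docx addresses cells as (row, column), zero-based
        table.cell(row - 1, col - 1).text = text
    doc.save(out_path)


# Hypothetical OCR results keyed by (column, row), for illustration only
example_cells = {(1, 1): "Name", (2, 1): "Value", (1, 2): "Width", (2, 2): "42"}
```

Calling `build_docx_table(example_cells, "table.docx")` should then produce a 2x2 table. The text for each cell could come from something like `pytesseract.image_to_string` on the cropped cell image, and cells that span several rows or columns could be joined afterwards with python-docx's `cell.merge(other_cell)`.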
@ulikoehler Hi ulikoehler, I have looked at the coordinates on the image for a long time, e.g. (2,1), (2,4), (2,5), but where are (2,2), (2,3), ...? I don't understand their meaning. Can you explain what they represent?
@csaimd These coordinates represent (column, row). As you can see in the image listed above, this is based on a somewhat crude estimation algorithm (which is AFAIK based on the center coordinates of cells).
So (2,3) is directly beneath (2,2).
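To illustrate the idea of deriving (column, row) from cell centers, here is a minimal sketch. It is not OTR's actual algorithm; the bounding-box format `(x, y, w, h)`, the `tolerance` parameter, and both function names are assumptions made up for this example.

```python
def assign_indices(values, tolerance):
    """Map each value to a 1-based index; sorted values that lie within
    `tolerance` of the previous value share that value's index."""
    index = {}
    current = 0
    previous = None
    for v in sorted(set(values)):
        if previous is None or v - previous > tolerance:
            current += 1
        index[v] = current
        previous = v
    return index


def grid_coordinates(boxes, tolerance=10):
    """boxes: list of (x, y, w, h). Returns a list of (column, row) tuples."""
    centers = [(x + w / 2, y + h / 2) for x, y, w, h in boxes]
    col_of = assign_indices([cx for cx, _ in centers], tolerance)
    row_of = assign_indices([cy for _, cy in centers], tolerance)
    return [(col_of[cx], row_of[cy]) for cx, cy in centers]


# Four cells of a 2x2 table; the last one is offset by 2 pixels
boxes = [(0, 0, 100, 30), (100, 0, 100, 30), (0, 30, 100, 30), (100, 32, 100, 30)]
coords = grid_coordinates(boxes)
# coords == [(1, 1), (2, 1), (1, 2), (2, 2)]
```

Here the first number is the column and the second is the row, so (2,2) sits directly below (2,1).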
Does that help you?