How to restore a table?

Open hoslo opened this issue 6 years ago • 3 comments

[Image: out]

How can I restore a table with python-docx based on this image?

hoslo, Jan 07 '19 07:01

Sorry for taking so long to respond.

In its current state, OTR only recognizes tables and their cells and assigns coordinates to the cells. It does not do anything beyond that.

Here's what you want to do:

  • Do OCR on the cells (e.g. using Tesseract)
  • Find some way of representing the table structure in python-docx. I don't know enough about python-docx to tell you any details right now.
  • Insert the OCR result into the cells
  • Merge cells if applicable
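
The steps above could be sketched roughly as follows. This is only a sketch under assumptions, not something OTR provides: it assumes you already have one cropped image per cell, keyed by a zero-based (column, row) pair, and the helper names `cells_to_grid`, `ocr_cells` and `write_docx` are hypothetical. The OCR step uses `pytesseract.image_to_string`, and the table is built with python-docx's `add_table`, `cell.text` and `cell.merge`. Note that python-docx addresses cells as (row, column), the reverse of the order used here.

```python
def cells_to_grid(cell_texts):
    """Arrange {(column, row): text} into a row-major grid; gaps become ""."""
    # Assumption: coordinates are zero-based.
    n_cols = max(col for col, _ in cell_texts) + 1
    n_rows = max(row for _, row in cell_texts) + 1
    return [[cell_texts.get((col, row), "") for col in range(n_cols)]
            for row in range(n_rows)]


def ocr_cells(cell_images):
    """OCR each cropped cell image (needs pytesseract and Tesseract installed)."""
    import pytesseract  # third-party, imported lazily
    return {pos: pytesseract.image_to_string(img).strip()
            for pos, img in cell_images.items()}


def write_docx(grid, path, merges=()):
    """Write the grid into a .docx table (needs python-docx installed)."""
    from docx import Document  # third-party, imported lazily
    document = Document()
    table = document.add_table(rows=len(grid), cols=len(grid[0]))
    for r, row in enumerate(grid):
        for c, text in enumerate(row):
            table.cell(r, c).text = text  # python-docx indexes (row, column)
    # Each merge is ((row1, col1), (row2, col2)): two corners of a rectangle.
    for (r1, c1), (r2, c2) in merges:
        table.cell(r1, c1).merge(table.cell(r2, c2))
    document.save(path)
```

A call might look like `write_docx(cells_to_grid(ocr_cells(cell_images)), "table.docx")`, where `cell_images` is the hypothetical dictionary of cropped cell images. Deciding which cells to merge is left to you, since that information has to come from the detected cell sizes.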

ulikoehler, Jul 16 '19 17:07

@ulikoehler Hi ulikoehler, I have looked at the coordinates on the image for a long time, for example (2,1), (2,4), (2,5). Where are (2,2) and (2,3)? I don't understand their meaning. Can you explain what they represent?

csaimd, Sep 21 '19 03:09

@csaimd These coordinates represent (column, row). As you can see in the image listed above, this is based on a somewhat crude estimation algorithm (which is AFAIK based on the center coordinates of cells).

So, (2,3) is beneath (2,2).
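
To illustrate the idea of deriving (column, row) from cell centers, here is a minimal sketch. It is not OTR's actual code: the function name `assign_grid_coordinates`, the `(x, y, w, h)` box format, the zero-based indices and the `tolerance` parameter are all assumptions made for this example.

```python
def assign_grid_coordinates(boxes, tolerance=10):
    """Map each (x, y, w, h) box to a zero-based (column, row) pair.

    Centers closer than `tolerance` pixels along an axis share an index.
    """
    def cluster(values):
        # Walk the sorted center values and start a new index
        # whenever the gap to the previous value exceeds the tolerance.
        index_of = {}
        index, previous = -1, None
        for value in sorted(set(values)):
            if previous is None or value - previous > tolerance:
                index += 1
            index_of[value] = index
            previous = value
        return index_of

    centers = [(x + w / 2, y + h / 2) for x, y, w, h in boxes]
    col_of = cluster(cx for cx, _ in centers)
    row_of = cluster(cy for _, cy in centers)
    return [(col_of[cx], row_of[cy]) for cx, cy in centers]


# A regular 2x2 grid of 100x40 pixel cells.
boxes = [(0, 0, 100, 40), (100, 0, 100, 40),
         (0, 40, 100, 40), (100, 40, 100, 40)]
print(assign_grid_coordinates(boxes))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

In this sketch, a cell that spans several columns has a center that falls between the centers of the ordinary cells, so it ends up with a column index of its own. That is one way a center-based estimate can be crude.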

Does that help you?

ulikoehler, Sep 22 '19 21:09