Inconsistent data models for bbox

Open HiromuHota opened this issue 4 years ago • 0 comments

Describe the bug

Data models that represent bounding boxes are inconsistent, which considerably degrades readability. For example,

bbox: List[float] in the order of (y0, x0, y1, x1) at

https://github.com/HazyResearch/pdftotree/blob/6ff4a7cb5fe6269e3c287664392e226ca45479d4/pdftotree/TreeExtract.py#L447

bbox: Tuple[float] in the order of (x0, y0, x1, y1) at

https://github.com/HazyResearch/pdftotree/blob/6ff4a7cb5fe6269e3c287664392e226ca45479d4/pdftotree/TreeExtract.py#L456

word[1:]: List[float] in the order of (y0, x0, y1, x1) at

https://github.com/HazyResearch/pdftotree/blob/6ff4a7cb5fe6269e3c287664392e226ca45479d4/pdftotree/TreeExtract.py#L458-L463

To Reproduce

N/A

Expected behavior

I expect bounding boxes to be represented consistently throughout the code: one container type and one coordinate order.
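
As a rough illustration of what a consistent model could look like, here is a minimal sketch. The `BBox` class and its `from_yx`/`to_yx` helpers are hypothetical names, not part of pdftotree, and the choice of (x0, y0, x1, y1) as the canonical order is an assumption made only for this example.

```python
from typing import NamedTuple, Sequence, Tuple


class BBox(NamedTuple):
    """Hypothetical bounding box, always stored as (x0, y0, x1, y1)."""

    x0: float
    y0: float
    x1: float
    y1: float

    @classmethod
    def from_yx(cls, coords: Sequence[float]) -> "BBox":
        """Build a BBox from coordinates ordered (y0, x0, y1, x1)."""
        y0, x0, y1, x1 = coords
        return cls(x0=x0, y0=y0, x1=x1, y1=y1)

    def to_yx(self) -> Tuple[float, float, float, float]:
        """Return the coordinates ordered (y0, x0, y1, x1)."""
        return (self.y0, self.x0, self.y1, self.x1)


# Hypothetical word record: text followed by (y0, x0, y1, x1).
word = ["text", 10.0, 20.0, 30.0, 40.0]
box = BBox.from_yx(word[1:])
```

With named fields, callers read `box.x0` instead of relying on positional indexes, so the coordinate order is converted explicitly in one place rather than assumed at each call site.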

Error Logs/Screenshots

N/A

Environment

  • pdftotree Version: 6ff4a7cb5fe6269e3c287664392e226ca45479d4

Additional context

See discussions at https://github.com/HazyResearch/pdftotree/pull/84#discussion_r502049405

HiromuHota, Oct 09 '20 20:10