pdftotree
Inconsistent data models for bbox
Describe the bug
Data models that represent bounding boxes are inconsistent, which considerably degrades readability. For example,
- bbox: List[float] in the order of (y0, x0, y1, x1) at https://github.com/HazyResearch/pdftotree/blob/6ff4a7cb5fe6269e3c287664392e226ca45479d4/pdftotree/TreeExtract.py#L447
- bbox: Tuple[float] in the order of (x0, y0, x1, y1) at https://github.com/HazyResearch/pdftotree/blob/6ff4a7cb5fe6269e3c287664392e226ca45479d4/pdftotree/TreeExtract.py#L456
- word[1:]: List[float] in the order of (y0, x0, y1, x1) at https://github.com/HazyResearch/pdftotree/blob/6ff4a7cb5fe6269e3c287664392e226ca45479d4/pdftotree/TreeExtract.py#L458-L463
To Reproduce
N/A
Expected behavior
I expect bounding boxes to be represented consistently, with a single type and a single coordinate order.
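One possible way to get there, as a minimal sketch only: a single named type with a fixed field order, plus explicit converters for the (y0, x0, y1, x1) order that some call sites use today. The name `BBox` and the methods `from_yxyx` / `to_yxyx` are hypothetical and do not exist in pdftotree, and the choice of (x0, y0, x1, y1) as the canonical order is an assumption for illustration.

```python
from typing import NamedTuple, Sequence, Tuple


class BBox(NamedTuple):
    """Hypothetical unified bounding box, always ordered (x0, y0, x1, y1)."""

    x0: float
    y0: float
    x1: float
    y1: float

    @classmethod
    def from_yxyx(cls, coords: Sequence[float]) -> "BBox":
        """Build a BBox from coordinates ordered (y0, x0, y1, x1)."""
        y0, x0, y1, x1 = coords
        return cls(x0=x0, y0=y0, x1=x1, y1=y1)

    def to_yxyx(self) -> Tuple[float, float, float, float]:
        """Return the coordinates in (y0, x0, y1, x1) order."""
        return (self.y0, self.x0, self.y1, self.x1)


# Usage: convert a (y0, x0, y1, x1) list once, then access fields by name.
box = BBox.from_yxyx([10.0, 20.0, 30.0, 40.0])
width = box.x1 - box.x0
height = box.y1 - box.y0
```

With named fields, the coordinate order is stated in one place, and call sites that still need the other order convert explicitly instead of relying on positional indexing.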
Error Logs/Screenshots
N/A
Environment (please complete the following information):
- pdftotree Version: 6ff4a7cb5fe6269e3c287664392e226ca45479d4
Additional context
See discussions at https://github.com/HazyResearch/pdftotree/pull/84#discussion_r502049405