camelot
Feature request: Expose PDFMiner text objects
It would be useful if the high-level API gave access to the original PDFMiner text and/or char objects. This would let us analyze, e.g., text color and other properties. Something like this:
>>> tables = camelot.read_pdf("demo.pdf")
>>> cell = tables[0].cells[0][0]
>>> LTCharList = cell._characters
>>> LTCharList[0]
<LTChar 190.551,36.641,196.089,46.601 matrix=[9.96,0.00,0.00,9.96, (190.55,39.88)] font='OELWGB+ArialMT' adv=0.556 text='6'>
>>>
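Until something like that exists, a possible workaround is to run PDFMiner separately and keep the characters whose centre falls inside a cell's bounding box. The following is only a rough sketch: `iter_chars` and `chars_in_bbox` are hypothetical helper names, not part of camelot, and it assumes the cell's bounding box and the PDFMiner characters are in the same PDF coordinate space.

```python
def iter_chars(layout_obj):
    """Recursively yield PDFMiner LTChar objects found under a layout object."""
    # Imported lazily so chars_in_bbox can be used without pdfminer.six installed.
    from pdfminer.layout import LTChar

    if isinstance(layout_obj, LTChar):
        yield layout_obj
    elif hasattr(layout_obj, "__iter__"):
        # Pages, text boxes and text lines are containers of child objects.
        for child in layout_obj:
            yield from iter_chars(child)


def chars_in_bbox(chars, bbox):
    """Return the chars whose centre point lies inside bbox.

    bbox is (x0, y0, x1, y1) as left, bottom, right, top.
    Each char only needs x0, y0, x1, y1 attributes, as LTChar has.
    """
    x0, y0, x1, y1 = bbox
    selected = []
    for ch in chars:
        cx = (ch.x0 + ch.x1) / 2
        cy = (ch.y0 + ch.y1) / 2
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            selected.append(ch)
    return selected
```

The characters for a page could come from `pdfminer.high_level.extract_pages("demo.pdf")` passed through `iter_chars`, and the bounding box from the cell's corner coordinates (I believe camelot's `Cell` stores these as `x1`, `y1`, `x2`, `y2`, but I haven't verified that they line up with PDFMiner's coordinates in every case).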
edit: Sorry for the bug label. I probably followed the wrong link when creating this issue, and I can't figure out how to remove it...