Feature request: Expose PDFMiner text objects

Open rotsee opened this issue 4 years ago • 0 comments

It would be useful if, when using the high-level API, it were possible to access the original PDFMiner text and/or char object(s). This would allow us to, e.g., analyze text color and other properties. For example, something like this:

>>> tables = camelot.read_pdf("demo.pdf")
>>> cell = tables[0].cells[0][0]
>>> LTCharList = cell._characters  # proposed attribute: the LTChar objects inside this cell
>>> LTCharList[0]
<LTChar 190.551,36.641,196.089,46.601 matrix=[9.96,0.00,0.00,9.96, (190.55,39.88)] font='OELWGB+ArialMT' adv=0.556 text='6'>
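Until something like this exists, one possible workaround is to go to pdfminer.six directly and pick out the characters that fall inside a cell's bounding box. This is a minimal sketch, not part of camelot's API: the helper names (`load_page_chars`, `chars_in_bbox`, `fill_color`, `describe_cell_chars`) are hypothetical, and it assumes that a camelot `Cell`'s `x1, y1` (bottom-left) and `x2, y2` (top-right) are in the same PDF coordinate space pdfminer uses, and that the pdfminer.six version in use stores the fill color in `LTChar.graphicstate.ncolor`.

```python
def load_page_chars(pdf_path, page_index=0):
    """Return every pdfminer LTChar on one page (page_index is 0-based).

    pdfminer.six is imported lazily, so the duck-typed helpers below can be
    used with any objects exposing x0/y0/x1/y1.
    """
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar

    def walk(obj):
        # LTPage, LTTextBox, LTTextLine, LTFigure, ... are iterable containers;
        # LTChar, LTAnno, LTCurve, ... are leaves.
        if isinstance(obj, LTChar):
            yield obj
        elif hasattr(obj, "__iter__"):
            for child in obj:
                yield from walk(child)

    pages = list(extract_pages(pdf_path, page_numbers=[page_index]))
    return list(walk(pages[0])) if pages else []


def chars_in_bbox(chars, x1, y1, x2, y2):
    """Keep chars whose center lies inside the box (x1, y1)-(x2, y2)."""
    selected = []
    for ch in chars:
        cx = (ch.x0 + ch.x1) / 2
        cy = (ch.y0 + ch.y1) / 2
        if x1 <= cx <= x2 and y1 <= cy <= y2:
            selected.append(ch)
    return selected


def fill_color(ch):
    """Non-stroking (fill) color recorded for a char, or None if unavailable."""
    gs = getattr(ch, "graphicstate", None)
    return getattr(gs, "ncolor", None)


def describe_cell_chars(pdf_path, cell, page_index=0):
    """Print text, font name and fill color of each char inside a camelot cell.

    Assumption: cell.x1/y1/x2/y2 share pdfminer's PDF coordinate space.
    """
    chars = load_page_chars(pdf_path, page_index)
    for ch in chars_in_bbox(chars, cell.x1, cell.y1, cell.x2, cell.y2):
        print(repr(ch.get_text()), ch.fontname, fill_color(ch))
```

Usage, assuming camelot and pdfminer.six are installed: `describe_cell_chars("demo.pdf", camelot.read_pdf("demo.pdf")[0].cells[0][0])`. Matching on each character's center, rather than requiring the whole glyph box to fit, keeps characters that slightly overhang a ruling line.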

edit: Sorry for the bug label; I probably followed the wrong link when creating this issue, and I can't figure out how to remove it...

rotsee, Oct 26 '20 21:10