Unknown flavor: Excalibur JSON incompatible with Camelot

Open bosd opened this issue 1 year ago • 0 comments

Describe the bug

When using Excalibur to create JSON rules, it is impossible to pass them as the **kwargs of the Camelot CLI.

Steps to reproduce the bug

  1. Upload a file to Excalibur.
  2. Generate extraction rules in Excalibur.
  3. Go to the Excalibur rule manager and download or copy the JSON rule. Example:

 {"pages":{"1":{"table_areas":["83.39951690821256,619.54177384945,509.9832528180354,466.19127305596726"],"columns":null}},"flavor":"Stream","process_background":false,"line_size_scaling":15,"split_text":false,"flag_size":false}

  4. Call Camelot from the CLI with the previously generated JSON rule as kwargs.

This results in:

camelot.read_pdf(invoicefile, **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/camelot/io.py", line 103, in read_pdf
    "Unknown flavor specified." " Use either 'lattice' or 'stream'"
NotImplementedError: Unknown flavor specified. Use either 'lattice' or 'stream'

It seems that Excalibur outputs "Lattice" and "Stream" (note the uppercase first letter), while the Camelot CLI expects the flavor names in lowercase.

So the workaround is simple: change the capital letter of the flavor name in the Excalibur JSON to lowercase. It is just not good practice to have to do this in a real-life workflow.
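The workaround can be automated instead of editing the JSON by hand. Below is a minimal sketch, assuming the rule has already been exported from Excalibur as a JSON string. The helper name `normalize_flavor` is hypothetical and not part of Camelot or Excalibur. It only addresses the flavor casing; whether the remaining keys of the rule map directly onto `read_pdf` arguments is not verified here.

```python
import json


def normalize_flavor(rule):
    """Return a copy of an Excalibur rule dict with 'flavor' lowercased.

    Hypothetical helper for illustration; not part of Camelot or Excalibur.
    """
    normalized = dict(rule)  # shallow copy, the input dict is left unchanged
    flavor = normalized.get("flavor")
    if isinstance(flavor, str):
        normalized["flavor"] = flavor.lower()
    return normalized


# Rule JSON as exported by Excalibur (shortened from the example in this report).
rule_json = '{"flavor": "Stream", "split_text": false, "flag_size": false}'

kwargs = normalize_flavor(json.loads(rule_json))
print(kwargs["flavor"])  # stream

# The call from the traceback would then receive the lowercase flavor:
# camelot.read_pdf(invoicefile, **kwargs)
```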

Expected behavior

One should be able to use Excalibur to generate rules, then have an external program in an ETL pipeline call the Camelot CLI with the settings generated by Excalibur.

Code

https://github.com/camelot-dev/camelot/blob/master/camelot/io.py (line 101)

import camelot

# add your code here

PDF

Screenshots

Environment

  • OS: ubuntu
  • Python version: 3.7
  • Camelot version: 0.10.1
  • Excalibur version: ??

Additional context

bosd avatar Aug 12 '22 21:08 bosd