kleineanfragen
Improve table recognition
Currently, table recognition is a simple keyword check in app/jobs/contains_table_job.rb.
We could improve this by looking for obvious table-like patterns such as:
November 2013 43.104
Dezember 2013 30.419
Januar 2014 29.218
Februar 2014 15.598
(from https://kleineanfragen.de/berlin/17/14442)
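A rough sketch of what such a pattern check might look like, assuming the extracted text is available as a plain string. `TableHeuristic`, `ROW`, `table_like?` and the `min_rows` threshold are hypothetical names for illustration and are not taken from contains_table_job.rb:

```ruby
# Hypothetical helper, not part of the existing job.
module TableHeuristic
  # A "row" is a label followed by whitespace and a trailing number, either
  # with German thousands separators ("43.104") or plain digits ("120"),
  # optionally with a decimal comma ("1.234,5").
  ROW = /\A\s*\S.*?\s+(?:\d{1,3}(?:\.\d{3})+|\d+)(?:,\d+)?\s*\z/

  # Returns true if at least min_rows consecutive lines look like rows.
  # min_rows is an assumed threshold and would need tuning on real documents.
  def self.table_like?(text, min_rows: 3)
    run = 0
    text.each_line do |line|
      run = line.match?(ROW) ? run + 1 : 0
      return true if run >= min_rows
    end
    false
  end
end
```

Requiring several consecutive matching lines is meant to keep a single sentence that happens to end in a number from counting as a table. It would likely still miss tables whose last column is not numeric.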
Additionally, we could use the table recognition from tabula: tabula-extractor / tabula-java