excalibur
Unstructured Data
Hi team, Camelot and Excalibur are great tools for extracting data from PDFs, but sometimes the extracted data comes out unstructured.
Please suggest a way to handle this type of problem. You can see an example in the attachment below.
Here the instrument is Nestle India and the industry type is Consumer Non Durables, but "Durables" is extracted as an extra cell.
Could you please provide a solution to overcome this problem?
Thank you so much for making this library and tool.
@vinayak-mehta Guess ML tools would help with such unstructured data. Thoughts?
https://djajafer.medium.com/pdf-table-extraction-with-keras-retinanet-173a13371e89
Yeah right now Camelot can't group rows together when there are no lines present. Adding support for horizontal line separators on the frontend or trying out ML might be some solutions, but it might take some time before I can do those experiments. @rajshah1997 If you want to give those solutions a try, please go ahead :)
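Until something like that lands, one possible workaround is to post-process the extracted rows and fold "spill-over" rows back into the row above. The sketch below is not part of Camelot or Excalibur. It assumes that a continuation row can be recognised by an empty cell in a key column (here the first one), which may not hold for every PDF. The function name `merge_continuation_rows` and the `key_col` parameter are made up for illustration, and the sample rows only mimic the case described above.

```python
def merge_continuation_rows(rows, key_col=0):
    """Fold rows whose key column is empty into the previous row.

    Assumption: a wrapped cell (e.g. "Durables") lands in a row of its own
    whose key column is blank. All rows are assumed to have the same length.
    """
    merged = []
    for row in rows:
        cells = [str(c).strip() for c in row]
        if merged and not cells[key_col]:
            # Continuation row: append each non-empty cell to the cell above it.
            prev = merged[-1]
            for i, cell in enumerate(cells):
                if cell:
                    prev[i] = (prev[i] + " " + cell).strip()
        else:
            # Regular row (or the very first row): keep it as is.
            merged.append(cells)
    return merged


# Illustrative rows only; the second company is a made-up placeholder.
sample = [
    ["Nestle India", "Consumer Non"],
    ["", "Durables"],
    ["Example Corp", "Banks"],
]
cleaned = merge_continuation_rows(sample)
```

With Camelot, the input rows could come from something like `tables[0].df.values.tolist()` after `camelot.read_pdf(...)`. Which column to use as the key depends on the layout of the table.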