tables to dataframe

Open philip-shinra opened this issue 1 year ago • 2 comments

How do I convert tables into a dataframe?

philip-shinra avatar Nov 24 '23 18:11 philip-shinra

This does not seem to be a feature of unstructured itself. However, since unstructured also uses the Table Transformer project, you can refer to Microsoft's Table Transformer project or to this article: https://medium.com/@lidores98/image-table-to-dataframe-using-python-ocr-773c8afb713d Hope this helps.

jojogh avatar Nov 26 '23 03:11 jojogh

If you are working with unstructured output, a Table element has a metadata.text_as_html field which you could read into a pandas dataframe (google "html to pandas dataframe").
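
For readers following along, here is a minimal sketch of that idea. It assumes the HTML string in `metadata.text_as_html` is a single plain `<table>` whose first row holds the column names; that is an assumption, not something the library guarantees. The helper names `html_table_to_rows` and `rows_to_dataframe` are hypothetical. Parsing uses only the standard library, and pandas is imported only when a dataframe is actually built.

```python
from html.parser import HTMLParser


class _TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_rows(html):
    """Hypothetical helper: return the table as a list of rows of cell strings."""
    parser = _TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_dataframe(rows):
    """Hypothetical helper: build a dataframe, assuming the first row is the header."""
    import pandas as pd  # third-party; imported lazily so parsing works without it

    header, *body = rows
    return pd.DataFrame(body, columns=header)
```

With a Table element in hand, this would be called as `rows_to_dataframe(html_table_to_rows(element.metadata.text_as_html))`. If lxml or bs4 with html5lib is installed, `pandas.read_html(io.StringIO(html))` is the more common shortcut; it returns a list of dataframes, one per table found.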

cragwolfe avatar Nov 26 '23 18:11 cragwolfe

I am receiving a KeyError when trying to access text_as_html. Could you please provide the code or any other help with this?

shriharshan avatar May 09 '24 11:05 shriharshan

@shriharshan please open a new issue for your problem as it is not strictly related to the original post.

Provide a snippet of how you call the partitioning function and the full stack trace you receive when you get the KeyError. Also, make sure you're using the latest version of unstructured.

An ElementMetadata object will never raise KeyError on accessing element.metadata.text_as_html. However, accessing the dict form of an element could. In that case you'll need to use element["metadata"].get("text_as_html") and properly handle the possible None result.
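
As a rough illustration of the difference, the sketch below wraps both access styles in one function. The name `get_table_html` is hypothetical, and the sample dicts are made up for the example rather than taken from real partitioning output.

```python
def get_table_html(element):
    """Hypothetical helper: return the table HTML, or None if there is none.

    Accepts either the dict form of an element or an element object.
    """
    if isinstance(element, dict):
        # Dict form: chained .get() avoids KeyError when a key is absent.
        return element.get("metadata", {}).get("text_as_html")
    # Object form: read the attribute, falling back to None if it is missing.
    metadata = getattr(element, "metadata", None)
    return getattr(metadata, "text_as_html", None)


# Made-up sample data in dict form.
elements = [
    {"type": "Table", "metadata": {"text_as_html": "<table><tr><td>1</td></tr></table>"}},
    {"type": "NarrativeText", "metadata": {}},
]

# Keep only elements that actually carry table HTML.
tables = [html for html in map(get_table_html, elements) if html is not None]
```

Filtering on `is not None` is one way to handle the missing case; elements without table HTML are simply skipped.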

Closing original post as resolved.

scanny avatar May 09 '24 18:05 scanny