unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Part two of: https://github.com/Unstructured-IO/unstructured/pull/2842
Main changes compared to part one:
* hash computation includes the element's position in the sequence of all elements
* there are more tests for deterministic behavior...
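To illustrate the idea of a position-aware hash, here is a minimal sketch, not the library's actual implementation. The function name `deterministic_ids`, the choice of SHA-256, the `index:text` payload layout, and the 32-character truncation are all assumptions made for illustration.

```python
import hashlib


def deterministic_ids(texts):
    """Return one hex ID per text, derived from the text and its position.

    Hypothetical helper for illustration only; it is not part of unstructured.
    """
    ids = []
    for index, text in enumerate(texts):
        # Mixing the index into the hash input keeps IDs distinct even when
        # two elements carry identical text.
        payload = f"{index}:{text}".encode("utf-8")
        ids.append(hashlib.sha256(payload).hexdigest()[:32])
    return ids


texts = ["Title", "Same paragraph.", "Same paragraph."]
first_run = deterministic_ids(texts)
second_run = deterministic_ids(texts)
```

Because the ID depends only on the text and its index, repeated runs over the same sequence yield the same IDs, while duplicate texts at different positions still get different IDs.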
This PR enhances the `TableAlignment.get_element_level_alignment` function so it can use different kinds of rapidfuzz functions to evaluate content accuracy. The default is set to `partial_token_ratio`.
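The general pattern of a swappable scoring function can be sketched as follows. This is only an illustration: it does not use rapidfuzz and does not reproduce the signature of `TableAlignment.get_element_level_alignment`. The names `content_accuracy`, `sequence_ratio`, and `token_set_overlap` are hypothetical, and the scorers are standard-library stand-ins on a 0 to 100 scale.

```python
from difflib import SequenceMatcher


def sequence_ratio(a, b):
    """Stand-in scorer: character-level similarity on a 0-100 scale."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def token_set_overlap(a, b):
    """Stand-in scorer: Jaccard overlap of whitespace tokens, 0-100."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 100.0
    return 100.0 * len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def content_accuracy(predicted_cells, expected_cells, scorer=sequence_ratio):
    """Average the chosen scorer over aligned (predicted, expected) pairs."""
    pairs = list(zip(predicted_cells, expected_cells))
    if not pairs:
        return 0.0
    return sum(scorer(p, e) for p, e in pairs) / len(pairs)


predicted = ["Total revenue", "2023", "1,200"]
expected = ["Total revenue", "2023", "1,250"]
default_score = content_accuracy(predicted, expected)
token_score = content_accuracy(predicted, expected, scorer=token_set_overlap)
```

Passing the scorer as a parameter with a default keeps existing callers unchanged while letting an evaluation choose a stricter or looser notion of similarity.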
This PR:
- Decouples dev changelogs for each PR to be able to use merge queues with PRs that have conflicting changelogs.
- Adds related automation scripts
- Adds tests...
## Description
The link to "Staging" was referring to a localhost URL. I've replaced it with the correct one.
**Describe the bug** The import statement takes forever to execute: I tried to run `from unstructured.partition.pdf import partition_pdf`, and the import never finishes. I am...
Hello everyone! I am trying to set up unstructured on Google Colab, and I am facing a "FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.10/dist-packages/unstructured/nlp/english-words.txt'" **Code is as below** ! pip...
**Is your feature request related to a problem? Please describe.** The API can now return extracted images in the response. Let's mirror the library functionality and allow the user to...
**Describe the bug** The `partition_pdf` function fails with a segmentation fault when `infer_table_structure=True`. **To Reproduce** Follow the Docker instructions here: https://unstructured-io.github.io/unstructured/installation/docker.html, then run `from unstructured.partition.pdf import partition_pdf` followed by `elements = partition_pdf(filename="example-docs/layout-parser-paper-with-Table.pdf", infer_table_structure=True)`. **Expected behavior**...