
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Results 188 unstructured issues

This pull request allows the table transformer to return predictions in a raw cell representation. This will later be used to save predictions in a cell format for simpler metrics calculation....

Hi, please see the URL below: https://unstructured-io.github.io/unstructured/ingest/source_connectors/wikipedia.html The explanation says "Connect **Airtable** to your preprocessing pipeline, and batch process all your documents using unstructured-ingest to store structured outputs locally on...

documentation

This pull request adds metrics that are calculated based on table_as_cells instead of text_as_html. This change is required for comprehensive metrics calculation, as previously every colspan or rowspan predicted was...

**Describe the bug** `pip install unstructured[pdf]` results in `ERROR: Could not build wheels for onnx, which is required to install pyproject.toml-based projects`
**To Reproduce** `% pip install unstructured[pdf]`
**Expected behavior**...

bug

**Describe the bug** Some spaces are removed from the text when partitioning a PDF document.
**To Reproduce** PDF: [rok_20230930_1-1.pdf](https://github.com/Unstructured-IO/unstructured/files/15001636/rok_20230930_1-1.pdf)
```
from unstructured.partition.pdf import partition_pdf

elements = partition_pdf(
    filename="rok_20230930_1-1.pdf",
    strategy="hi_res",
    infer_table_structure=True,
)
print(str(elements[20]))
```
**Current...

bug
pdf

When attempting to execute `partition_doc` to pre-process multiple documents at the same time, it fails with the following error: `PackageNotFoundError: Package not found at '/var/folders/p5/dljg1qv95y97dyq1c38xgb6r0000gn/T/tmp3nwg1qob/test.docx'` Here is a...

investigating

**Describe the bug** When I import the modules as shown below, I get the following error:
```
from unstructured.partition.html import partition_html
from unstructured.partition.pptx import partition_pptx
```
`TypeError: add_chunking_strategy() missing...

bug

**Describe the bug** Cannot use unstructured on MacOS M2 Pro because `from unstructured.partition.html import partition_html` throws ``` Traceback (most recent call last): File "", line 1, in File "/Users/liamvdv/src/github.com/REDACT/.venv/lib/python3.9/site-packages/unstructured/partition/html.py", line...

bug
packaging

I am trying to use the Unstructured library locally with Python 3.10.2. Every time I try to import unstructured.partition.something, for example "from unstructured.partition.pdf import partition_pdf", the kernel dies. I...

bug