
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Results: 188 unstructured issues

**Describe the bug** When partitioning [this](https://github.com/Unstructured-IO/unstructured/files/15109052/a1977-backus.pdf) PDF document with the `fast` strategy, the following `KeyError` occurs: ``` { "name": "KeyError", "message": "'782eec119b3409ea1a0bc8abf8f059ac'", "stack": "--------------------------------------------------------------------------- KeyError Traceback (most recent call last)...

bug

**Describe the bug** I am evaluating the UnstructuredClient for processing PDF documents and am encountering an issue with Greek-language text extraction. When I attempt to extract text from...

bug
ocr

The pinned version of unstructured-client was changed from `>=0.15.1` to `

bug

Make chroma ingest pipeline idempotent :) @potter-potter
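One common way to make an ingest step idempotent is to derive each record's ID from its source and content, then write with upsert semantics so a re-run overwrites rather than duplicates. The sketch below illustrates that idea only; `element_id`, `InMemoryStore`, and `ingest` are hypothetical names, and the in-memory store is a stand-in, not the Chroma client or the actual Unstructured connector.

```python
import hashlib


def element_id(source: str, index: int, text: str) -> str:
    """Derive a stable ID from where the element came from and what it says."""
    payload = f"{source}\x00{index}\x00{text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class InMemoryStore:
    """Hypothetical stand-in for a vector store that supports upsert."""

    def __init__(self) -> None:
        self.records: dict[str, str] = {}

    def upsert(self, ids: list[str], documents: list[str]) -> None:
        # Writing by ID overwrites an existing record instead of adding a copy.
        for id_, doc in zip(ids, documents):
            self.records[id_] = doc


def ingest(store: InMemoryStore, source: str, texts: list[str]) -> list[str]:
    """Write all texts under deterministic IDs; safe to call repeatedly."""
    ids = [element_id(source, i, t) for i, t in enumerate(texts)]
    store.upsert(ids=ids, documents=texts)
    return ids


store = InMemoryStore()
first = ingest(store, "report.pdf", ["Title", "Body paragraph"])
second = ingest(store, "report.pdf", ["Title", "Body paragraph"])
```

Because the IDs depend only on the input, running the same ingest twice leaves the store with the same set of records.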

Allow users to set additional metadata values to expand on metadata filtering capabilities. Useful to narrow down the search scope with metadata filters. cc @potter-potter https://cookbook.chromadb.dev/core/filters/#metadata-filters

GPU is not utilized during the process!

bug

**Is your feature request related to a problem? Please describe.** I need to be able to extract additional metadata from HTML documents. Specifically I would like to extract favicons and...

enhancement
html

**Describe the bug** A `list index out of range` error occurs in `_convert_table_to_text` during docx parsing. **To Reproduce** I was operating on 1360 docx files from this source: https://www.3gpp.org/ftp/Specs/latest/Rel-17 In the...

bug
docx

Unstructured doesn't currently retain markdown image links (like [this format](https://www.codecademy.com/resources/docs/markdown/images)). User wants to do document loading through Langchain with Unstructured and keep markdown image links.

enhancement
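For reference, the image links in question have the form `![alt text](url "optional title")`. A minimal sketch of locating them in raw markdown before it is partitioned is shown below, assuming a simple regular expression is enough (it does not handle nested brackets or URLs containing parentheses). `IMAGE_LINK` and `find_image_links` are hypothetical names, not part of Unstructured or Langchain.

```python
import re

# Matches ![alt](url) with an optional "title" after the URL.
IMAGE_LINK = re.compile(
    r'!\[(?P<alt>[^\]]*)\]\((?P<url>[^)\s]+)(?:\s+"(?P<title>[^"]*)")?\)'
)


def find_image_links(markdown: str) -> list[tuple[str, str]]:
    """Return (alt text, URL) pairs for each inline image link found."""
    return [(m.group("alt"), m.group("url")) for m in IMAGE_LINK.finditer(markdown)]


sample = 'Intro ![logo](https://example.com/logo.png "Logo") and ![](img/a.png)'
links = find_image_links(sample)
```

The extracted pairs could then be stored alongside the loaded document so the links are not lost.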

When trying to load a JSON file using S3FileLoader, which uses Unstructured to load files, the following error is raised: ValueError: Detected a JSON file that does not conform to the...