
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Results 188 unstructured issues

### Description This PR adds a number of enhancements around how we dynamically generate the requirements defined in the `setup.py` file. All logic around this was moved out into a...

**Describe the bug** ![recycle_python_modules](https://github.com/Unstructured-IO/unstructured/assets/3397714/b8e42572-9d7c-48ca-8bdb-e55575befd33) **To Reproduce** `pip install unstructured` **Expected behavior** Only one specific version of each module should be installed, not all versions.

bug

Minor/partial refactor of interfaces.py and add tests

Just a tiny fix for a broken link that bothered me :)

**Problem** Chunk text begins mid-word when `overlap` is specified. ![image](https://github.com/Unstructured-IO/unstructured/assets/39398937/cc2393e3-ce78-4b3f-a541-b6c9d6854481) **Desired solution** Compute the overlap prefix as the next even-word boundary greater than or equal to `overlap` characters from the...

enhancement
chunking
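
The desired behavior in the chunk-overlap request above can be sketched as follows. This is a minimal illustration, not the library's implementation: the issue text is truncated, so the sketch assumes the overlap is measured back from the end of the preceding chunk, and `overlap_prefix` is a hypothetical helper name.

```python
def overlap_prefix(prev_text: str, overlap: int) -> str:
    """Return the tail of `prev_text` to prepend to the next chunk.

    Assumption (the issue text is truncated): the overlap is measured
    back from the end of the preceding chunk. The tail holds at least
    `overlap` characters and is extended backward so that it always
    starts at the beginning of a word rather than mid-word.
    """
    if overlap <= 0:
        return ""
    if overlap >= len(prev_text):
        return prev_text

    # Naive start position: exactly `overlap` characters from the end.
    start = len(prev_text) - overlap

    # Walk backward until the preceding character is whitespace
    # (or the start of the text), so the prefix begins on a word boundary.
    while start > 0 and not prev_text[start - 1].isspace():
        start -= 1

    return prev_text[start:]


prev_chunk = "The quick brown fox jumps"
print(overlap_prefix(prev_chunk, 7))  # naive slice would be "x jumps"
```

Extending the slice backward keeps the overlap at or above the requested size; rounding forward would avoid the mid-word start too, but could return fewer characters than `overlap`.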

### Description The build was not compiling `base.in` first, which is required because the other requirements files use the generated `.txt` file as a constraint.

Hello, maybe this feature already exists, but I didn't manage to get it working. I work on a network that blocks huggingface and I would like to run: `elements = partition_pdf(filename=PDF_PATH,...

enhancement

This PR is a clone of PR https://github.com/Unstructured-IO/unstructured/pull/2600 to run CI / test_chipper and update ingest test fixtures.

**Describe the bug** Got this error message following the MongoDB Destination Connector docs > TypeError: SimpleMongoDBConfig.__init__() got an unexpected keyword argument 'uri' **To Reproduce** From the docs: ``` def get_writer()...

bug

**Describe the bug** When I try to partition the PDF file using `partition_pdf`, it gives me the two error messages below: 1. Some images were not loaded. Check...

bug