unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
### Description This PR adds a number of enhancements around how we dynamically generate the requirements defined in the `setup.py` file. All logic around this was moved out into a...
**Describe the bug** ![recycle_python_modules](https://github.com/Unstructured-IO/unstructured/assets/3397714/b8e42572-9d7c-48ca-8bdb-e55575befd33) **To Reproduce** `pip install unstructured` **Expected behavior** Only one specific version should be installed, not all versions.
Minor/partial refactor of interfaces.py and add tests
Just a tiny fix for a broken link that bothered me :)
**Problem** Chunk text begins mid-word when `overlap` is specified. ![image](https://github.com/Unstructured-IO/unstructured/assets/39398937/cc2393e3-ce78-4b3f-a541-b6c9d6854481) **Desired solution** Compute the overlap prefix as the next even-word boundary greater than or equal to `overlap` characters from the...
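The behavior requested in the chunk-overlap issue above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `word_aligned_overlap` is a hypothetical helper, and it assumes the overlap is measured back from the end of the preceding chunk and that words are separated by whitespace.

```python
def word_aligned_overlap(prev_text: str, overlap: int) -> str:
    """Return a tail of `prev_text` of at least `overlap` characters
    that starts on a word boundary (hypothetical helper, for illustration)."""
    if overlap <= 0:
        return ""
    if overlap >= len(prev_text):
        return prev_text

    # Naive start position: exactly `overlap` characters from the end.
    start = len(prev_text) - overlap

    # Walk left until the character before `start` is whitespace,
    # so the prefix never begins in the middle of a word.
    while start > 0 and not prev_text[start - 1].isspace():
        start -= 1

    return prev_text[start:]


previous_chunk = "The quick brown fox jumps"
naive_prefix = previous_chunk[-7:]                        # begins mid-word
aligned_prefix = word_aligned_overlap(previous_chunk, 7)  # begins at a word start
```

The aligned prefix may be somewhat longer than `overlap`, which matches the "greater than or equal to" wording in the issue.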
### Description `base.in` was not being compiled first, which is required because the others use the generated `.txt` file as a constraint.
Hello, maybe this feature already exists, but I didn't manage to implement it. I work on a network that blocks huggingface and I would like to run: `elements = partition_pdf(filename=PDF_PATH,...
This PR is a clone of PR https://github.com/Unstructured-IO/unstructured/pull/2600 to run CI / test_chipper and update ingest test fixtures.
**Describe the bug** Got this error message following the MongoDB Destination Connector docs > TypeError: SimpleMongoDBConfig.__init__() got an unexpected keyword argument 'uri' **To Reproduce** From the docs: ``` def get_writer()...
**Describe the bug** When I try to partition the PDF file using `partition_pdf`, it gives me the two error messages below: 1. Some images were not loaded. Check...