unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
**Describe the bug** When anything other than basic auth (username/password) is used, the client instantiation breaks because the username field from the access config isn't dropped, causing the following error:...
**Describe the bug** The `unstructured-ingest` CLI command hangs when run. I think that it is treating the root page as a Page Block and trying to parse it,...
Removes this warning: > Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies. Conda may not use...
**Describe the bug** I'm calling the `get_elements_from_api()` function with the following arguments. ``` file = # downloaded from the AWS S3 documents = get_elements_from_api( file_path=None, api_key='', api_url='', file=file, # BytesIO type....
**Is your feature request related to a problem? Please describe.** Enable the ability to toggle GPU acceleration, which is already built into Tesseract. It would be ideal if we could...
cf. discussion at #2362
**Describe the bug** The results of extracting table information from the attached [acciona.pdf](https://github.com/Unstructured-IO/unstructured/files/14388281/acciona.pdf) file are underwhelming whereas the results of OCR via `tesseract` and `pdfminer` on the whole page are...
Hi, I'm using version 0.11.8. I use the following code to execute `partition_doc`: ``` from unstructured.partition.doc import partition_doc filename = "D:\\Testcase\\test.doc" elements = partition_doc(filename=filename) ``` However, I encountered a...
**Describe the bug** --------------------------------------------------------------------------- TypeError Traceback (most recent call last) [](https://localhost:8080/#) in () 6 7 # Initialize the encoder with OpenAI credentials ----> 8 embedding_encoder = OpenAIEmbeddingEncoder(api_key=open_ai_api_key) TypeError: OpenAIEmbeddingEncoder.__init__() got...
**Is your feature request related to a problem? Please describe.** I am a heavy conda user and I like unstructured so far, but relying on pip sometimes makes me look for...