
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Results 188 unstructured issues

Related to issue #2664. Not at all confident with the second commit. I ran the make command in a new python env, but somehow, a lot of things seem to...

First of all, really cool software 💯 While doing a license check, I noticed that the `pillow-heif` dependency is actually GPLv2 with the binary wheels. Source: https://github.com/bigcat88/pillow_heif/issues/111 I think we...

This minor change updates the URL of the [Weaviate Docker image](https://weaviate.io/developers/weaviate/installation/docker-compose). Instead of the standard Docker registry, Weaviate now uses a custom registry running at `cr.weaviate.io`. Thanks in...

Add support for detecting table caption tags within tables and by themselves. More on caption tags: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/caption

needs follow up
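
As a rough illustration of what detecting caption tags "within tables and by themselves" could involve, here is a minimal sketch using only Python's standard-library `html.parser`. It is not the implementation from the pull request; the names `CaptionCollector` and `find_captions` are hypothetical and chosen for this example only.

```python
from html.parser import HTMLParser


class CaptionCollector(HTMLParser):
    """Collects <caption> text and records whether it sat inside a <table>.

    Hypothetical helper for illustration; not part of any library's API.
    """

    def __init__(self):
        super().__init__()
        self.table_depth = 0        # how many <table> elements are currently open
        self._in_caption = False    # True while between <caption> and </caption>
        self._inside_table = False  # table context captured when the caption opened
        self._buf = []              # text fragments of the current caption
        self.captions = []          # list of (caption_text, inside_table) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "caption":
            self._in_caption = True
            self._inside_table = self.table_depth > 0
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1
        elif tag == "caption" and self._in_caption:
            text = "".join(self._buf).strip()
            self.captions.append((text, self._inside_table))
            self._in_caption = False

    def handle_data(self, data):
        if self._in_caption:
            self._buf.append(data)


def find_captions(html):
    """Return (text, inside_table) for every <caption> found in the markup."""
    parser = CaptionCollector()
    parser.feed(html)
    parser.close()
    return parser.captions
```

`html.parser` does not repair the tree the way a browser does, so a `<caption>` outside any `<table>` is still reported, flagged with `inside_table=False`.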

**Summary** The indent-level for a bullet in DOCX is stored in the XML as an `int`. However, Word is tolerant of a floating-point value in that field and does not...

bug
docx
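
One way to tolerate a floating-point value where an integer indent level is expected is to parse the attribute as a float first and then convert it. The sketch below is an assumption about how such a fix might look, not the change made in the project; `parse_indent_level` is a hypothetical name, and truncating any fractional part is a choice made for this example.

```python
def parse_indent_level(raw: str) -> int:
    """Parse an indent level that may be written as "2" or "2.0".

    Hypothetical helper: the value is read as a float and then truncated
    to an int, so "1.0" becomes 1. Non-numeric input raises ValueError.
    """
    return int(float(raw))
```

A stricter variant could reject values with a non-zero fractional part instead of truncating them.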

**Describe the bug** I came across a webpage that is being detected as a CSV file. It should be detected as HTML. The page, unfortunately, returns its content type as:...

bug
auto

The current Platform Documentation listed below does not mention required permissions for the Google Cloud service account keys. _Requested changes:_ On the Google Cloud Service source connector documentation https://unstructured-io.github.io/unstructured/platforms/platform_sources/google_cloud_source.html, can...

enhancement

**Describe the bug** Unable to run [unstructured chunking](https://unstructured-io.github.io/unstructured/core/chunking.html#calling-a-chunking-function). I'm getting PDFPageCountError. **To Reproduce** Same as above **Expected behavior** Run smoothly **Screenshots** If applicable, add screenshots to help explain your problem....

bug
pdf

Adds a src and dest connector for Kafka

**Describe the bug** Getting an error when using unstructured + langchain. Only happens in 0.12.6. Cannot repro in 0.12.5. The error: ``` 55 IS_PYSTON = hasattr(sys, "pyston_version_info") 56 HAS_REFCOUNT =...

bug