haystack-core-integrations

Unstructured File Converter: maintenance and refactoring

Open anakin87 opened this issue 9 months ago • 1 comments

This component was developed a while ago, when the Unstructured ecosystem was smaller and simpler. The ecosystem has evolved over time and now includes the open-source library, free and paid APIs, API clients, and the Docker image (for running the API locally).

TODO

  • ~Ensure compatibility with unstructured-client>=0.30.0 (see #1416).~ This was magically fixed in #1841
  • Evaluate whether we can remove the dependency on the unstructured library. Initially, this was the only way to programmatically query self-hosted APIs, but we should explore whether the client alone is sufficient.
  • Verify that our integration works correctly with APIs hosted by Unstructured (review URLs, etc.), and fix any issues found.

anakin87, Feb 18 '25 13:02

One more thing to keep in mind: renaming the paths argument to sources, with the same type that other converters use, so the component is in line with what other converters expect :)

lambda-science, Mar 21 '25 08:03
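
For illustration only, the paths to sources rename suggested above could be introduced with a deprecation period rather than a hard break. The sketch below is not the component's actual code: `resolve_sources` is a hypothetical helper, and the accepted type (`str` or `Path`) is an assumption chosen to keep the example self-contained. The real signature should match whatever the other converters accept.

```python
import warnings
from pathlib import Path
from typing import List, Optional, Union


def resolve_sources(
    sources: Optional[List[Union[str, Path]]] = None,
    paths: Optional[List[Union[str, Path]]] = None,  # deprecated alias (hypothetical)
) -> List[Path]:
    """Hypothetical helper: accept the new 'sources' argument, tolerate the old 'paths'."""
    if paths is not None:
        if sources is not None:
            raise ValueError("Pass either 'sources' or the deprecated 'paths', not both.")
        # Warn the caller, pointing at their call site rather than this helper.
        warnings.warn(
            "'paths' is deprecated; use 'sources' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        sources = paths
    if sources is None:
        raise ValueError("'sources' is required.")
    # Normalize every entry to a Path so downstream code handles one type.
    return [Path(s) for s in sources]
```

With something like this, existing callers that pass `paths=[...]` would keep working and see a `DeprecationWarning`, while new callers use `sources=[...]`. Whether a deprecation period is wanted at all, or a straight rename is acceptable, is a decision for the maintainers.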