Unstructured: support `ByteStream` input in run method

Open vblagoje opened this issue 1 year ago • 2 comments

Is your feature request related to a problem? Please describe. Currently, our UnstructuredFileConverter doesn't support ByteStream input in its run method, as the other converters do. This is inconvenient for our users and also breaks the existing pattern that all converters follow.

Describe the solution you'd like ByteStream input in the run method is important because users can then inject resources from the Internet, for example content downloaded with LinkContentFetcher or similar components, and pass them to UnstructuredFileConverter for further processing.
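Until that is supported, one possible interim approach is to write each byte payload to disk and hand the resulting paths to the converter. The sketch below is not the integration's implementation: `materialize_sources` is a hypothetical helper, and it only assumes that a ByteStream-like object exposes a `data` attribute holding bytes and an optional `mime_type` string. It uses the standard library only, so it does not call Haystack or Unstructured itself.

```python
import mimetypes
from pathlib import Path
from typing import Any, Iterable, List, Union


def materialize_sources(
    sources: Iterable[Any], directory: Union[str, Path]
) -> List[Path]:
    """Return one filesystem path per source (hypothetical helper).

    Paths and strings pass through unchanged. Any object with a bytes
    `data` attribute (assumed shape of a ByteStream) is written to a new
    file inside `directory`.
    """
    out_dir = Path(directory)
    paths: List[Path] = []
    for index, source in enumerate(sources):
        if isinstance(source, (str, Path)):
            paths.append(Path(source))
            continue

        data = getattr(source, "data", None)
        if not isinstance(data, (bytes, bytearray)):
            raise TypeError(f"Unsupported source type: {type(source).__name__}")

        # Pick a file extension from the MIME type when one is available,
        # falling back to a generic suffix (an assumption of this sketch).
        mime_type = getattr(source, "mime_type", None)
        suffix = (mimetypes.guess_extension(mime_type) if mime_type else None) or ".bin"

        target = out_dir / f"bytestream_{index}{suffix}"
        target.write_bytes(bytes(data))
        paths.append(target)
    return paths
```

The returned list could then be passed as the converter's path input, with `directory` pointing at a temporary directory that is cleaned up afterwards. Native ByteStream support in the run method would make this detour unnecessary.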

Describe alternatives you've considered Haven't considered any.

Additional context None

vblagoje avatar Sep 11 '24 09:09 vblagoje

I think this is also necessary to properly link UnstructuredFileConverter to Haystack's FileTypeRouter (without deactivating type checking when connecting components), because the FileTypeRouter can output ByteStream.

lambda-science avatar Mar 21 '25 08:03 lambda-science

Yes, that's right! There is an open PR, but unfortunately it is stale and not ready to be merged yet, so we'll need to pick up that work again.

julian-risch avatar Mar 26 '25 14:03 julian-risch

Closing, as we now track the refactoring of this integration in issue https://github.com/deepset-ai/haystack-core-integrations/issues/1417

julian-risch avatar May 16 '25 11:05 julian-risch