haystack-core-integrations

feat: Support bytestream in Unstructured API

Open alperkaya opened this issue 1 year ago • 8 comments

Related Issues

  • fixes #1075

Proposed Changes:

The Unstructured API can now be called with a ByteStream in addition to a Path.

How did you test it?

Added extra unit tests

Notes for the reviewer

Checklist

alperkaya avatar Sep 12 '24 12:09 alperkaya

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Sep 12 '24 12:09 CLAassistant

@vblagoje, I tried to address your request in this PR.

alperkaya avatar Sep 12 '24 12:09 alperkaya

@alperkaya can you please sign the CLA? Otherwise we can't merge this. :)

silvanocerza avatar Sep 12 '24 12:09 silvanocerza

> @alperkaya can you please sign the CLA? Otherwise we can't merge this. :)

done ;)

alperkaya avatar Sep 12 '24 12:09 alperkaya

Hi @silvanocerza,

I tried to address your comment by checking the existing converters.

This version covers these cases without losing meta fields.

  • Case 1: Files with Meta as None
  • Case 2: ByteStreams with Meta as None
  • Case 3: Files with Meta as a Dictionary
  • Case 4: ByteStreams with Meta as a Dictionary
  • Case 5: Files with Meta as a List
  • Case 6: ByteStreams with Meta as a List
  • Case 7: Directory with Meta as a Dictionary
  • Case 8: Directory with Meta as a List (Should Fail)
  • Case 9: Combination of File Paths, ByteStreams, and Directory with Meta as a Dictionary
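For readers following along, the meta handling behind these cases might be sketched roughly as below. This is only an illustration under assumptions, not the code in this PR: the helper name `normalize_meta` and its signature are hypothetical, and it uses only the standard library.

```python
from typing import Any, Dict, List, Optional, Union

Meta = Dict[str, Any]


def normalize_meta(
    meta: Optional[Union[Meta, List[Meta]]], sources_count: int
) -> List[Meta]:
    """Hypothetical helper: return one meta dict per source."""
    if meta is None:
        # Meta as None: every source gets its own empty dict.
        return [{} for _ in range(sources_count)]
    if isinstance(meta, dict):
        # Meta as a dictionary: the same values are copied to every source.
        return [dict(meta) for _ in range(sources_count)]
    if isinstance(meta, list):
        # Meta as a list: it must line up one-to-one with the sources.
        if len(meta) != sources_count:
            raise ValueError(
                f"Got {len(meta)} meta entries for {sources_count} sources."
            )
        return [dict(entry) for entry in meta]
    raise TypeError("meta must be None, a dict, or a list of dicts.")
```

Under this sketch, a list of meta entries cannot be matched against a directory, because the number of files it expands to is not known up front; that is one plausible reading of why Case 8 should fail.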

alperkaya avatar Sep 13 '24 10:09 alperkaya

This is still not working as expected, I strongly suggest you copy the implementation from core.

silvanocerza avatar Sep 16 '24 10:09 silvanocerza

> This is still not working as expected, I strongly suggest you copy the implementation from core.

Hi, in the core repo, the solution handles either file paths or bytestreams. However, in this PR, I’m managing not just file paths and bytestreams, but also directories, where I need to fetch all files without entering subfolders. Given these additional requirements, could we explore how to modify the core solution to meet this use case, or would an alternative approach be better suited here?
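To make the directory requirement concrete, here is a minimal sketch of how mixed sources could be flattened before conversion. It is an assumption-laden illustration, not the PR's implementation: `expand_sources` is a hypothetical name, and the `ByteStream` class below is a stand-in dataclass rather than the Haystack one.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Union


@dataclass
class ByteStream:
    """Stand-in for a byte stream object (hypothetical, for illustration)."""
    data: bytes
    meta: Dict[str, Any] = field(default_factory=dict)


def expand_sources(
    sources: List[Union[str, Path, ByteStream]]
) -> List[Union[Path, ByteStream]]:
    """Flatten sources: byte streams and files pass through unchanged,
    directories are replaced by the files directly inside them."""
    expanded: List[Union[Path, ByteStream]] = []
    for source in sources:
        if isinstance(source, ByteStream):
            expanded.append(source)
            continue
        path = Path(source)
        if path.is_dir():
            # iterdir() is not recursive, so files in subfolders are skipped.
            expanded.extend(sorted(p for p in path.iterdir() if p.is_file()))
        else:
            expanded.append(path)
    return expanded
```

With the sources flattened like this, the remaining steps would only need to distinguish a path from a byte stream, which is the situation the core converters already handle.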

alperkaya avatar Sep 17 '24 09:09 alperkaya

Hey @alperkaya apologies that this has stalled for so long. I'd be happy to pick this up and help you get it over the finish line. Is this something you are still working on?

sjrl avatar Feb 14 '25 14:02 sjrl