feat: Support bytestream in Unstructured API
Related Issues
- fixes #1075
Proposed Changes:
The Unstructured API converter can now be called with a ByteStream in addition to a Path.
How did you test it?
Added extra unit tests
Notes for the reviewer
Checklist
- I have read the contributors guidelines and the code of conduct
- I have updated the related issue with new insights and changes
- I added unit tests and updated the docstrings
- I've used one of the conventional commit types for my PR title:
fix:,feat:,build:,chore:,ci:,docs:,style:,refactor:,perf:,test:.
@vblagoje, I tried to address your request in this PR.
@alperkaya can you please sign the CLA? Otherwise we can't merge this. :)
done ;)
Hi @silvanocerza,
I tried to address your comment by checking the existing converters.
This version covers the following cases without losing meta fields:
- Case 1: Files with Meta as None
- Case 2: ByteStreams with Meta as None
- Case 3: Files with Meta as a Dictionary
- Case 4: ByteStreams with Meta as a Dictionary
- Case 5: Files with Meta as a List
- Case 6: ByteStreams with Meta as a List
- Case 7: Directory with Meta as a Dictionary
- Case 8: Directory with Meta as a List (Should Fail)
- Case 9: Combination of File Paths, ByteStreams, and Directory with Meta as a Dictionary
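To make the cases above concrete, here is a minimal sketch of how the meta argument could be normalized into one dictionary per source. It is neither the code in this PR nor the implementation from core; `normalize_meta` and its error messages are hypothetical names used only for illustration, and it assumes a directory is detected by checking the path on disk.

```python
from pathlib import Path
from typing import Any, Dict, List, Optional, Union


def normalize_meta(
    meta: Optional[Union[Dict[str, Any], List[Dict[str, Any]]]],
    sources: List[Any],
) -> List[Dict[str, Any]]:
    """Return one metadata dict per source (hypothetical helper)."""
    if meta is None:
        # Cases 1 and 2: no meta, so every source gets an empty dict.
        return [{} for _ in sources]
    if isinstance(meta, dict):
        # Cases 3, 4, 7 and 9: the same dict applies to every source.
        # Copy it so later per-document updates do not leak between sources.
        return [dict(meta) for _ in sources]
    # From here on meta is a list and must map one-to-one onto the sources.
    if any(isinstance(s, (str, Path)) and Path(s).is_dir() for s in sources):
        # Case 8: a directory expands to an unknown number of files,
        # so a list of meta dicts cannot be matched to it.
        raise ValueError("meta must be a dict or None when a source is a directory")
    if len(meta) != len(sources):
        raise ValueError("meta list must have the same length as sources")
    # Cases 5 and 6: one dict per file or ByteStream.
    return [dict(m) for m in meta]
```

With this shape, the "should fail" case surfaces as a `ValueError` before any file is sent to the API, and the dictionary cases never share a mutable dict between documents.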
This is still not working as expected, I strongly suggest you copy the implementation from core.
Hi, in the core repo, the solution handles either file paths or bytestreams. However, in this PR, I’m managing not just file paths and bytestreams, but also directories, where I need to fetch all files without entering subfolders. Given these additional requirements, could we explore how to modify the core solution to meet this use case, or would an alternative approach be better suited here?
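As a starting point for that discussion, the sketch below shows one way the directory requirement could be layered in front of a paths-or-bytestreams solution: expand each directory into its direct child files first, then hand the flat list to the existing logic. This is an assumption about a possible approach, not the core implementation; `expand_sources` is a hypothetical helper, and the small `ByteStream` dataclass is only a stand-in for Haystack's class so the example runs on its own.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Union


@dataclass
class ByteStream:
    """Stand-in for Haystack's ByteStream so this sketch is self-contained."""
    data: bytes
    meta: Dict[str, Any] = field(default_factory=dict)


def expand_sources(
    sources: List[Union[str, Path, ByteStream]],
) -> List[Union[Path, ByteStream]]:
    """Flatten sources into files and byte streams (hypothetical helper)."""
    expanded: List[Union[Path, ByteStream]] = []
    for source in sources:
        if isinstance(source, ByteStream):
            expanded.append(source)
            continue
        path = Path(source)
        if path.is_dir():
            # iterdir() yields only direct children, so subfolders are not
            # entered; sorting keeps the order deterministic.
            expanded.extend(sorted(p for p in path.iterdir() if p.is_file()))
        else:
            expanded.append(path)
    return expanded


# Usage: one directory with a nested subfolder, one stream, one plain file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("a")
    (root / "nested").mkdir()
    (root / "nested" / "b.txt").write_text("b")
    result = expand_sources([root, ByteStream(data=b"hello"), root / "a.txt"])
    names = [r.name if isinstance(r, Path) else "<stream>" for r in result]
```

In the usage example the file inside `nested` is skipped, which matches the "without entering subfolders" requirement. After expansion every item is either a file path or a ByteStream, so the remaining conversion step would only need to handle those two kinds of source.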
Hey @alperkaya, apologies that this has stalled for so long. I'd be happy to pick this up and help you get it over the finish line. Is this something you are still working on?