private-gpt
Add support/mapping for Images, URLs and JSON
As an enhancement that leverages the full power of the Unstructured package, I would like to see mappings for images, JSON and URLs.
Specifically, this would mean adding mappings for the UnstructuredImageLoader, UnstructuredURLLoader and JSONLoader.
As support for multiple file types grows, I would also suggest replacing the different mappings with a more generic approach, using the DirectoryLoader or UnstructuredFileLoader directly.
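To make the suggestion concrete, here is a rough sketch of what such a dispatch could look like. It is only an illustration, not privateGPT's actual code: `LOADER_MAPPING`, `FALLBACK`, `is_url`, `pick_loader` and `load_documents` are hypothetical names, the `jq_schema` value is a placeholder, and the import path assumes a LangChain version that still exposes these classes under `langchain.document_loaders` (newer releases may have moved them).

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical mapping: file extension -> (loader class name, extra kwargs).
# The class names are the LangChain loaders mentioned in this issue.
LOADER_MAPPING = {
    ".jpg": ("UnstructuredImageLoader", {}),
    ".jpeg": ("UnstructuredImageLoader", {}),
    ".png": ("UnstructuredImageLoader", {}),
    # "." selects the whole document; a real jq_schema depends on the JSON layout.
    ".json": ("JSONLoader", {"jq_schema": "."}),
}

# Generic fallback for every extension without an explicit entry.
FALLBACK = ("UnstructuredFileLoader", {})


def is_url(source: str) -> bool:
    """Return True if the source looks like an http(s) URL."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def pick_loader(source: str):
    """Return (loader class name, kwargs) for a file path or URL."""
    if is_url(source):
        return "UnstructuredURLLoader", {"urls": [source]}
    ext = Path(source).suffix.lower()
    name, extra = LOADER_MAPPING.get(ext, FALLBACK)
    return name, {"file_path": source, **extra}


def load_documents(source: str):
    """Instantiate the chosen loader and load the documents."""
    # Assumption: these classes live in langchain.document_loaders in the
    # installed LangChain version; imported lazily so the dispatch logic
    # above can be used without LangChain installed.
    import langchain.document_loaders as loaders

    name, kwargs = pick_loader(source)
    return getattr(loaders, name)(**kwargs).load()
```

With this shape, supporting a new file type is one more dictionary entry, and anything unlisted falls through to UnstructuredFileLoader. Whether the fallback handles a given format well depends on which Unstructured extras are installed.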
I wouldn't rely on OCR; any OCR output should be supervised. The rest sounds good, though.
@robertgro you can try https://github.com/h2oai/h2ogpt or borrow code from there for privateGPT. It has blip/blip2 for captioning and OCR. It also supports URLs.