Support multimodal input (images, PDFs)
My use case for EXO involves reading and parsing PDFs to extract data from them.
One of the requirements for this is multimodal input with vision support. It would be great to have this supported in EXO through file upload.
Would you like support for any specific vision model? Would this be something you want control over to choose which vision model is used to parse the PDF?
Would be great to have!
Is anyone assigned to or working on this? If not, I can give it a try. It sounds interesting to me.
Assigned. Best of luck.
Hello all, let me know the next steps! @Evanev7 @AlexCheema