Support multimodality (Image, PDF) input

Open RWL-Dittrich opened this issue 2 months ago • 6 comments

My use case for EXO involves reading and parsing PDFs to extract data from them.

One of the requirements for this is multimodality with vision support. It would be great to have this supported in EXO through file upload.

RWL-Dittrich avatar Dec 24 '25 11:12 RWL-Dittrich

Would you like support for any specific vision model? Would you want control over which vision model is used to parse the PDF?

AlexCheema avatar Dec 24 '25 15:12 AlexCheema

Would be great to have!

Evanev7 avatar Dec 24 '25 19:12 Evanev7

Is anyone assigned to or working on this? If not, I can give it a try. Sounds interesting to me.

rafipatel avatar Dec 30 '25 08:12 rafipatel

Assigned. Best of luck.

AlexCheema avatar Dec 30 '25 13:12 AlexCheema

Hello all, let me know the next steps! @Evanev7 @AlexCheema

rafipatel avatar Jan 01 '26 19:01 rafipatel