
Multi-Modal model support

Open · doberst opened this issue on Oct 4, 2024 · 0 comments

We are very interested in integrating open-source, self-hosted multi-modal models into LLMWare. We have been watching the space closely and are looking for ideas and contributions to support open-source multi-modal models that work in conjunction with RAG and agent-based automation pipelines.

Our key criteria are that the model must serve a use case tied to a business objective (e.g., not just image generation), must work reasonably well, and should be self-hostable (e.g., a maximum of 10-15B parameters).

To implement, the key focus will be the construction of a new MultiModal model class, along with the design of the preprocessors and postprocessors required to handle multi-modal content, plus support for the underlying model packaging (e.g., GGUF, PyTorch, ONNX, OpenVINO). We would look to collaborate and will support the underlying inferencing technology required.
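To make the intended shape a little more concrete, here is a minimal sketch of what such a class could look like, assuming a simple ordered pre/postprocessor pipeline and a backend dispatch stub. Every name below (`MultiModalModel`, `_run_backend`, etc.) is hypothetical for illustration and is not part of the current llmware API:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class MultiModalModel:
    """Hypothetical wrapper for a self-hosted multi-modal model.

    All names here are illustrative assumptions, not existing llmware API.
    """

    model_name: str
    backend: str  # e.g., "gguf", "pytorch", "onnx", "openvino"
    preprocessors: list[Callable[[dict], dict]] = field(default_factory=list)
    postprocessors: list[Callable[[dict], dict]] = field(default_factory=list)

    def inference(self, prompt: str, image_path: str | None = None) -> dict:
        # assemble the raw multi-modal payload
        payload: dict[str, Any] = {"prompt": prompt, "image": image_path}

        # run each preprocessor in order (e.g., image resizing, encoding)
        for pre in self.preprocessors:
            payload = pre(payload)

        # dispatch to the packaging-specific runtime (stubbed below)
        raw_output = self._run_backend(payload)

        # run each postprocessor in order (e.g., parse structured output
        # for downstream RAG or agent steps)
        response: dict[str, Any] = {"model": self.model_name, "raw": raw_output}
        for post in self.postprocessors:
            response = post(response)
        return response

    def _run_backend(self, payload: dict) -> Any:
        # placeholder: a real implementation would route to a GGUF /
        # PyTorch / ONNX / OpenVINO loader keyed on self.backend
        return f"[{self.backend} inference on: {payload.get('prompt')}]"


if __name__ == "__main__":
    # usage sketch with a hypothetical model name and local image path
    model = MultiModalModel(model_name="example-vlm", backend="gguf")
    print(model.inference("Describe this invoice.", image_path="invoice.png"))
```

The pipeline-of-callables design is just one possibility; it keeps the model class agnostic to the specific pre/postprocessing each modality needs, so contributors could add new content handlers without touching the core inference path.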
