Add support for Janus-Pro-7B and other multimodal models
Currently, Transformer Lab only handles text, but the field is converging toward multimodal AI models (MLLMs). For example, the recent Janus-Pro is a unified understanding-and-generation MLLM:
https://huggingface.co/deepseek-ai/Janus-Pro-7B
Please add support for image, audio, and video input and output. Ideally, it would also be possible to load a CLIP model, a diffusion model, and a VAE together to generate images with models like Flux:
https://huggingface.co/black-forest-labs/FLUX.1-dev https://huggingface.co/vpakarinen/flux-dev-vae-clip/tree/main
This is definitely something we are talking about on the roadmap, although maybe longer term. Going forward, we are also hoping to simplify and expand what users can create via plugins.
For now, we do have basic multimodal model support for taking images in with models like LLaVA.
Hey @Emasoft, just catching up on this. We now have support to load and interact with diffusion models within our platform. You can download one from the Model Zoo, or from Hugging Face using the search bar at the bottom of the Model Zoo.
@deep1401 That sounds great! I'm looking forward to it!
Hi @Emasoft, our vLLM and SGLang plugins now also support multimodal models with image inputs, so you can check those out as well! I'm closing this issue now, but please open another one if you need help or support!
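For anyone landing here later: multimodal servers like vLLM and SGLang expose an OpenAI-compatible chat endpoint that accepts `image_url` content parts alongside text, with the image inlined as a base64 data URL. Below is a minimal sketch of building such a request payload; the model name is a placeholder and not a Transformer Lab specific, and the actual endpoint URL depends on how the plugin launches the server.

```python
import base64
import json


def build_image_chat_payload(image_bytes: bytes, prompt: str, model: str) -> dict:
    """Build an OpenAI-compatible chat request with an inline image.

    The image is embedded as a base64 data URL inside an `image_url`
    content part, the shape accepted by vLLM's /v1/chat/completions
    endpoint for multimodal models.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
    }


# Example: any PNG bytes work for payload construction; the model name
# here (a LLaVA checkpoint) is illustrative only.
payload = build_image_chat_payload(
    b"\x89PNG...", "What is in this image?", "llava-hf/llava-1.5-7b-hf"
)
print(json.dumps(payload)[:40])
```

You would then POST this payload to the server's `/v1/chat/completions` route with an HTTP client of your choice.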