Does distributed-llama currently support multimodal models?
Does distributed-llama currently support multimodal models? For example, LLaVA.
I tried it and found that the model runs, but I can't run inference on images.
In addition, do you need testing on edge-node devices? We have a lot of idle edge nodes and can provide assistance and support.
Same question here.
I'd also like to know this, and I'm curious how you managed to run a LLaVA model under distributed-llama, @SherronBurtint - would you be able to share?
@cjastone I tried a LLaVA model based on LLaMA 3, and it can indeed be converted to a .m model and run. However, I believe the conversion only covers the pure language-model layers of LLaMA 3 and essentially ignores the vision encoder (CLIP), so multimodal support is not possible at the moment.
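You can see this by listing the tensors in the checkpoint itself. Here is a minimal sketch, assuming a Hugging Face style LLaVA checkpoint directory that ships a `model.safetensors.index.json` (the directory path below is hypothetical); only the `language_model.*` tensors correspond to what the LLaMA converter maps, while the vision encoder and projector tensors have no counterpart in the .m format:

```python
import json
from collections import Counter
from pathlib import Path

# Hypothetical local path to a downloaded LLaVA (LLaMA 3 based) checkpoint.
checkpoint_dir = Path("./llava-llama-3-8b")

# The sharded-safetensors index lists every tensor name in the checkpoint.
index = json.loads((checkpoint_dir / "model.safetensors.index.json").read_text())
weight_map = index["weight_map"]

# Group tensor names by their top-level prefix to see which components exist.
prefixes = Counter(name.split(".")[0] for name in weight_map)
for prefix, count in prefixes.most_common():
    print(f"{prefix}: {count} tensors")

# Typical prefixes (exact names vary by checkpoint):
#   language_model        <- the LLaMA layers a text-only converter picks up
#   vision_tower          <- CLIP vision encoder, dropped in the conversion
#   multi_modal_projector <- image-to-text projector, also dropped
```

So the converted .m file is effectively just the LLaMA 3 backbone; without the vision tower and projector, there is nothing to turn an image into the embeddings the language model would need.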