Investigate open source models
This project looks amazing!! Unfortunately I cannot self host it without paying for OpenAI models - is there a possibility that you could look into alternative open-source models utilizing ggml and/or llama.cpp? So far I have found LLaVa but I'm not sure if it'd work for this project.
Yes! definitely would like to support LLaVa. Don't think it's going to be as good but worth supporting. Related: https://github.com/abi/screenshot-to-code/issues/15
if we use open source models, it would be a very "satisfying" feeling that the whole project is independent, and does not have a financial roadblock [to an extent] so yes, what kind of models would be suitable for this?
LLava, CogVLM are worth experimenting with.
Maybe the image processing can be done by LLaVA v1.5, but the code generation can be passed on DeepSeek Coder?
I decided to merge #62 locally and I tried self-hosting the backend/frontend along with llava and python3 -m llama_cpp.server --model ./2ab9be51b7dc737136b38093316a4d3577d1fb96281f1589adac7841f5b81c43 --clip_model_path ./mmproj.gguf --chat_format llava-1-5 --n_gpu_layers 35. I specified the openai base url as http://localhost:8000/v1 and it seems to work 🚀
The issues I encountered were
- I'll have to experiment more but the end result obviously isn't as good as OpenAI/GPT 3/4.
- I need a gpu with more VRAM to be able to run larger versions of LLava - 13gb is needed for the larger model and I only have 8gb vram.
nice! can you share some results? screenshot and clone pairs.
also, you could try running llava 13gb with open router: https://openrouter.ai/models/haotian-liu/llava-13b?tab=stats
Also worth trying: https://twitter.com/Teknium1/status/1731369031918293173
Will this work? https://github.com/vikhyat/moondream
we test the effect of cogvlm2 is perfect