[Question]: Which local language models perform best with fabric?
Hello everyone,
I'm curious about which local language model provides the best results for working with fabric. Currently, I'm using the llama3 model, but I'm interested in knowing if there are more suitable alternatives that could potentially yield better outcomes. I've also tried aya23 8B, phi3 mini, and mistral 7B.
Additionally, if I import a custom model into Ollama, which quantization level (Q3, Q4, Q5, Q6, Q8) would deliver the best results? I understand that higher-bit quantizations (e.g., Q8 over Q3) typically preserve more accuracy, but they also demand more memory and run more slowly. However, is there really that significant of a difference in practice?
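For a rough sense of the resource side of that trade-off: a GGUF file's size scales with the quantization's effective bits per weight. Here is a back-of-the-envelope sketch; the bits-per-weight figures are approximate averages for the common K-quant variants, not exact values:

```python
# Rough GGUF size estimate: parameters * effective bits per weight / 8.
# The bits-per-weight values below are approximate averages for common
# K-quant variants; real files vary a little by architecture.
APPROX_BITS = {"Q3_K_M": 3.9, "Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5}

def approx_size_gb(params_billion: float, quant: str) -> float:
    """Approximate on-disk (and roughly in-memory) size in GB."""
    return params_billion * 1e9 * APPROX_BITS[quant] / 8 / 1e9

for quant in APPROX_BITS:
    print(f"8B model at {quant}: ~{approx_size_gb(8, quant):.1f} GB")
```

By that estimate, Q8 roughly doubles the footprint of Q3/Q4 for the same 8B model, which is the memory side of the accuracy trade-off.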
Which model has worked exceptionally well for fabric and other related tasks?
Thank you for your insights!
That is hard to answer, especially since it largely depends on your hardware, and the project moves so fast that pinning down an answer would likely change after a few updates. Language models get updated too. So the best thing for you to do is benchmark the models yourself against the tasks you actually want to use fabric for.
If you are just looking for how quickly a model runs on a piece of local hardware (and not how well it responds), I used LM Studio to show me the tokens/second for my hardware:
1. Start LM Studio.
2. Download the models you want to test.
3. Go to "AI Chat".
4. Click on "select a model to load" at the top and select a model.
5. Click on "New Chat".
6. Enter something you want to process through the LLM (I used "what are embeddings and how do they work?").
7. At the bottom of the chat window, you will see "speed xx tok/s".
8. Create a new chat and go back to step 4 above.
Each chat will show a different model with its speed on your hardware (remember to use the same prompt). It's not perfect and there is probably a better way, but it's still useful; if you'd rather script it, see the Ollama sketch below the note.
Note: adjust settings on the right for each model if required (e.g. GPU offload)
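Since the question mentions Ollama: here is a minimal sketch of the same measurement as a script. It assumes an Ollama server running on its default port (11434) with the listed model tags already pulled; the speed is computed from the eval_count and eval_duration fields that Ollama's non-streaming generate endpoint returns.

```python
import json
import urllib.request

# Models to compare; assumes these tags are already pulled via `ollama pull`.
MODELS = ["llama3", "mistral", "phi3"]
PROMPT = "What are embeddings and how do they work?"

def tokens_per_second(model: str) -> float:
    # Non-streaming call to Ollama's generate endpoint; the response JSON
    # includes eval_count (output tokens) and eval_duration (nanoseconds).
    payload = json.dumps({"model": model, "prompt": PROMPT, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["eval_count"] / data["eval_duration"] * 1e9

for model in MODELS:
    print(f"{model}: {tokens_per_second(model):.1f} tok/s")
```

Note that eval_duration excludes model load time, so this measures steady-state generation speed; as above, keep the prompt identical across models.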