Wouter Tichelaar
You need to point it at the original folder of Meta-Llama-3-8B-Instruct, not the one with the safetensors but the one with the .pth files. I already converted mine earlier today; I'll upload it...
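For anyone unsure which folder is which, here's a quick sanity check. This is just a sketch: looks_like_original_pth is a hypothetical helper, and the file names assume Meta's stock original/ layout for Llama-3-8B-Instruct.

```python
# Hypothetical sanity check: verify a folder holds Meta's original .pth
# layout (what the converter wants), not the safetensors layout.
from pathlib import Path

def looks_like_original_pth(folder: str) -> bool:
    p = Path(folder)
    has_pth = any(p.glob("consolidated.*.pth"))      # raw PyTorch weights
    has_params = (p / "params.json").exists()        # architecture config
    has_tok = (p / "tokenizer.model").exists()       # tokenizer model file
    has_safetensors = any(p.glob("*.safetensors"))   # wrong layout for this
    return has_pth and has_params and has_tok and not has_safetensors

print(looks_like_original_pth("Meta-Llama-3-8B-Instruct/original"))
```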
It's busy uploading; you can find it here: https://huggingface.co/Azamorn/Meta-Llama-3-8B-Instruct-Distributed
Could you give more information: which repo, which folder are you pointing it at, etc.? I should probably say it again: it doesn't work with safetensors files, only with .pth files....
I downloaded them straight from Hugging Face (https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), then used the files in the original folder of the model repo.
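If you only want the .pth files and not the safetensors shards, something like this should work (a sketch using huggingface_hub; the repo is gated, so you need to have accepted the license and be logged in, or pass token=...):

```python
# Sketch: fetch only the original/ .pth weights from the gated repo,
# skipping the (much larger, irrelevant here) safetensors shards.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    allow_patterns=["original/*"],  # consolidated.00.pth, params.json, tokenizer.model
    local_dir="Meta-Llama-3-8B-Instruct",
)
```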
The process for converting a model to a SparseML-compatible model doesn't seem all that complicated. Sparsity has a lot of benefits to offer for inference; while you can quantize...
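To illustrate the core idea (this is plain-PyTorch one-shot magnitude pruning, not SparseML's actual recipe API, which also handles recovery fine-tuning):

```python
# Minimal sketch of unstructured magnitude pruning: zero out the
# smallest-magnitude weights, which is what sparsification recipes
# automate at model scale.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(4096, 4096)
prune.l1_unstructured(layer, name="weight", amount=0.5)  # zero the smallest 50%
prune.remove(layer, "weight")  # bake the mask into the weight tensor

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.0%}")  # ~50%
```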
GPT-4 behaves in a known way, and so does the OpenAI API. With open-source models the prompt format can change from model to model; additionally, most models aren't even nearly...
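You can see this directly with transformers' apply_chat_template: the same messages render to different prompt strings per model, which is exactly why a hard-coded format breaks across checkpoints. (A sketch; both repos shown may require accepting a license / an auth token.)

```python
# Same conversation, different prompt strings depending on the model's
# chat template.
from transformers import AutoTokenizer

messages = [{"role": "user", "content": "Hello!"}]

for repo in ("meta-llama/Meta-Llama-3-8B-Instruct",
             "mistralai/Mistral-7B-Instruct-v0.2"):
    tok = AutoTokenizer.from_pretrained(repo)
    print(repo)
    print(tok.apply_chat_template(messages, tokenize=False,
                                  add_generation_prompt=True))
```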
> Thanks @Jipok! I neglected to mention that I'm using llama.cpp in server mode. Do you know if there is a way to manually specify the chat format in server...
This is my rudimentary attempt at quickly adding llama-3 chat template support; if it works alright, I'll try to make a pull request for it.
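The actual change lives in llama.cpp's C++ template handling, but for reference, this is the prompt string the llama-3 template should produce (a Python mock-up of the documented format, not the patch itself):

```python
# Reference mock-up of the llama-3 instruct prompt format. The BOS token
# <|begin_of_text|> is normally prepended by the tokenizer/loader itself.
def llama3_prompt(messages: list[dict]) -> str:
    out = ""
    for m in messages:
        out += (f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
                f"{m['content']}<|eot_id|>")
    # trailing assistant header cues the model to generate its reply
    out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

print(llama3_prompt([{"role": "user", "content": "Hello!"}]))
```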
So I did a zig build with CUDA support (first time ever using zig, and wow, it's amazing). My code changes seem to work really well; the model is responding coherently. Anyone...
> > This is my rudimentary attempt at quickly adding llama-3 chat template support; if it works alright, I'll try to make a pull request for it. >...