Wouter Tichelaar

Results 21 comments of Wouter Tichelaar

You need to point it at the original folder of Meta-Llama-3-8B-Instruct, the one with the .pth files, not the one with the safetensors. I already converted mine earlier today. I'll upload it...

It's busy uploading, you can find it here: https://huggingface.co/Azamorn/Meta-Llama-3-8B-Instruct-Distributed

Could you give more information: what repo, what folder are you pointing it at, etc.? I should probably say it again: it doesn't work with safetensors files, only with .pth files....

I downloaded them straight from Hugging Face at https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct, then I used the files in the original folder of the model repo.

The process for converting a model to a SparseML-compatible model doesn't seem all that complicated. Sparsity has a lot of benefits to offer for inference, while you can quantize...

GPT-4 behaves in a known way, and so does the OpenAI API. With open-source models the prompt format can change from model to model; additionally, most models aren't even nearly...
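To make the point concrete, here is a minimal sketch (plain Python, not tied to any library) that formats the same exchange in two well-known but incompatible styles: the ChatML-style template used by many open models, and the Llama-2 chat template. The special tokens are the publicly documented ones for each family; the function names are just for illustration.

```python
def chatml(system: str, user: str) -> str:
    # ChatML-style template: <|im_start|>role ... <|im_end|> markers
    return (f"<|im_start|>system\n{system}<|im_end|>\n"
            f"<|im_start|>user\n{user}<|im_end|>\n"
            f"<|im_start|>assistant\n")

def llama2_chat(system: str, user: str) -> str:
    # Llama-2 chat template: [INST] / <<SYS>> markers instead
    return (f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
            f"{user} [/INST]")

print(chatml("You are helpful.", "Hi"))
print(llama2_chat("You are helpful.", "Hi"))
```

Feed a model the wrong one of these and it will often still answer, just badly, which is why a server that assumes one fixed format breaks silently across models.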

> Thanks @Jipok! I neglected to mention that I'm using llama.cpp in server mode. Do you know if there is a way to manually specify the chat format in server...

This is my rudimentary attempt at quickly trying to add llama-3 chat template support; if it works alright, I'll try to make a pull request for it ![image](https://github.com/ggerganov/llama.cpp/assets/9594229/aa942092-663b-44a1-9aab-37e203267bfa)
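The screenshot above shows the actual patch. For reference, the string a Llama 3 template has to produce looks roughly like the sketch below (special token names per the Meta-Llama-3 model card; this is a standalone illustration, not the llama.cpp code):

```python
def llama3_prompt(messages: list[dict]) -> str:
    # Build a Llama 3 chat prompt from a list of {"role", "content"} dicts.
    out = "<|begin_of_text|>"
    for m in messages:
        out += (f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
                f"{m['content']}<|eot_id|>")
    # Leave the prompt open so the model generates the assistant turn
    out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

print(llama3_prompt([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
]))
```

The key details are the `<|eot_id|>` terminator after every turn and the trailing open assistant header, which is what the template code has to emit before generation starts.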

So I did a Zig build with CUDA support (first time ever using Zig, and wow, it's amazing). My code changes seem to work really well; the model is responding coherently. Anyone...

> > This is my rudimentary attempt at quickly trying to add llama-3 chat template support, if it works alright I'll try and make a pull request for it >...