Patrick Buckley
For what it’s worth, PyTorch has support for an MPS backend that one can query for and set, which will drastically improve performance on Apple silicon. For most things it’s...
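A minimal sketch of what that query-and-set looks like, assuming a recent PyTorch build (`pick_device` is just an illustrative helper name, not part of the PyTorch API):

```python
import torch

def pick_device():
    # is_available() is True when this PyTorch build was compiled with MPS
    # support and the machine actually exposes a Metal-capable GPU;
    # otherwise fall back to CPU.
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.ones(4, device=device)  # tensors are then allocated on that backend
```

On non-Apple hardware this simply resolves to `cpu`, so the same script runs everywhere.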
https://github.com/ggerganov/llama.cpp/issues/71#issuecomment-1466943009 feel free to take a look at this diff; there are probably two other lines you want to change. You can probably ignore the EPS stuff for now, though that should be changed too...
Depending on how much memory you have, you can increase the context size to get longer outputs. On a 64 GB machine I was able to have a 12k context with...
Typically, if you get the "not enough space in the context" error, you set the context too large, though on the larger models I have had to tweak this...
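To get a feel for why a 12k context wants a big machine, here is a rough back-of-the-envelope for the KV cache alone (the 7B-class dimensions below are my assumptions, not numbers from the thread):

```python
def kv_cache_bytes(n_layers, n_ctx, n_embd, bytes_per_elt=2):
    """Rough size of a transformer KV cache: K and V each hold
    n_ctx * n_embd values per layer, at bytes_per_elt each (2 for fp16)."""
    return 2 * n_layers * n_ctx * n_embd * bytes_per_elt

# Hypothetical 7B-class model: 32 layers, 4096-dim embeddings, 12k context.
gib = kv_cache_bytes(n_layers=32, n_ctx=12288, n_embd=4096) / 2**30
print(f"{gib:.1f} GiB")  # → 6.0 GiB for the cache alone, before weights
```

That cost grows linearly with context length, which is why the error shows up sooner on the larger models.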
https://github.com/eous/llama.cpp/commit/e0213e08c03a3ac72cdec4596b872073b51655aa here is some easy stuff I pulled out of my local hackery, if anyone wants to play with it.
Btw, if anyone wants to slice/dice/refactor/cleanup/dissect/mix up/etc. that changeset, feel free; I don't need to be credited.
https://github.com/ggerganov/llama.cpp/issues/71
> Alpaca-lora author here. I've added a script to merge and convert weights to state_dict in my repo [(link)](https://github.com/tloen/alpaca-lora/blob/main/export_state_dict_checkpoint.py). Curious to see it run on llama.cpp :) ``` Instruction: What...
> I just tried the alpaca-lora merged model with quantization. The result was not as good as the examples introduced in the tloen repo. It might be the price of quantization, or the merge was...
Dunno if it's the same thing, but when dealing with Hugging Face LLaMA models we had to unpermute the wq/wk attention layers: w.view(n_heads, 2, dim // n_heads // 2, dim).transpose(1,...
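A runnable sketch of that unpermute on a toy tensor. The expression above is cut off, so the trailing `.transpose(1, 2).reshape(dim, dim)` is my guess at how it completes, and `unpermute` is just an illustrative name:

```python
import torch

def unpermute(w: torch.Tensor, n_heads: int, dim: int) -> torch.Tensor:
    # Undo the per-head row interleaving applied to wq/wk in the Hugging
    # Face checkpoints. The .transpose(1, 2).reshape(dim, dim) tail is an
    # assumption about the truncated part of the expression in the comment.
    return (
        w.view(n_heads, 2, dim // n_heads // 2, dim)
        .transpose(1, 2)
        .reshape(dim, dim)
    )

# Toy example: 2 heads, dim 8, so each head's 4 rows get de-interleaved.
w = torch.arange(64, dtype=torch.float32).reshape(8, 8)
out = unpermute(w, n_heads=2, dim=8)
```

Within each head's block of rows, the operation swaps the two interleaved halves back into contiguous order; it only reorders rows, so no values are changed.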