Patrick Buckley
For what it’s worth, PyTorch has support for an MPS backend that one can query for and set, which will drastically improve performance on Apple silicon. For most things it’s...
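A minimal sketch of what that query-and-set looks like, assuming a recent PyTorch build (`pick_device` is just an illustrative helper name, not part of the PyTorch API):

```python
import torch

def pick_device():
    # is_available() is True when this PyTorch build was compiled with MPS
    # support and the machine actually exposes a Metal-capable GPU;
    # otherwise fall back to CPU.
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.ones(4, device=device)  # tensors are then allocated on that backend
```

On non-Apple hardware this simply resolves to `cpu`, so the same script runs everywhere.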
https://github.com/ggerganov/llama.cpp/issues/71#issuecomment-1466943009 feel free to take a look at this diff; there are probably two other lines you want to change. You can probably ignore the EPS stuff for now, though that should be changed too...
Depending on how much memory you have, you can increase the context size to get longer outputs. On a 64 GB machine I was able to have a 12k context with...
Typically, if you get the "not enough space in the context" error, you set the context too large, though on the larger models I have had to tweak this...
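To get a feel for why a 12k context wants a big machine, here is a rough back-of-the-envelope for the KV cache alone (the 7B-class dimensions below are my assumptions, not numbers from the thread):

```python
def kv_cache_bytes(n_layers, n_ctx, n_embd, bytes_per_elt=2):
    """Rough size of a transformer KV cache: K and V each hold
    n_ctx * n_embd values per layer, at bytes_per_elt each (2 for fp16)."""
    return 2 * n_layers * n_ctx * n_embd * bytes_per_elt

# Hypothetical 7B-class model: 32 layers, 4096-dim embeddings, 12k context.
gib = kv_cache_bytes(n_layers=32, n_ctx=12288, n_embd=4096) / 2**30
print(f"{gib:.1f} GiB")  # → 6.0 GiB for the cache alone, before weights
```

That cost grows linearly with context length, which is why the error shows up sooner on the larger models.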
https://github.com/eous/llama.cpp/commit/e0213e08c03a3ac72cdec4596b872073b51655aa here is some easy stuff I pulled out of my local hackery, if anyone wants to play with it.
Btw, if anyone wants to slice/dice/refactor/cleanup/dissect/mix up/etc. that changeset, feel free; I don't need to be credited.
https://github.com/ggerganov/llama.cpp/issues/71
> Alpaca-lora author here. I've added a script to merge and convert weights to state_dict in my repo [(link)](https://github.com/tloen/alpaca-lora/blob/main/export_state_dict_checkpoint.py). Curious to see it run on llama.cpp :) ``` Instruction: What...
> I just tried the alpaca-lora merged model with quantization. The result was not as good as the examples introduced in the tloen repo. It might be the price of quantization, or the merge was...
Dunno if it's the same thing, but when dealing with Hugging Face LLaMA models we had to unpermute the wq/wk attention layers: w.view(n_heads, 2, dim // n_heads // 2, dim).transpose(1,...
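A runnable sketch of that unpermute on a toy tensor. The expression above is cut off, so the trailing `.transpose(1, 2).reshape(dim, dim)` is my guess at how it completes, and `unpermute` is just an illustrative name:

```python
import torch

def unpermute(w: torch.Tensor, n_heads: int, dim: int) -> torch.Tensor:
    # Undo the per-head row interleaving applied to wq/wk in the Hugging
    # Face checkpoints. The .transpose(1, 2).reshape(dim, dim) tail is an
    # assumption about the truncated part of the expression in the comment.
    return (
        w.view(n_heads, 2, dim // n_heads // 2, dim)
        .transpose(1, 2)
        .reshape(dim, dim)
    )

# Toy example: 2 heads, dim 8, so each head's 4 rows get de-interleaved.
w = torch.arange(64, dtype=torch.float32).reshape(8, 8)
out = unpermute(w, n_heads=2, dim=8)
```

Within each head's block of rows, the operation swaps the two interleaved halves back into contiguous order; it only reorders rows, so no values are changed.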