Update to latest upstream LLaMA implementation
We're a couple of weeks out of date with the current LLaMA implementation in llama.cpp. There are quite a few changes (including always generating the BOS token at the start of the prompt!) that we should update to handle.
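To illustrate the BOS change, a tokenizer wrapper on our side would need to ensure the BOS token is always prepended. A minimal sketch, assuming LLaMA's usual BOS id of 1 and hypothetical token ids for the prompt:

```rust
// Sketch: always prepend the BOS token to the prompt's token ids,
// mirroring the newer llama.cpp behavior. BOS id 1 is an assumption
// based on LLaMA's vocabulary; the real id should come from the model.
const BOS_TOKEN_ID: u32 = 1;

/// Prepend BOS unless the token sequence already starts with it.
fn with_bos(mut tokens: Vec<u32>) -> Vec<u32> {
    if tokens.first() != Some(&BOS_TOKEN_ID) {
        tokens.insert(0, BOS_TOKEN_ID);
    }
    tokens
}

fn main() {
    // Hypothetical token ids standing in for a tokenized prompt.
    let prompt = vec![15043u32, 3186];
    let tokens = with_bos(prompt);
    assert_eq!(tokens, vec![1, 15043, 3186]);
    println!("{:?}", tokens);
}
```

The actual fix would live in the tokenization path so every inference call gets the BOS token, not just ones where the caller remembered to add it.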
There's also this recent change:
https://github.com/ggerganov/llama.cpp/pull/1412
which I believe would help with GPU-related issues.