llama.cpp
Unlimited context length now possible
To achieve unlimited context length for LLMs, we could follow the NBCE guide below. I really hope this gets implemented in llama.cpp soon:
https://github.com/bojone/NBCE/tree/main
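For anyone evaluating the idea, the core of NBCE is a pooling rule over next-token distributions: run the model once per context chunk (each chunk plus the question), then combine the resulting logits with the distribution from the question alone. Below is a minimal NumPy sketch of my reading of the linked repo; the `beta` value and the min-entropy pooling follow its description, but treat the exact details as an assumption, not the reference implementation.

```python
import numpy as np
from scipy.special import logsumexp

def nbce_combine(context_logits, prior_logits, beta=0.25):
    """Combine per-chunk next-token logits in the NBCE style (rough sketch).

    context_logits: (n_chunks, vocab) -- logits from "chunk_i + question".
    prior_logits:   (vocab,)          -- logits from the question alone.
    """
    # Normalise everything into log-probabilities.
    logp = context_logits - logsumexp(context_logits, axis=-1, keepdims=True)
    logp_prior = prior_logits - logsumexp(prior_logits)

    # Greedy pooling: keep the chunk whose prediction is most confident
    # (lowest entropy).
    entropy = -(np.exp(logp) * logp).sum(axis=-1)
    k = int(entropy.argmin())

    # Naive-Bayes-style combination: up-weight the chosen context,
    # subtract the prior so it is not double-counted.
    return (1 + beta) * logp[k] - beta * logp_prior
```

Since each chunk is evaluated independently, cost grows linearly with the number of chunks rather than quadratically with the total context.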
Having unlimited prompt token size is the dream of every AI user. Please implement this in llama.cpp.
While NBCE may not be the best approach, we could also take a look at these:
- https://github.com/epfml/landmark-attention
- https://github.com/princeton-nlp/AutoCompressors
- https://arxiv.org/pdf/2305.19370.pdf
Nice article:
- https://blog.gopenai.com/how-to-speed-up-llms-and-use-100k-context-window-all-tricks-in-one-place-ffd40577b4c
Looks better than Unlimiformer!
And does that make real-time training possible? The model could become dynamic instead of static!
It also means other classic algorithms might work fine with LLMs too.
If I understand correctly, implementing infinite context length with NBCE would require infinite computing power, but perhaps I misunderstood?
A large context leads to large memory use, and I think that suits llama.cpp well: RAM is cheaper than VRAM. Besides, NBCE might be able to store its state the way the KV cache does now. At that point we could finally use the same hardware but gain several times the context length.
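To put the memory point in numbers: the KV cache grows linearly with context length, so doubling `n_ctx` doubles the cache. A back-of-the-envelope sketch follows; the model shape is a hypothetical 7B-LLaMA-like example, and real llama.cpp figures depend on the cache type (f16 vs. quantized).

```python
# Rough KV-cache size: 2 (K and V) * layers * ctx * kv_heads * head_dim * bytes/elem
n_layers, n_ctx, n_kv_heads, head_dim = 32, 4096, 32, 128  # hypothetical 7B-like shape
bytes_per_elem = 2                                          # f16 cache assumed
kv_bytes = 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem
print(f"{kv_bytes / 2**30:.1f} GiB")  # ~2.0 GiB at 4k ctx; ~8 GiB at 16k ctx
```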
It's not possible. The HazyResearch group at Stanford has the best approach to this problem, but their solution targets GPUs rather than what llama.cpp does.
This issue was closed because it has been inactive for 14 days since being marked as stale.