llama.cpp
Unlimited context length now possible
To achieve unlimited context length for LLMs, we could follow the NBCE guide below. I really hope this gets implemented in llama.cpp soon:
https://github.com/bojone/NBCE/tree/main
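For anyone evaluating the idea, the core of NBCE is a pooling rule over next-token distributions: run the model once per context chunk (each chunk plus the question), then combine the resulting logits with the distribution from the question alone. Below is a minimal NumPy sketch of my reading of the linked repo; the `beta` value and the min-entropy pooling follow its description, but treat the exact details as an assumption, not the reference implementation.

```python
import numpy as np
from scipy.special import logsumexp

def nbce_combine(context_logits, prior_logits, beta=0.25):
    """Combine per-chunk next-token logits in the NBCE style (rough sketch).

    context_logits: (n_chunks, vocab) -- logits from "chunk_i + question".
    prior_logits:   (vocab,)          -- logits from the question alone.
    """
    # Normalise everything into log-probabilities.
    logp = context_logits - logsumexp(context_logits, axis=-1, keepdims=True)
    logp_prior = prior_logits - logsumexp(prior_logits)

    # Greedy pooling: keep the chunk whose prediction is most confident
    # (lowest entropy).
    entropy = -(np.exp(logp) * logp).sum(axis=-1)
    k = int(entropy.argmin())

    # Naive-Bayes-style combination: up-weight the chosen context,
    # subtract the prior so it is not double-counted.
    return (1 + beta) * logp[k] - beta * logp_prior
```

Since each chunk is evaluated independently, cost grows linearly with the number of chunks rather than quadratically with the total context.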
Having unlimited prompt token size is the dream of every AI user. Please implement this in llama.cpp.
While NBCE may not be the best approach, we could also take a look at these:
- https://github.com/epfml/landmark-attention
- https://github.com/princeton-nlp/AutoCompressors
- https://arxiv.org/pdf/2305.19370.pdf
Nice article:
- https://blog.gopenai.com/how-to-speed-up-llms-and-use-100k-context-window-all-tricks-in-one-place-ffd40577b4c
Looks better than Unlimiformer!
And does that make real-time training possible? The model could become dynamic instead of static!
It also means other classic algorithms might work fine with LLMs too.
If I understand correctly, implementing infinite context length with NBCE would require infinite computing power, but perhaps I misunderstood?
A large context leads to large memory use, and I think that suits llama.cpp well: RAM is cheaper than VRAM. Besides, NBCE might be able to store its state the way the KV cache does now. At that point we could finally use the same hardware but gain several times the context length.
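To put the memory point in numbers: the KV cache grows linearly with context length, so doubling `n_ctx` doubles the cache. A back-of-the-envelope sketch follows; the model shape is a hypothetical 7B-LLaMA-like example, and real llama.cpp figures depend on the cache type (f16 vs. quantized).

```python
# Rough KV-cache size: 2 (K and V) * layers * ctx * kv_heads * head_dim * bytes/elem
n_layers, n_ctx, n_kv_heads, head_dim = 32, 4096, 32, 128  # hypothetical 7B-like shape
bytes_per_elem = 2                                          # f16 cache assumed
kv_bytes = 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem
print(f"{kv_bytes / 2**30:.1f} GiB")  # ~2.0 GiB at 4k ctx; ~8 GiB at 16k ctx
```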
It's not possible. The HazyResearch group at Stanford has the best approach to this problem, but their solution targets GPUs rather than what llama.cpp does.
This issue was closed because it has been inactive for 14 days since being marked as stale.