7 comments by Gustavo Rocha Dias

I've made the same observation as you. It's conceivable that the behavior we're noticing is related to how the shared memory interacts between the CPU and GPU. On my personal...

The easiest way is to use KoboldCpp instead of LlamaCpp. It exposes a minimal KoboldAI API that allows connecting with SillyTavern, and it's a fork of LlamaCpp that is constantly updated....
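To illustrate the connection described above, here is a minimal sketch of calling KoboldCpp's KoboldAI-compatible generate endpoint directly. The URL, port, and payload fields follow KoboldCpp's defaults at the time of writing, but the exact supported parameters depend on your KoboldCpp version, so treat this as an assumption to verify against your build.

```python
import json
from urllib import request

# Default KoboldCpp address; adjust host/port to match your launch flags.
KOBOLDCPP_URL = "http://localhost:5001/api/v1/generate"

def build_payload(prompt: str, max_length: int = 80) -> dict:
    # Field names follow the KoboldAI generate API that KoboldCpp emulates;
    # check your version's docs for the full parameter list.
    return {
        "prompt": prompt,
        "max_length": max_length,
        "temperature": 0.7,
    }

def generate(prompt: str) -> str:
    # POST the JSON payload and pull the generated text out of the response.
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = request.Request(
        KOBOLDCPP_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["results"][0]["text"]
```

SillyTavern does the same kind of request under the hood once you point its KoboldAI connection at the KoboldCpp address.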

Had the same error; it seems the CPU RAM is not enough to load the model before sending it to the GPU.

More on this: recent koboldcpp build, Snapdragon 8 Gen 1, Termux. Every GGUF quant comes out garbled, K-quant or not, offloaded layers or not. GGML models work okay....

> > More on this: recent koboldcpp build, Snapdragon 8 Gen 1, Termux.
> >
> > Every GGUF quant comes out garbled, K-quant or not, offloaded layers or not....

Pointing to another thread discussing this topic: #7016

It's probably RAG for references. SillyTavern already uses RAG for memories via its ChromaDB extension, but this seems to be a different use of it.