mikeggh

Results 18 comments of mikeggh

Ah yeah, this is great.. I came across it, then realized it hasn't been merged yet..

Are you using the specific binary for stablelm? From the looks of it, it's kept separate: https://github.com/ggerganov/ggml/tree/master/examples/stablelm

I've been considering something similar but with other non-LLM networks using the same technique, so thank you for posting. I'm happy to have read the other comments regarding papers...

> Bottom line is that I don't think it's helpful for CPU eval unless you do what @Piezoid said and say "okay that generation is 'good enough' for some definition...

Well... later today I plan on saving the state of general starting contexts/prompts and loading that to speed things up, using more than what I have on my current test platform,...
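A minimal sketch of that idea, assuming the llama.cpp C API from around that time (llama_init_from_file, llama_tokenize, llama_eval, llama_load_session_file / llama_save_session_file); the model path, cache filename, and prompt below are hypothetical and the exact signatures may have changed since:

```cpp
#include "llama.h"
#include <cstdio>
#include <vector>

int main() {
    // Sketch only: persist the evaluated starting prompt so later runs can skip
    // re-evaluating it. Assumes the mid-2023 llama.cpp session-file API.
    llama_context_params lparams = llama_context_default_params();
    llama_context * ctx = llama_init_from_file("models/7B/ggml-model-q4_0.bin", lparams);
    if (!ctx) return 1;

    const char * path_session = "starting-context.bin"; // hypothetical cache file
    std::vector<llama_token> tokens(lparams.n_ctx);
    size_t n_loaded = 0;

    if (llama_load_session_file(ctx, path_session, tokens.data(), tokens.size(), &n_loaded)) {
        // KV cache restored: generation can continue from position n_loaded
        // instead of re-evaluating the shared starting prompt.
        printf("restored %zu tokens from session\n", n_loaded);
    } else {
        // First run: evaluate the shared starting prompt once, then save it.
        int n = llama_tokenize(ctx, " my general starting prompt", tokens.data(), (int) tokens.size(), true);
        llama_eval(ctx, tokens.data(), n, 0, 4 /* threads */);
        llama_save_session_file(ctx, path_session, tokens.data(), n);
    }

    llama_free(ctx);
    return 0;
}
```

On later runs the shared starting context comes straight out of the cache, so only the new tokens need evaluation.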

In the code there is this piece in examples/main/main.cpp: https://github.com/ggerganov/llama.cpp/blob/0b2da20538d01926b77ea237dd1c930c4d20b686/examples/main/main.cpp#L157

    // Add a space in front of the first character to match OG llama tokenizer behavior
    params.prompt.insert(0, 1, ' ');

...

> @dmahurin in terms of program logic it probably wouldn't take much; starting from the end of the session is actually simpler because you're not finding a common prefix. I...
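To illustrate the common-prefix part of that quote, a small sketch in plain C++ with made-up token values (not the project's actual code): only the prefix shared between the saved session and the new prompt can keep its KV-cache entries, everything after it has to be re-evaluated.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    // Hypothetical token sequences: what was evaluated in the saved session,
    // and what the new prompt tokenizes to.
    std::vector<int> session_tokens = {1, 319, 13563, 1546, 263};
    std::vector<int> prompt_tokens  = {1, 319, 13563, 1546, 901, 2305};

    // Length of the matching prefix between the saved session and the new prompt.
    auto mis = std::mismatch(session_tokens.begin(), session_tokens.end(),
                             prompt_tokens.begin(), prompt_tokens.end());
    size_t n_matching = static_cast<size_t>(mis.first - session_tokens.begin());

    // The first n_matching tokens keep their KV-cache entries (n_past);
    // only prompt_tokens[n_matching..] need to be evaluated again.
    printf("reuse %zu tokens, re-evaluate %zu\n",
           n_matching, prompt_tokens.size() - n_matching);
    return 0;
}
```

Resuming from the end of the session is the simpler degenerate case the quote mentions: the whole saved sequence is the prefix, so you just set n_past to its length and append.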

I'm sure it would require something like RDMA to be efficient, and even then it may not be worth it. GPU distribution is nice because GPU matrices are so much...

I believe when it's split across GPUs it's acceptable because of the multiplier of GPU vs CPU matrix instructions.. the gain would dwindle significantly. I don't know of any VPS...