mikeggh

Results 18 comments of mikeggh

Ah yeah, this is great.. I came across it, then realized it hasn't been merged yet..

Are you using the specific binary for stablelm? From the looks of it, it's kept separate: https://github.com/ggerganov/ggml/tree/master/examples/stablelm

I've been considering something similar but with other non-LLM networks using the same technique, so thank you for posting. I'm happy to have read the other comments regarding papers...

> Bottom line is that I don't think it's helpful for CPU eval unless you do what @Piezoid said and say "okay that generation is 'good enough' for some definition...

Well... later today I plan on saving the state of general starting contexts/prompts and loading that to speed things up, using more than what I have on my current test platform,...
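A minimal sketch of that idea, assuming the llama.cpp C API from around that time (llama_init_from_file, llama_tokenize, llama_eval, llama_load_session_file / llama_save_session_file); the model path, cache filename, and prompt below are hypothetical and the exact signatures may have changed since:

```cpp
#include "llama.h"
#include <cstdio>
#include <vector>

int main() {
    // Sketch only: persist the evaluated starting prompt so later runs can skip
    // re-evaluating it. Assumes the mid-2023 llama.cpp session-file API.
    llama_context_params lparams = llama_context_default_params();
    llama_context * ctx = llama_init_from_file("models/7B/ggml-model-q4_0.bin", lparams);
    if (!ctx) return 1;

    const char * path_session = "starting-context.bin"; // hypothetical cache file
    std::vector<llama_token> tokens(lparams.n_ctx);
    size_t n_loaded = 0;

    if (llama_load_session_file(ctx, path_session, tokens.data(), tokens.size(), &n_loaded)) {
        // KV cache restored: generation can continue from position n_loaded
        // instead of re-evaluating the shared starting prompt.
        printf("restored %zu tokens from session\n", n_loaded);
    } else {
        // First run: evaluate the shared starting prompt once, then save it.
        int n = llama_tokenize(ctx, " my general starting prompt", tokens.data(), (int) tokens.size(), true);
        llama_eval(ctx, tokens.data(), n, 0, 4 /* threads */);
        llama_save_session_file(ctx, path_session, tokens.data(), n);
    }

    llama_free(ctx);
    return 0;
}
```

On later runs the shared starting context comes straight out of the cache, so only the new tokens need evaluation.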

In the code there is this piece in examples/main/main.cpp: https://github.com/ggerganov/llama.cpp/blob/0b2da20538d01926b77ea237dd1c930c4d20b686/examples/main/main.cpp#L157

    // Add a space in front of the first character to match OG llama tokenizer behavior
    params.prompt.insert(0, 1, ' ');

...

> @dmahurin in terms of program logic it probably wouldn't take much; starting from the end of the session is actually simpler because you're not finding a common prefix. I...
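To illustrate the common-prefix part of that quote, a small sketch in plain C++ with made-up token values (not the project's actual code): only the prefix shared between the saved session and the new prompt can keep its KV-cache entries, everything after it has to be re-evaluated.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    // Hypothetical token sequences: what was evaluated in the saved session,
    // and what the new prompt tokenizes to.
    std::vector<int> session_tokens = {1, 319, 13563, 1546, 263};
    std::vector<int> prompt_tokens  = {1, 319, 13563, 1546, 901, 2305};

    // Length of the matching prefix between the saved session and the new prompt.
    auto mis = std::mismatch(session_tokens.begin(), session_tokens.end(),
                             prompt_tokens.begin(), prompt_tokens.end());
    size_t n_matching = static_cast<size_t>(mis.first - session_tokens.begin());

    // The first n_matching tokens keep their KV-cache entries (n_past);
    // only prompt_tokens[n_matching..] need to be evaluated again.
    printf("reuse %zu tokens, re-evaluate %zu\n",
           n_matching, prompt_tokens.size() - n_matching);
    return 0;
}
```

Resuming from the end of the session is the simpler degenerate case the quote mentions: the whole saved sequence is the prefix, so you just set n_past to its length and append.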

I'm sure it would require something like RDMA to be efficient, and even then it may not be worth it. GPU distribution is nice because GPU matrices are so much...

I believe when it's split across GPUs it's acceptable because of the multiplier of GPU vs CPU matrix instructions.. the gain would dwindle significantly. I don't know of any VPS...