Erik Scholz

282 comments by Erik Scholz

@slaren earlier I wrote:

> We should instead "determine the required inference memory per token" https://github.com/ggerganov/llama.cpp/blob/master/main.cpp#L891 , so it can increase the size by itself dynamically https://github.com/ggerganov/llama.cpp/blob/master/main.cpp#L557
>
> edit: I...
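For illustration, a minimal C++ sketch of that idea, using hypothetical names (`EvalBuffer`, `update_estimate`) rather than llama.cpp's actual API: measure how many bytes one evaluation used, derive a per-token estimate, and grow the buffer before the next batch instead of asserting on a fixed-size allocation.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical sketch, not llama.cpp code: size the eval buffer from a
// measured bytes-per-token estimate instead of a hardcoded constant.
struct EvalBuffer {
    std::vector<unsigned char> data;
    size_t mem_per_token = 0; // measured after the first evaluation

    // Call after evaluating n_tokens that consumed used_bytes in total.
    void update_estimate(size_t used_bytes, size_t n_tokens) {
        if (n_tokens > 0) mem_per_token = used_bytes / n_tokens;
    }

    // Grow the buffer so the next batch of n_tokens fits.
    void reserve_for(size_t n_tokens) {
        const size_t needed = mem_per_token * n_tokens;
        if (needed > data.size()) {
            data.resize(needed);
            std::printf("grew eval buffer to %zu bytes\n", data.size());
        }
    }
};

int main() {
    EvalBuffer buf;
    buf.data.resize(512 * 1024);        // initial guess
    buf.update_estimate(800 * 1024, 8); // first eval: 8 tokens used ~800 KiB
    buf.reserve_for(32);                // next batch of 32 tokens fits now
}
```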

Please try https://github.com/ggerganov/llama.cpp/pull/438 and see if it fixes it. I implemented the observations made in this thread.

@Bec-k can you elaborate on what you think is not implemented?

Closing this in favor of https://github.com/ggerganov/ggml/issues/21. Also, https://github.com/saharNooby/rwkv.cpp seems to be it.

@DerekFroese not sure if you are still looking for that, but a shell application with similar features does exist: https://en.wikipedia.org/wiki/MTR_(software)

Checksum for the converted (ggmf v1) Pi3141 alpaca-30B-ggml:

```
$ sha256sum ggml-model-q4_0.bin
969652d32ce186ca3c93217ece8311ebe81f15939aa66a6fe162a08dd893faf8  ggml-model-q4_0.bin
```
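If you want to reproduce that line without coreutils, here is a small C++ sketch (assuming OpenSSL; link with -lcrypto) that prints the same `sha256sum`-style output; the helper name `sha256_file` is just illustrative.

```cpp
#include <openssl/evp.h>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Stream a file through OpenSSL's EVP SHA-256 and return the hex digest.
static std::string sha256_file(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    EVP_MD_CTX *ctx = EVP_MD_CTX_new();
    EVP_DigestInit_ex(ctx, EVP_sha256(), nullptr);
    std::vector<char> buf(1 << 16);
    while (in.read(buf.data(), buf.size()) || in.gcount() > 0) {
        EVP_DigestUpdate(ctx, buf.data(), (size_t)in.gcount());
    }
    unsigned char digest[EVP_MAX_MD_SIZE];
    unsigned int len = 0;
    EVP_DigestFinal_ex(ctx, digest, &len);
    EVP_MD_CTX_free(ctx);
    char hex[2 * EVP_MAX_MD_SIZE + 1];
    for (unsigned int i = 0; i < len; ++i) std::sprintf(hex + 2 * i, "%02x", digest[i]);
    return std::string(hex, 2 * (size_t)len);
}

int main() {
    // Two spaces between digest and name, matching sha256sum's output format.
    std::printf("%s  ggml-model-q4_0.bin\n", sha256_file("ggml-model-q4_0.bin").c_str());
}
```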

@anzz1 you did not specify which model your links are for. Also, please provide checksums :)

me: I should try and debug all those crashes
me: `> help me write a song about llama.cpp (c++ api for facebooks llm)`
llama.cpp:
```
A llama is an animal...
```

The ones you linked are sadly mixed, not "pure" LoRA models, so I would assume no. You could just say "pi3141 alpaca 30B" model, and it would be fine...

"mixed" -> "merged" If you look at this for example https://huggingface.co/tloen/alpaca-lora-7b/tree/main , those are **only** the lora weights. I **think** (need to actually read the paper) those are either not...