Making results independent of thread count/batch size (from llama.cpp)
This may be something to keep an eye on: https://github.com/ggerganov/llama.cpp/pull/439
Looks like the corresponding code is here: https://github.com/rustformers/llama-rs/blob/bf7bdbcfff3114dcbdafb6eb7eed58f04f19b1c3/llama-rs/src/lib.rs#L1203
According to the comments in the pull request, the change should trade a small amount of performance for lower memory usage. However, at least one user commented that they saw *higher* memory use (the model size wasn't specified).
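
For context on why results can depend on thread count or batch size in the first place: f32 addition is not associative, so splitting a reduction (e.g. a dot product in the attention computation) into a different number of partial sums can change the rounded result. The sketch below is purely illustrative, not code from the PR or from llama-rs; it just shows a sequential sum disagreeing with a chunked sum of the kind a multi-threaded reduction would produce.

```rust
/// Illustrative only: f32 addition is non-associative, so different
/// reduction orders (one partial sum per "thread" vs. a single
/// sequential pass) can yield slightly different totals.
fn main() {
    let values: Vec<f32> = (0..1_000_000)
        .map(|i| (i as f32).sin() * 1e-3)
        .collect();

    // Sequential reduction: one fixed accumulation order.
    let sequential: f32 = values.iter().sum();

    // Chunked reduction, as a parallel dot product might do it:
    // each hypothetical thread sums its own slice, then the
    // partial sums are combined.
    let chunked: f32 = values
        .chunks(values.len() / 8)
        .map(|chunk| chunk.iter().sum::<f32>())
        .sum();

    println!("sequential: {sequential}");
    println!("chunked:    {chunked}");
    println!("difference: {}", (sequential - chunked).abs());
}
```

Making inference deterministic then amounts to fixing the accumulation order regardless of how many threads or how large a batch is used, which is (as I understand it) what the llama.cpp PR is after.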