Philipp Emanuel Weidmann
@Hunterius8 Could you quantify that? What is your tokens/s with and without DRY? On my dev machine, I'm seeing 4.99 tokens/s with DRY and 4.98 tokens/s without it. I'm running...
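For anyone who wants to reproduce that kind of measurement, here is a rough sketch of one way to time it (assumes a Hugging Face model and tokenizer are already loaded; `tokens_per_second` is just an illustrative helper name, and the processor list is whatever you want to benchmark):

```python
import time

def tokens_per_second(model, tokenizer, prompt, processors=None, max_new_tokens=200):
    """Generate and return throughput; pass processors=None for the baseline."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        logits_processor=processors,  # e.g. a LogitsProcessorList containing DRY
    )
    elapsed = time.perf_counter() - start
    new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
    return new_tokens / elapsed
```

Run it twice on the same prompt and context length, once with the DRY processor in the list and once without, and compare.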
@Hunterius8 I see, that's a lot more context than I've ever run, combined with a pretty high base performance, so this is probably the reason I don't notice it in...
> Make it a LogitsProcessor like other repetition penalties

That means losing control over DRY's position in the sampler stack, right? I think it can be valuable to be able...
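For context, this is roughly what that interface looks like (the class name below is an illustrative skeleton, not the actual implementation). Processors in a `LogitsProcessorList` run in insertion order, so wrapping DRY this way fixes its position in that list instead of leaving it freely reorderable relative to the other samplers:

```python
import torch
from transformers import LogitsProcessor, LogitsProcessorList

class DryLogitsProcessor(LogitsProcessor):  # illustrative skeleton only
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        # Penalty computation over input_ids omitted; a real implementation
        # would subtract the DRY penalties from `scores` here.
        return scores

# Order is determined by position in this list, not by a user-configurable stack:
processors = LogitsProcessorList([DryLogitsProcessor()])
```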
@Priestru

> Also is it possible to add smth like a vocabulary of phrases and words that we want to have penalized right off the bat?

I plan to implement...
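Just to illustrate the general shape of such a mechanism (this is hypothetical and not necessarily how it will actually be implemented; the class name, the penalty value, and the batch-size-1 assumption are all mine): a processor can hold a list of pre-tokenized phrases and penalize any token that would complete one of them.

```python
from transformers import LogitsProcessor

class PhrasePenaltyProcessor(LogitsProcessor):  # hypothetical, illustration only
    def __init__(self, phrase_token_ids, penalty=5.0):
        self.phrases = phrase_token_ids  # list of non-empty token-id tuples
        self.penalty = penalty

    def __call__(self, input_ids, scores):
        context = input_ids[0].tolist()  # assumes batch size 1 for brevity
        for phrase in self.phrases:
            *prefix, final = phrase
            # Penalize the final token if the context ends with the phrase prefix
            # (single-token phrases are penalized unconditionally).
            if not prefix or context[-len(prefix):] == prefix:
                scores[0, final] -= self.penalty
        return scores
```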
@l3utterfly is porting DRY to llama.cpp: https://github.com/ggerganov/llama.cpp/pull/6839
@oobabooga Could you give me a hint on how to proceed here? Do you plan to merge this PR? If so, what are the remaining steps?
@ggerganov To me, the repetition penalty is the single most important sampling parameter. Every model I've ever used repeats itself without it. Just recently, I accidentally ran Mixtral-8x7b (currently the...
@oobabooga

> My first impression is that the parameter is very sensitive. Probably 3 decimal places are needed to find something optimal for a given situation.

Optimal, maybe. But beneficial,...
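To make that concrete: the penalty is `multiplier * base ** (n - allowed_length)` for a repeated sequence of length `n`, so the multiplier only scales the curve linearly while the base controls the exponential growth. A quick sketch (parameter values below are just examples):

```python
def dry_penalty(n, multiplier=0.8, base=1.75, allowed_length=2):
    """Penalty for a token extending a repeated sequence of length n."""
    if n < allowed_length:
        return 0.0
    return multiplier * base ** (n - allowed_length)

for m in (0.6, 0.8, 1.0):
    print(m, [round(dry_penalty(n, multiplier=m), 2) for n in range(2, 8)])
```

Because the exponential term dominates for longer repetitions, changing the multiplier shifts where the penalty starts to bite, but it does not take three decimal places of precision to get a benefit.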
@oobabooga I like your idea, and I can see how, in many cases, it would improve the range of values that make sense. The reason I don't think it's a...
@jukofyork

> How do you think this method would work with coding models?

If code in the language that is to be generated is already present in the context, the...
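A relevant mechanical detail here is the sequence breakers: DRY does not extend a match across a breaker token, so heavily punctuated text only accumulates short matches. Below is a simplified, unoptimized reference for the match-length computation (the function name is mine, and the breaker set reflects the defaults; treat both as illustrative):

```python
SEQUENCE_BREAKERS = {"\n", ":", "\"", "*"}  # breaker set assumed from the defaults

def dry_match_length(context, candidate, breakers=SEQUENCE_BREAKERS):
    """Longest suffix of `context` that previously occurred followed by
    `candidate`, without crossing a sequence breaker (O(n^2) reference)."""
    best = 0
    for i in range(len(context) - 1, -1, -1):
        if context[i] != candidate:
            continue
        # Walk backwards, comparing tokens before position i with the
        # context's suffix, stopping at any sequence breaker.
        length = 0
        while (length < i
               and context[i - 1 - length] == context[len(context) - 1 - length]
               and context[i - 1 - length] not in breakers):
            length += 1
        best = max(best, length)
    return best

print(dry_match_length(list("ABCXAB"), "C"))                    # 2: "AB" recurs before "C"
print(dry_match_length(["A", "\n", "C", "X", "A", "\n"], "C"))  # 0: newline breaks the match
```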