Sampling interface, new samplers;
"ignore EOS" should apply -inf to the EOS logit.
New samplers:
- locally typical sampling
- tail free sampling
- frequency and presence penalty
- mirostat
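On the "ignore EOS" point above, a minimal sketch of the idea, assuming the application has a pointer to the logits and already knows the EOS token id (both are placeholders here):

```cpp
#include <cmath>

// With a logit of -inf, EOS gets probability 0 after softmax,
// so no downstream sampler can ever pick it.
void ignore_eos(float * logits, int eos_id) {
    logits[eos_id] = -INFINITY;
}
```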
Nice work!
I'll link the literature here. Feel free to complete it with more up-to-date sources.
- CTRL paper for the repetition penalty currently used in llama.cpp
- Frequency and Presence penalties
- locally typical sampling
- tail free sampling
- mirostat
I like the idea of a modular interface for sampling. It lets each example and application combine these parts into its own kitchen-sink sampling that fits its needs. Going further, the llama.h interface could be stripped down to only provide access to the logits and the vocabulary, with the sampling code moved to a separate object file. This would emphasize and guarantee the extensibility of the samplers.
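To illustrate the kind of caller-side composition this would allow, here is a minimal, hypothetical sketch of a sampler module that only needs the raw logits and the vocabulary size; the names and structs are mine, not the actual llama.h API:

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

struct candidate { int id; float logit; };

// Each sampler is a small transform over the candidate list, so every
// application can chain whichever subset it needs, in whatever order.
void apply_temperature(std::vector<candidate> & c, float temp) {
    for (auto & x : c) x.logit /= temp;
}

void apply_top_k(std::vector<candidate> & c, size_t k) {
    if (k >= c.size()) return;
    std::partial_sort(c.begin(), c.begin() + k, c.end(),
        [](const candidate & a, const candidate & b) { return a.logit > b.logit; });
    c.resize(k);
}

int sample_token(const std::vector<candidate> & c, std::mt19937 & rng) {
    // softmax over the remaining candidates (unnormalized weights are enough
    // for std::discrete_distribution)
    float max_logit = -INFINITY;
    for (const auto & x : c) max_logit = std::max(max_logit, x.logit);
    std::vector<float> w;
    for (const auto & x : c) w.push_back(std::exp(x.logit - max_logit));
    std::discrete_distribution<int> dist(w.begin(), w.end());
    return c[dist(rng)].id;
}

// Usage, given `const float * logits` and `int n_vocab` from the model:
//   std::vector<candidate> cands;
//   for (int i = 0; i < n_vocab; ++i) cands.push_back({i, logits[i]});
//   apply_top_k(cands, 40);
//   apply_temperature(cands, 0.8f);
//   int tok = sample_token(cands, rng);
```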
I am hesitant about the current implementation of repetition penalization. As an illustration, I question whether the occurrence of past newlines and punctuation should guide the sampling of the following tokens. One fix would be to weigh repetitions against a simple frequency model; however, I wasn't able to recover such frequencies from the tokenizer weights. More information can be gathered by measuring the length of the repetition that the next token would complete or interrupt. I have implemented this idea, along with an exponential decay.
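To make the exponential-decay part concrete, here is one possible sketch of a recency-weighted penalty; this is my own illustration of the general idea, not the implementation referred to above:

```cpp
#include <vector>

// The contribution of a past occurrence decays exponentially with how far
// back it lies, so a newline from long ago weighs far less than a token
// repeated just now.
void penalize_recent_repeats(float * logits, const std::vector<int> & last_tokens,
                             float penalty, float decay /* e.g. 0.95 */) {
    float w = 1.0f;
    // walk the history from most recent to oldest
    for (auto it = last_tokens.rbegin(); it != last_tokens.rend(); ++it) {
        logits[*it] -= penalty * w;  // subtract, i.e. scale the probability down
        w *= decay;
    }
}
```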
Concerning the application of the penalization, I'm not sure whether it is better to offset the logits or to scale them. Subtracting from the logit, as the "frequency and presence penalty" does, amounts to scaling the probabilities. Scaling the logits, as discussed in the CTRL paper, can be thought of as raising the probabilities to a power, but it depends on the logit = 0 point, which is not particularly meaningful. Your current implementation applies both methods successively, which seems redundant.
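Spelled out for a penalized token with logit $z_i$: subtracting a constant rescales its (unnormalized) probability, while dividing the logit raises it to a power, which is why the latter interacts with the arbitrary $z_i = 0$ point:

$$
p_i \propto e^{z_i - c} = e^{-c}\, e^{z_i} \qquad \text{(offset: multiplies the probability by the constant } e^{-c}\text{)}
$$

$$
p_i \propto e^{z_i/\theta} = \left(e^{z_i}\right)^{1/\theta} \qquad \text{(scale, as in CTRL: raises the probability to the power } 1/\theta\text{)}
$$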
I haven't found the time to read about mirostat in detail. My limited understanding is that as the number of parameters goes up, the method becomes more challenging to apply in practice. It also seems difficult to control the changing target surprise mu through feedback, especially with an auto-regressive model. On the other hand, the promise of avoiding repetitions and boredom traps without looking at past tokens is very interesting.
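For reference, my reading of the Mirostat paper is that the feedback loop amounts to the following update after each sampled token $x$, with target surprise $\tau$ and learning rate $\eta$:

$$
s = -\log_2 p(x), \qquad \mu \leftarrow \mu - \eta\,(s - \tau)
$$

where $s$ is the observed surprise of the sampled token and $\mu$ is the truncation threshold used for the next step.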
I found that it is quite difficult to evaluate the sampling algorithms. We have good starting points with your analysis, the information-theoretic formalism of the locally typical sampling and mirostat papers, and their evaluation methods. Doing such experiments takes time and effort, and large-scale human evaluations are next to impossible without a large community effort.
The CTRL paper does not mention it, but the CTRL repository in fact explicitly avoids penalizing newline tokens during sampling.
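If we wanted the same behaviour here, it would be a one-line exclusion in the penalty loop; a sketch, with the newline token id passed in from wherever the application gets it:

```cpp
#include <vector>

void penalize_except_newline(float * logits, const std::vector<int> & last_tokens,
                             float penalty, int newline_id) {
    for (int tok : last_tokens) {
        if (tok == newline_id) continue;  // never penalize newline occurrences
        logits[tok] -= penalty;
    }
}
```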
Rebased, added 2 commits since last review
Mark "ready for review" when you think it is good for merge
I do not have a Windows machine with MSVC installed, so I am not sure why it fails:
```
3: Test command: D:\a\llama.cpp\llama.cpp\build\bin\Release\test-sampling.exe
3: Working Directory: D:/a/llama.cpp/llama.cpp/build/tests
3: Test timeout computed to be: 1500
3/4 Test #3: test-sampling ....................***Exception: Numerical 0.01 sec
```
Ready for review
Very cool. I always wanted a way to blacklist tokens, like backslash.
Oh, I got it, for `\begin{code}`!
Yeah :smile: and `\end{code}`, the model often emits this before EOS or tries to dodge/end the conversation.
Already tested it, works great.
edit: it's `-l 29905-100000`, if anyone is interested.
You could write `-l 29905-inf` 😊
I have used `stof` instead of `stringstream` just to make "inf" work.
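For context, the difference in a nutshell (a small check; `std::stof` follows `strtof`, which accepts "inf", while stream extraction goes through `num_get`, which does not):

```cpp
#include <iostream>
#include <sstream>
#include <string>

int main() {
    float a = std::stof("-inf");        // parses to -infinity
    float b = 0.0f;
    std::istringstream("inf") >> b;     // extraction fails, b stays 0
    std::cout << a << " " << b << "\n"; // prints: -inf 0
}
```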
Any thoughts on removing the parameter defaults from the new sampling functions to keep llama.h compatible with C/Obj-C?
> edit: it's `-l 29905-100000`, if anyone is interested.
Could anyone please share how to get the token id, and could I pass multiple tokens at once with the `--logit-bias` flag?
@DenisSergeevitch you can supply `--verbose-prompt` ("print prompt before generation"), e.g.:

```
$ bin/main --verbose-prompt -m ../models/open_llama_7b_preview_300bt/ggml-model-q4_0.bin -p "Test prompt"
...
main: prompt: ' Test prompt'
main: number of tokens in prompt = 3
1 -> ''
5073 -> ' Test'
7593 -> ' prompt'
...
```
> pass multiple tokens at once

Yes, by passing multiple arguments, like `./main ... -l 2-inf -l 13+2 -l 228+5`.
Thanks, I have made a small uncensoring method based on this flag, works like a charm!
@ivanstepanovftw I'm working on a Rust-based implementation of these samplers and using the code you wrote as a reference. I'm crediting the llama.cpp project, but I can mention you by name in the project README as well since you wrote it (and I don't think it's really been changed much since the initial commit). I didn't want to do something like that without asking first, though.
Also, if you're unhappy with the way I'm handling this (the credits or otherwise) please let me know and hopefully we can work something out!
Link: https://github.com/KerfuffleV2/llm-samplers/
@KerfuffleV2 Sure you can! Glad that you support RWKV, looks very promising.