
Added Mirostat Mode and related Params to Llama initialization

Open CoffeeVampir3 opened this issue 2 years ago • 3 comments

This change is mostly motivated by these parameters being similar to top-k and temperature, which are present in the Llama initialization. This should make utilizing these parameters more user friendly and more consistent with LlamaCpp's internal api.

See issue #312 for some additional context.

CoffeeVampir3 avatar Jun 06 '23 05:06 CoffeeVampir3

Added a fix to langchain to make this work as well: https://github.com/hwchase17/langchain/commit/6958837055a55ee4f1b7b34cbb0cb18c6a8493e7

Holding off on the langchain PR until this one is solidified :+1:

CoffeeVampir3 avatar Jun 10 '23 03:06 CoffeeVampir3

Hey @CoffeeVampir3, sorry to take so long to look at this. As I see it, your PR adds the mirostat parameters as instance parameters; however, mirostat sampling is already possible through both the server and the call API. Is there a reason this change needs to be implemented? Are you looking to override the sampling of an existing client you don't have control over?

abetlen avatar Jun 10 '23 05:06 abetlen

> Hey @CoffeeVampir3, sorry to take so long to look at this. As I see it, your PR adds the mirostat parameters as instance parameters; however, mirostat sampling is already possible through both the server and the call API. Is there a reason this change needs to be implemented? Are you looking to override the sampling of an existing client you don't have control over?

Not at all, thanks very much for your hard work on the library.

This was motivated by my work with langchain, which wraps llama-cpp-python. If I were using llama-cpp directly, I'd pass in command-line parameters like `--mirostat_mode 2`, `--mirostat_tau .9`, etc. With these set up in the initializer, you get quite a clean API that is consistent with llama-cpp itself:

    llm = LlamaCpp(
        model_path=os.path.abspath(sys.argv[1]),
        lora_path=os.path.abspath(sys.argv[2]) if len(sys.argv) > 2 else None,
        n_batch=512,
        n_ctx=4096,
        n_gpu_layers=60,
        max_tokens=2048,
        temperature=1.5,
        top_k=0,
        top_p=0.95,
        verbose=False,
        stop=("Human:", "AI:"),
        mirostat_mode=2,
        mirostat_tau=5.0,
        mirostat_eta=0.1,
    )

These parameters are similar to top-k and temperature, which makes this both an aesthetic and a practical choice: without this change, enabling these parameters when llama-cpp-python is used as an adapter becomes quite a bit more difficult. For example, one approach would be to directly override langchain's run or generate and inject the parameters into the run call. This gets tricky, partly because of the design of langchain, but also because it creates an inconsistency where the mirostat parameters all require special handling and special code. Here is an example hack I wrote to inject the parameters in a way that stays somewhat consistent with the other parameters, but it feels very hacky and shouldn't really be needed:

    def _get_parameters(self, stop: Optional[List[str]] = None) -> Dict[str, Any]:
        # ...
        params = self._default_params

        # llama_cpp expects a "stop" key rather than "stop_sequences", so remove it:
        params.pop("stop_sequences")

        # then set it to the configured value, defaulting to an empty list:
        params["stop"] = self.stop or stop or []

        # hack: inject the mirostat parameters directly
        params["mirostat_mode"] = 2
        params["mirostat_eta"] = 0.1
        params["mirostat_tau"] = 5.0
        return params

To summarize, this doesn't add any new capabilities, but I think it makes the overall API easier to use and more consistent in how the parameters are handled.
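(Editor's note: the pattern the hack above works around can be sketched in isolation. This is a hypothetical, self-contained illustration of instance-level sampling defaults merged with per-call overrides — the class and method names are invented for the sketch, not langchain's or llama-cpp-python's actual API.)

```python
from typing import Any, Dict, List, Optional


class SamplingDefaults:
    """Hypothetical sketch: store sampling parameters once at init time,
    then merge per-call overrides on top when building call parameters."""

    def __init__(self, **defaults: Any) -> None:
        self.defaults: Dict[str, Any] = dict(defaults)

    def call_params(
        self, stop: Optional[List[str]] = None, **overrides: Any
    ) -> Dict[str, Any]:
        params = dict(self.defaults)  # start from the instance defaults
        params.update(overrides)      # per-call values win over defaults
        params["stop"] = stop or []   # always emit a "stop" key, as llama_cpp expects
        return params


# Defaults set once, as the PR proposes for the Llama initializer:
sampler = SamplingDefaults(mirostat_mode=2, mirostat_tau=5.0, mirostat_eta=0.1)

# A single call can still override one parameter without special-case code:
params = sampler.call_params(stop=["Human:"], mirostat_tau=4.0)
```

Here `params` carries `mirostat_mode=2` and `mirostat_eta=0.1` from the defaults, the per-call `mirostat_tau=4.0`, and the `stop` list, so the mirostat parameters need no handling different from any other sampling parameter.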

CoffeeVampir3 avatar Jun 10 '23 06:06 CoffeeVampir3