Thomas Antony

Results 19 comments of Thomas Antony

You can use the updated `llamacpp-convert` script with the original LLaMA weights (PyTorch format) to generate the new ggml weights. Another option is to use the `migrate` script from https://github.com/ggerganov/llama.cpp...

Well, feel free to merge it! I am glad that I was able to contribute. :)

@oobabooga That is a side effect of how the underlying Python bindings work right now. Adding support for changing those parameters when sampling from the logits is on my ToDo...

@niizam 4-bit quantized models are already supported. You just need to use the appropriate weight files.

I am on an Apple Silicon Mac. I had to install the following packages separately before it would work:

```
conda activate web-ui
pip install scikit-image
pip install jsonmerge
pip...
```

@ggerganov I have made the changes. Please let me know what you think.

@j-f1 @Green-Sky @ggerganov I have done another pass at refactoring and also fixed a few logical bugs that left interactive mode broken in my original version (among other things). I...

> @thomasantony We want to have a C-style API in `llama.h`. We cannot expose C++ constructs
>
> For now, leave it like this and let me apply the necessary...

You can already do that by passing in `-DLLAMA_STATIC=Off -DBUILD_SHARED_LIBS=On` to cmake.
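For reference, a minimal sketch of that invocation, assuming you are in the repository root and using an out-of-source build directory (the directory name `build` is arbitrary):

```shell
# Configure an out-of-source build with shared libraries enabled
# instead of a static build, then compile.
cmake -B build -DLLAMA_STATIC=Off -DBUILD_SHARED_LIBS=On
cmake --build build
```

CMake's `BUILD_SHARED_LIBS` is a standard variable that flips library targets from static to shared; `LLAMA_STATIC` is the project-specific option controlling static linking.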