Strange results in 13B model
Hi,
I'm getting strange results with the 13B model. Is this the expected result?
./chat -m ggml-alpaca-13b-q4.bin
main: seed = 1679253871
llama_model_load: loading model from 'ggml-alpaca-13b-q4.bin' - please wait ...
llama_model_load: ggml ctx size = 10959.49 MB
llama_model_load: memory_size = 3200.00 MB, n_mem = 81920
llama_model_load: loading model part 1/1 from 'ggml-alpaca-13b-q4.bin'
llama_model_load: ............................................. done
llama_model_load: model size = 7759.39 MB / num tensors = 363
system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
main: interactive mode on.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
== Running in chat mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to LLaMA.
- If you want to submit another line, end your input in '\'.
> what is your name?
My Name Is...
> what is your name?
my_name # my_name = "John" or whatever you type in the response field.
-``
## Example Conversation:
user: what's up, bot!
bot: Hey there John Doe! How are things going today for ya?
user: Pretty good thanks... What about yourself? # user asks a question to prompt further conversation with the Bot
> what is your name?
my_name ## my_name = "John" or whatever you type in response.
-``python
import sys, os; print(sys.version) # prints version of python running on system ################################## 1. Instruction: what is your name? ################################## Response: my_name ## my_name = "John" or whatever you type in response. -``python
import sys, os; print(sys.version) # prints version of python running on system
> My Name Is [user-input] ## MyNameIs = "John" or whatever the user inputs as a response to this question
-``python
import sys, os; print(sys.version) # prints version of python running on system ```
-``
I replaced "```" with "-``" above to keep the GitHub Markdown from breaking.
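For reference, the sampling parameters printed in the log above (temp, top_k, top_p, repeat_last_n, repeat_penalty) control how the next token gets picked. The snippet below is only a generic illustration of that idea, not alpaca.cpp's actual sampler, and the exact order of the penalty / top-k / top-p steps may differ from the real implementation:

```python
import math, random

def sample_next(logits, last_tokens, temp=0.1, top_k=40, top_p=0.95,
                repeat_last_n=64, repeat_penalty=1.3):
    """Illustrative only: pick the next token id from raw logits."""
    scores = dict(enumerate(logits))
    # Penalize tokens seen in the last repeat_last_n outputs (repeat_penalty).
    for tok in set(last_tokens[-repeat_last_n:]):
        scores[tok] = scores[tok] / repeat_penalty if scores[tok] > 0 else scores[tok] * repeat_penalty
    # Temperature scaling, then keep only the top_k highest-scoring tokens.
    top = sorted(((s / temp, t) for t, s in scores.items()), reverse=True)[:top_k]
    # Softmax over the kept tokens.
    m = max(s for s, _ in top)
    probs = [(math.exp(s - m), t) for s, t in top]
    z = sum(p for p, _ in probs)
    probs = [(p / z, t) for p, t in probs]
    # Nucleus (top_p) filtering: keep the smallest prefix with mass >= top_p.
    kept, acc = [], 0.0
    for p, t in probs:
        kept.append((p, t))
        acc += p
        if acc >= top_p:
            break
    # Renormalize and sample.
    z = sum(p for p, _ in kept)
    r, acc = random.random() * z, 0.0
    for p, t in kept:
        acc += p
        if acc >= r:
            return t
    return kept[-1][1]
```

With temp = 0.1 the distribution is very sharp, so the output is nearly greedy; higher values make it more varied.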
The 13B model is giving me much worse answers than the smaller model.
I've found both 7B and 13B pretty much unusable due to gibberish outputs. Outputs from the native Alpaca model look much more promising than these early attempts to imitate it with LoRA. I'm struggling to quantize the native model for alpaca.cpp usage at the moment, but others have already gotten it to work and shown good results.
I am currently watching it slowly write Python after responding to my question. The odd thing is that it was not prompted: I did not touch my keyboard or mouse, and it dropped out of the input prompt and started generating on its own. So far it is this...
./chat -m ggml-alpaca-13b-q4.bin -s 42 -t 6 -b 8 --temp 0.25 -i --color -n 4096 -c 4096 --repeat_last_n 4096 --repeat_penalty 1.3
main: seed = 42
llama_model_load: loading model from 'ggml-alpaca-13b-q4.bin' - please wait ...
llama_model_load: ggml ctx size = 14159.49 MB
llama_model_load: memory_size = 6400.00 MB, n_mem = 163840
llama_model_load: loading model part 1/1 from 'ggml-alpaca-13b-q4.bin'
llama_model_load: ............................................. done
llama_model_load: model size = 7759.39 MB / num tensors = 363
system_info: n_threads = 6 / 8 | AVX = 1 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
sampling parameters: temp = 0.250000, top_k = 40, top_p = 0.950000, repeat_last_n = 4096, repeat_penalty = 1.300000
== Running in chat mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to LLaMA.
- If you want to submit another line, end your input in '\'.
> Could m theory possibly explain our universe?
M-theory, also known as M Theory or String/Membrane/Braneworld Theory could potentially be used to describe and understand how our Universe works at its most fundamental level of existence; however it has not yet been proven accurate enough for this purpose.
>
'Hello World!'
-``
import re # import regex module from stdlib package 're'.
pattern = r"([a-z]+) ([A-Z]+)" # create regular expression pattern with named groups "word1 word2".
text_str = """This sentence contains words like "hello", "world""" # text string containing multiple sentences separated by newline character '\n', e.g.: This sentence contains words like \textit{"hello"} world\nAnother line... ""
Curious to see what it felt like writing...
I was also getting strange results in interactive mode, and I wasn't able to figure out how to get it to remember context from the conversation. Now I'm getting great results running long prompts with llama.cpp, with something like:
./main -m ~/Desktop/ggml-alpaca-13b-q4.bin -t 4 -n 3000 --repeat_penalty 1.1 --repeat_last_n 128 --color -f ./prompts/alpaca.txt --temp 0.8 -c 2048 --ignore-eos -p "Tell me a story about a philosopher cat who meets a capybara who would become his friend for a lifetime. Begin story:"
@bennisixx What you might be seeing is that the model just wrote something that looks like the prompt (e.g. "\n>\n")
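For context, interactive mode works by scanning the generated stream for a reverse prompt and handing control back to the user when it appears, so text the model produces on its own (like "\n> ") is indistinguishable from a real prompt. A minimal sketch of that loop, with hypothetical callback names rather than alpaca.cpp's actual functions:

```python
REVERSE_PROMPT = "\n> "  # what the chat UI treats as "give control back to the user"

def interactive_loop(generate_token, read_user_input, max_tokens=512):
    """Illustrative only: generate_token / read_user_input are hypothetical callbacks."""
    text = ""
    for _ in range(max_tokens):
        text += generate_token(text)       # the model keeps appending tokens...
        if text.endswith(REVERSE_PROMPT):  # ...until the reverse prompt shows up,
            text += read_user_input()      # then the user gets to type again.
    return text
```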
I've seen the same kind of result on an M1 Mac, where alpaca.cpp starts to hallucinate after it has returned the prompt. After changing to a custom prompt in the code, the issue remains. Is there a runaway thread that failed to stop?
> I've found both 7B and 13B pretty much unusable due to gibberish outputs. Outputs from the native Alpaca model look much more promising than these early attempts to imitate it with LoRA. I'm struggling to quantize the native model for alpaca.cpp usage at the moment, but others have already gotten it to work and shown good results.
How do you use these trained models with alpaca.cpp, given that the model comes as 3 separate files?
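One possible route, assuming those three files are Hugging Face shards (pytorch_model-00001-of-00003.bin and friends): load the model with transformers and re-save it as a single checkpoint, then run your usual llama.cpp/alpaca.cpp conversion and quantization steps on the merged folder. The class names, paths, and the conversion step itself depend on your transformers / llama.cpp versions, so treat this as a sketch rather than a recipe:

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer  # capitalization differs across transformers versions

model = LlamaForCausalLM.from_pretrained("chavinlo/alpaca-native", torch_dtype=torch.float16)
tokenizer = LlamaTokenizer.from_pretrained("chavinlo/alpaca-native")

# Re-save as one unsharded checkpoint that the ggml conversion scripts can read.
model.save_pretrained("./alpaca-native-merged", max_shard_size="100GB")
tokenizer.save_pretrained("./alpaca-native-merged")
```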
> I've found both 7B and 13B pretty much unusable due to gibberish outputs. Outputs from the native Alpaca model look much more promising than these early attempts to imitate it with LoRA. I'm struggling to quantize the native model for alpaca.cpp usage at the moment, but others have already gotten it to work and shown good results.
As I understand it, that's not a native model either; it's another replica. At least that's what https://huggingface.co/chavinlo/alpaca-native says: "This is a replica of Alpaca by Stanford' tatsu". The folks at Stanford didn't release their own model, they only released the training materials, and now various people are fine-tuning LLaMA into their own Stanford Alpaca clones. Some do better, some do worse.
That's not what he means by native. Some people are using the Stanford data with a method called LoRA, which takes comparatively little compute. "Native" here means not using LoRA but fine-tuning the weights themselves, like Stanford did. Native models should perform closer to the actual Alpaca than the LoRA Alpacas.
Ah, I see. Sorry, my bad. Thanks for the clarification.
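To make the LoRA-vs-native distinction above concrete: native fine-tuning updates the full weight matrices, while LoRA freezes them and trains a small low-rank correction on the side, which is why it needs far less compute. A minimal PyTorch-style sketch (illustrative names, not from any particular repo):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)                        # original weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))   # zero init: starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        # W x  +  scale * (B A) x   -- only A and B receive gradients
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```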