llama.cpp
ggml : add SSM Metal kernels
target: #8526
A straightforward Metal implementation of SSM_CONV and SSM_SCAN using single-threaded kernels that mimic the CPU implementation. There is a lot of room for further optimization; for now the focus is on correctness.
./llama-batched \
-m ./models/mamba-130m/ggml-model-f16.gguf \
-p "Hello, my name is" -np 16 -n 32
main: n_predict = 32, n_ctx = 448, n_batch = 32, n_parallel = 16, n_kv_req = 437
Hello, my name is
main: generating 16 sequences ...
main: stream 0 finished at n_cur = 32
main: stream 1 finished at n_cur = 32
main: stream 2 finished at n_cur = 32
main: stream 3 finished at n_cur = 32
main: stream 4 finished at n_cur = 32
main: stream 5 finished at n_cur = 32
main: stream 6 finished at n_cur = 32
main: stream 7 finished at n_cur = 32
main: stream 8 finished at n_cur = 32
main: stream 9 finished at n_cur = 32
main: stream 10 finished at n_cur = 32
main: stream 11 finished at n_cur = 32
main: stream 12 finished at n_cur = 32
main: stream 13 finished at n_cur = 32
main: stream 14 finished at n_cur = 32
main: stream 15 finished at n_cur = 32
sequence 0:
Hello, my name is Tiffany. I'm a mother of three and a retired teacher. I'm a member of the American Indian and Alaska Native (AI
sequence 1:
Hello, my name is John. I am a freelance writer and editor. I have a passion for writing and have been writing since I was a child. I
sequence 2:
Hello, my name is Renee. I'm a full-time writer, and I'm currently working on a new book. I'm also a graduate
sequence 3:
Hello, my name is Jules. I'm a writer and illustrator. I have a passion for the arts and I love to travel. I love to
sequence 4:
Hello, my name is Renee. I am a single mom of two boys. I am trying to figure out how to make this work. I am
sequence 5:
Hello, my name is Dr. Sonia. I'm a doctor in the University of Medicine and Dentistry of New Jersey. I'm here to help you
sequence 6:
Hello, my name is Nick. I'm a member of the
National Association of Women in the United States of America. I'm
a member
sequence 7:
Hello, my name is Jadine. I'm a real person, and I'm here to help you. I'm here to help you get the best
sequence 8:
Hello, my name is Roxane and I'm a young woman with a love of all things chocolate. I've been a member of the Chocolate Club for
sequence 9:
Hello, my name is John. I'm a professional musician, and I'm looking for a new job. I'm a musician, and I'm looking for
sequence 10:
Hello, my name is Dr. Paul, and I'm a doctor in the area of cardiac surgery. I'm here to help you. I'm here to
sequence 11:
Hello, my name is Daniel and I'm a teacher in an elementary school in the United States. I've been reading about the dangers of the internet for the
sequence 12:
Hello, my name is Sven, and I'm a member of the Sven-Gustavsson Foundation. I'm here to talk about the future
sequence 13:
Hello, my name is Nico, I'm a professional photographer, I work in the studio of the famous photographer, Josef Krammer, who is
sequence 14:
Hello, my name is John. I'm a big fan of your work. I'm looking for a job. I'm looking for a good, honest man
sequence 15:
Hello, my name is John. I'm a newbie to the Internet, and I'm trying to learn how to use it.
I'm trying to
main: decoded 432 tokens in 0.71 s, speed: 609.55 t/s
llama_print_timings: load time = 137.83 ms
llama_print_timings: sample time = 10.18 ms / 448 runs ( 0.02 ms per token, 44025.16 tokens per second)
llama_print_timings: prompt eval time = 727.16 ms / 437 tokens ( 1.66 ms per token, 600.97 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 845.80 ms / 438 tokens
ggml_metal_free: deallocating
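The counters in this log are consistent with each other. `n_kv_req` should equal the shared prompt plus `n_parallel` continuations of `n_predict - n_prompt` tokens each; the prompt length of 5 tokens is not printed and is inferred here from 437 - 432 (the decoded count), with 432 = 16 × 27. The small C sketch below checks that arithmetic and the prompt eval throughput. Both helper names are hypothetical, and the KV formula is the one implied by the log rather than quoted from `llama-batched`.

```c
#include <assert.h>

/* KV cells needed: the shared prompt once, plus each stream's generated tokens.
 * Formula inferred from the log above, not copied from llama-batched. */
static int kv_cells_needed(int n_prompt, int n_predict, int n_parallel) {
    return n_prompt + (n_predict - n_prompt) * n_parallel;
}

/* Throughput in tokens per second from a token count and a time in ms. */
static double tokens_per_second(int n_tokens, double t_ms) {
    assert(t_ms > 0.0);
    return 1e3 * n_tokens / t_ms;
}
```

`kv_cells_needed(5, 32, 16)` gives 437, and `tokens_per_second(437, 727.16)` gives about 600.97, matching the `n_kv_req` and prompt eval lines. The `decoded 432 tokens in 0.71 s` line reports 609.55 t/s rather than 432 / 0.71 ≈ 608.5 presumably because the elapsed time is rounded to two decimals for display.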
./llama-perplexity \
-m ./models/mamba-130m/ggml-model-f16.gguf \
-f build/wikitext-2-raw/wiki.test.raw -ngl 99
perplexity: tokenizing the input ..
perplexity: tokenization took 950.02 ms
perplexity: calculating perplexity over 650 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 0.55 seconds per pass - ETA 1.48 minutes
...
Final estimate: PPL = 25.0894 +/- 0.18559