Kerfuffle


Glad to help! I tried again. I don't know if it's important, but I just compile using `make`, so `make clean && make LLAMA_CUBLAS=1`. First, for reference, `perplexity` running on...

I got impatient and tried running with batch size 4:

```plaintext
Thread 1 "perplexity" received signal SIGSEGV, Segmentation fault.
0x00005555555632d8 in ggml_vec_mul_f32 (n=4096, z=0x7ffe48010000, x=0x7ffe48000000, y=0x7fffa6820400)
    at ggml.c:2073
2073 inline...
```

I said "latest commit", but that wasn't quite accurate, since you pushed that while I was composing the message. However, https://github.com/ggerganov/llama.cpp/pull/1632/commits/d5d900d75c09f894e3ba0960950ef8b9df7f4aa4 doesn't seem to have made a difference in the behavior: still...

> I assume you had run `mulmat-tune` bench

Sorry, no... I didn't know it was necessary. Is crashing/not working properly the expected behavior in that case? I apologize if I...

> Looks like the assertion error is caused by clearing tensor.backend in previous commit, I reverted.

Unfortunately it doesn't seem to have fixed the issue:

```plaintext
./mulmat-tune bench
[bench] model:...
```

Quick update: I checked out the latest version and gave it another try. `mulmat-tune bench` now runs; however, it doesn't seem to use cuBLAS.

```plaintext
[bench] model: 7B, type: Q4_0
```

I...

I'm sorry, actually it did work. I guess I just stopped it too early before (previously it explicitly said when it was using a CUDA backend). Seems like it...

> Would you please try less n_threads: 1, 2, 4?

Unfortunately, with the latest changes we're back to running into an assertion failure:

```plaintext
ggml_init_cublas: found 1 CUDA devices:
Device...
```

```plaintext
GGML_ASSERT: ggml.c:10034: comp_backend & GGML_TASK_BACKEND_CPU
```

Looks like `ggml_compute_forward_mul_mat_q_f32` may need a similar change?

I'm going to try to look at how to add this to `llm-samplers`. It will need the CFG logits though, so `llm` will need to handle that itself. I guess...