MillionthOdin16
I saw in some other issues that some people tried making sure their project was up to date and then rebuilt it. I rebuilt mine, requantized, then ran it, and...
Is this what you're thinking?

```
// (loop handling each layer)

// input for next layer
inpL = cur;
}

// norm
{
    inpL = ggml_rms_norm(ctx0, inpL);
    // inpL...
```
> The generation differences may be explained by the lack of FMA and F16C/CVT16 on MSVC. #375 should solve that. Wouldn't this suggest that Windows should be the less performant...
@BadisG do you mean that the runs are deterministic? Or that the performance of both are similar enough? Or both?
Can you include some of the timing info output?
Awesome! LoRAs would be super useful, especially with how easy they're becoming to train right now 🔥
Awesome 🔥 I'll test it on Windows soon. This feature is super useful 🙂

On Mon, Apr 10, 2023, slaren wrote:
> Now that #801 has been...
I'm trying to troubleshoot some issues on Windows. First, the conversion script and overall process were straightforward, so good job making it simple. I was able to load the 7B...
> @MillionthOdin16 thanks for testing this, it has been a struggle telling for sure if the lora that I had tried had any meaningful effects, but I think I found...
You're right. The output works as expected when the llama model is F32. Nice job! Now I'm trying to figure out the best way to make it usable. After the...