MillionthOdin16
I saw in some other issues that some people tried making sure their project was up to date and then rebuilt it. I rebuilt mine, requantized, then ran it, and...
Is this what you're thinking?

```
// (loop handling each layer)

// input for next layer
inpL = cur;
}

// norm
{
    inpL = ggml_rms_norm(ctx0, inpL);
    // inpL...
```
> The generation differences may be explained by the lack of FMA and F16C/CVT16 on MSVC. #375 should solve that. Wouldn't this suggest that Windows should be the less performant...
@BadisG do you mean that the runs are deterministic? Or that the performance of both are similar enough? Or both?
Can you include some of the timing info output?
Awesome! LoRAs would be super useful, especially with how easy they're becoming to train right now 🔥
Awesome 🔥 I'll test it on Windows soon. This feature is super useful 🙂

On Mon, Apr 10, 2023, slaren wrote:
> Now that #801 has been...
I'm trying to troubleshoot some issues on Windows. First, the conversion script and overall process were straightforward, so good job making it simple. I was able to load the 7B...
> @MillionthOdin16 thanks for testing this, it has been a struggle telling for sure if the lora that I had tried had any meaningful effects, but I think I found...
You're right. The output works as expected when the llama model is F32. Nice job! Now I'm trying to figure out the best way to make it usable. After the...