Stephan Walter comments

Results: 99 comments by Stephan Walter

Use full range for q4_0 quantization

I will have a look at dynamically selecting 7 or 8 (or even a value in between) according to RMSE, but should we just ignore the maximum error? Essentially this is...
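As a rough illustration of what selecting the divisor by RMSE could look like, here is a minimal Python sketch. It is not the code from the PR. It assumes a symmetric 4-bit scheme where the scale is the block's largest magnitude divided by a candidate divisor and the codes are clamped to [-8, 7]. The function names and the candidate list are hypothetical.

```python
import math

def quantize_symmetric(block, divisor):
    """Quantize a block to signed 4-bit codes with scale = amax / divisor.

    Assumption for this sketch: codes are clamped to [-8, 7].
    """
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0.0, [0] * len(block)
    scale = amax / divisor
    # Python's round() rounds exact halves to the nearest even integer.
    codes = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, codes

def rmse(block, scale, codes):
    """Root-mean-square error between the block and its dequantized values."""
    return math.sqrt(
        sum((x - scale * q) ** 2 for x, q in zip(block, codes)) / len(block)
    )

def best_divisor(block, candidates=(7.0, 7.5, 8.0)):
    """Return the candidate divisor with the lowest RMSE (first one wins ties)."""
    return min(candidates, key=lambda d: rmse(block, *quantize_symmetric(block, d)))
```

For example, `best_divisor([-7.0, -3.0, 0.0, 2.0, 7.0])` picks `7.0`, since that block is reproduced exactly with a scale of 1. Note that this criterion looks only at RMSE and does not bound the maximum error of any single value.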

Use full range for q4_0 quantization

> Also, is there an extension of this approach to `Q4_1` and `Q4_3`?

These already use the full range: `min` maps to 0 and `max` maps to 15.
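To make the `min` to 0 and `max` to 15 mapping concrete, here is a small Python sketch of an offset-plus-scale 4-bit quantizer. It shows the general idea only and is not the actual `Q4_1` or `Q4_3` implementation. The function names and the block contents are made up for the example.

```python
def quantize_minmax_4bit(block):
    """Map the block's minimum to code 0 and its maximum to code 15."""
    lo, hi = min(block), max(block)
    # Guard against a constant block, where hi == lo would give a zero scale.
    scale = (hi - lo) / 15 if hi > lo else 1.0
    codes = [min(15, max(0, round((x - lo) / scale))) for x in block]
    return lo, scale, codes

def dequantize_minmax_4bit(lo, scale, codes):
    """Reconstruct approximate values from the offset, scale and codes."""
    return [lo + scale * q for q in codes]
```

Because the offset `lo` is stored alongside the scale, all 16 codes are available regardless of how the values are distributed around zero.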

Add "-e"/"--eval-threads" to distinguish thread counts for single-token eval and prompt eval

I think it's great that you address power consumption. We have been looking at tokens per second, but tokens per Watt is also important, especially on battery-powered devices. Though I...

Add "-e"/"--eval-threads" to distinguish thread counts for single-token eval and prompt eval

I may have misunderstood this. I have 4 cores and don't usually give a `-t` flag, so 4 threads. Here's what I'm seeing with your PR (per token, generously rounded...

Add "-e"/"--eval-threads" to distinguish thread counts for single-token eval and prompt eval

> In your case, (none) is effectively "-e 4 -t 4" and is intended to be equivalent to "-t 4".

Perfectly fine.

> For the "-e1 -t4" case, you're specifying...

[Feature Request] Dawn C++ WebGPU backend

Llama.cpp specifically targets the CPU, so it's unlikely such a dependency will be added, but see the discussion in #915.

Fails to run quantize command

Presumably solved by #927, closing.

Q4_0 scale selection using RMSE

> I am fairly certain that there is a straightforward way to compute the optimum value without search.

I'd love to see that, but while the error function seems to...

Q4_0 scale selection using RMSE

Now that the statistics tool has landed in master, I've rebased my branch and updated the tool to accept an `--implementation` argument instead of `--reference`. @unbounded: I will definitely...

Q4_0 scale selection using RMSE

@ivanstepanovftw Thanks for your effort. The first few values match mine exactly, so I'll trust your results. It's good to see at least a small improvement. But as I said...