Results 99 comments of Stephan Walter

I will have a look at dynamically selecting 7 or 8 (or even a value in between) according to RMSE, but should we just ignore the maximum error? Essentially this is...
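To make the trade-off concrete, here is a minimal sketch of choosing the divisor per block by RMSE. It assumes a symmetric 4-bit scheme where the scale is `max|x| / divisor` and codes are clamped to [-8, 7]; the function names are hypothetical and this is not the actual llama.cpp code.

```python
import math

def quantize_block(xs, divisor):
    """Quantize and dequantize one block (assumed scheme: scale = max|x| / divisor)."""
    amax = max(abs(x) for x in xs)
    if amax == 0.0:
        return [0.0] * len(xs)
    d = amax / divisor
    # Clamp to the signed 4-bit range; with divisor 8 a positive maximum gets clipped to 7.
    return [max(-8, min(7, round(x / d))) * d for x in xs]

def rmse(xs, ys):
    """Root-mean-square error between original and reconstructed values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def max_error(xs, ys):
    """Largest absolute error between original and reconstructed values."""
    return max(abs(x - y) for x, y in zip(xs, ys))

def best_divisor(xs, candidates=(7, 8)):
    """Pick the candidate divisor with the lowest RMSE for this block."""
    return min(candidates, key=lambda c: rmse(xs, quantize_block(xs, c)))
```

In this sketch, a divisor of 8 gives a finer step but clips a positive block maximum, so the maximum error can grow even when the RMSE shrinks. That is the tension the question about ignoring the maximum error points at.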

> Also, is there an extension of this approach to `Q4_1` and `Q4_3` ? These already use the full range, `min` maps to 0 and `max` maps to 15.
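For readers unfamiliar with the min/max formats, the mapping described in the quote can be sketched as follows. This is an illustration under the stated assumption that `min` maps to code 0 and `max` to code 15; the helper names are hypothetical and the real block layout in llama.cpp is not reproduced here.

```python
def quantize_minmax(xs, levels=16):
    """Map a block onto codes 0..levels-1 so that min -> 0 and max -> levels-1."""
    lo, hi = min(xs), max(xs)
    d = (hi - lo) / (levels - 1)
    if d == 0.0:
        # Constant block: every value is represented by the offset alone.
        return lo, 0.0, [0] * len(xs)
    codes = [max(0, min(levels - 1, round((x - lo) / d))) for x in xs]
    return lo, d, codes

def dequantize_minmax(lo, d, codes):
    """Reconstruct approximate values from the offset, step and codes."""
    return [lo + q * d for q in codes]
```

Because both ends of the code range are already used, there is no spare code to trade off as in the symmetric case.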

I think it's great that you address power consumption. We have been looking at tokens per second, but tokens per Watt is also important, especially on battery-powered devices. Though I...

I may have misunderstood this. I have 4 cores and don't usually give a `-t` flag, so 4 threads. Here's what I'm seeing with your PR (per token, generously rounded...

> In your case, (none) is effectively "-e 4 -t 4" and is intended to be equivalent to "-t 4". Perfectly fine.
>
> For the "-e1 -t4" case, you're specifying...

Llama.cpp specifically targets the CPU, so it's unlikely such a dependency will be added, but see the discussion in #915.

Presumably solved by #927, closing.

> I am fairly certain that there is a straightforward way to compute the optimum value without search. I'd love to see that, but while the error function seems to...

Now that the statistics tool has landed in master, I've rebased my branch and updated the tool to accept an `--implementation` argument instead of `--reference`. @unbounded: I will definitely...

@ivanstepanovftw Thanks for your effort. The first few values match mine exactly, so I'll trust your results. It's good to see at least a small improvement. But as I said...