Johannes Gäßler

235 comments by Johannes Gäßler

It's already close to 2 AM where I live, but I think the 160K tokens you use as input may simply not be enough. I'll do some related testing tomorrow.

Using Wikitext-103 train as input and the models that I already have available, I have so far not been able to provoke `imatrix` into producing NaN values. `imatrix` calculates sums of...

> So how would a sum of something + NaNs turn into something that is not a NaN, something you claimed would happen? Nobody has seen this happen, and numerically it cannot...

Regarding ggml graph creation overhead: I think the impact of this will heavily depend on the baseline t/s you can get with a given model. Presumably you're investigating the impact...

llama.cpp currently only ever serves one user at a time so this optimization is not applicable.

Yes, for enterprise use where you have one server generating responses for many users in parallel the optimization would be useful.

I don't have any plans for it because I don't care about commercial use, but I can't speak for the other devs.

I'm not really concerned with what other people want to use llama.cpp for. I'm implementing things that are useful for me personally first and foremost. And I don't see how...

Sorry, I forgot: you need to install rocWMMA (https://github.com/ROCm/rocWMMA).

Well, this looks like it would be non-trivial to fix. I was hoping it would be possible to just use rocWMMA as a drop-in replacement. But as I said, I...