Johannes Gäßler

235 comments by Johannes Gäßler

I've pushed a rebased version. The new quantization formats seem to be working correctly in combination with multi-GPU. The CI will take some time anyway, so I will quickly...

As far as I can tell everything is working correctly. Performance is good as long as I don't forget to disable debug options.

I get `Error: connect ECONNREFUSED 127.0.1.1:8080` when I try to run the server example, but I get the same error on master, so I'm assuming that it has nothing to...
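An `ECONNREFUSED` error like the one above generally means that nothing accepted the TCP connection at that address and port. As a minimal sketch (not part of llama.cpp; the helper name `port_is_open` is hypothetical), one way to check whether anything is listening before blaming a particular branch could look like this:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established.

    Hypothetical helper for illustration only; it says nothing about
    which program is listening, only whether the connection is accepted.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError (ECONNREFUSED) and timeouts are
        # both subclasses of OSError.
        return False
```

Calling `port_is_open("127.0.1.1", 8080)` with the address from the error message would return `False` if no process is accepting connections there, which would be consistent with the error also appearing on master.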

Thank you for being patient with me.

I didn't investigate what the minimum compute capability is. The multi-GPU code does work on 4x GTX Titan X, though, which have a compute capability of 5.2.

I don't see why the generation would be done on the CPU. I think the problem is rather that one of the operations that I used has good performance on...

https://github.com/ggerganov/llama.cpp/tree/master/examples/main#additional-options

Sorry, it seems I made a mistake at some point and didn't catch it during review. This is not intended.

I was thinking recently that better threading would be nice to have. Anyway, I haven't looked at the PR in detail yet, but I can already give you feedback regarding...

I've written about it here: https://github.com/ggerganov/llama.cpp/discussions/4534#discussioncomment-7900305