Johannes Gäßler

Results 235 comments of Johannes Gäßler

Thank you for the high-quality post. I definitely agree that the hashing is suboptimal; my main concern for now is to get something that works at all, and to also...

Prior to reading the hashing function blog post, I wrote a simple implementation that just uses bit shifts and XORs, but that already results in much better performance: | Model...

I think the model and prompt will be a bigger factor than the hardware as long as the hashing is fast enough. These are some numbers I get on my...

I've added a test for asserting that lookup decoding produces correct results. The sequences are the same for temperature 0, though the results are not going to be bit-for-bit identical....

I'm not sure what you mean by overload but I'm happy to test suggested alternatives.

I adopted the Fibonacci hash implementation. For LLaMA 3 q4_K_M on an RTX 4090 it's maybe a ~1% end-to-end speedup. Results | Model | GPU | Static lookup cache...

I re-tested the performance on 1x RTX 4090 with CUDA graphs, but contrary to my expectations I am seeing virtually no performance difference compared to before: | Model | GPU |...

The numbers for the `server-ngram` branches on my repository are just the numbers I use internally to keep my branches apart. Just use the branch I'm using for this PR.

If you want any chance of getting this fixed, do a git bisect to identify the exact commit that caused the performance regression and notify the corresponding dev.
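The bisect can even be automated with `git bisect run`, which marks each candidate commit good or bad from a command's exit code. A self-contained illustration on a throwaway repo where one commit "regresses" a file; every name here is made up, and in practice the test command would be a build-and-benchmark script:

```shell
set -e
# Build a toy repo: a fast baseline, an unrelated commit, then a regression.
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email dev@example.com
git config user.name  dev
echo fast > perf.txt
git add perf.txt && git commit -qm "baseline (fast)"
git commit -q --allow-empty -m "unrelated change"
echo slow > perf.txt
git commit -qam "regression (slow)"

git bisect start HEAD HEAD~2                    # bad = HEAD, good = HEAD~2
result=$(git bisect run grep -q fast perf.txt)  # exit 0 = good, nonzero = bad
echo "$result" | grep "first bad commit"        # bisect names the culprit
git bisect reset                                # back to the original HEAD
```

With a real regression between two release tags, the same pattern applies: `git bisect start <bad-tag> <good-tag>` and a script that builds and benchmarks, exiting nonzero when the result is slow.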

> I tried to do "git bisect" to find root reason for it, but there're huge patches added between tag b1500 and b2581.

Download the model as the original weights and...