Clemens Akens comments

9 comments by Clemens Akens


Better handle architecture

When using setup-zig on macOS ARM runners, it currently downloads the x64 (Intel) version of Zig by default. I believe the changes proposed in this PR should resolve the issue...
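The underlying issue is that the download should match the host architecture instead of defaulting to x64. Below is a minimal Python sketch of that detection step. It is not the PR's code. `zig_host_target` and both lookup tables are hypothetical, and they cover only the common cases. The sketch maps the machine names Python reports to the architecture names used in Zig target triples.

```python
import platform

# Hypothetical table: platform.machine() values -> Zig architecture names.
_ZIG_ARCH = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "arm64": "aarch64",   # macOS on Apple Silicon reports "arm64"
    "aarch64": "aarch64", # Linux on ARM reports "aarch64"
}

# Hypothetical table: platform.system() values -> Zig OS names.
_ZIG_OS = {
    "Darwin": "macos",
    "Linux": "linux",
    "Windows": "windows",
}


def zig_host_target(machine=None, system=None):
    """Return (arch, os) for the host, e.g. ("aarch64", "macos").

    Raises ValueError for unknown hosts instead of silently
    falling back to x86_64.
    """
    machine = (machine or platform.machine()).lower()
    system = system or platform.system()
    try:
        return _ZIG_ARCH[machine], _ZIG_OS[system]
    except KeyError:
        raise ValueError(f"unsupported host: {system}/{machine}") from None
```

One caveat applies to any detection of this kind. A process running under Rosetta 2 on an ARM Mac reports `x86_64`, so the result reflects the process, not necessarily the hardware.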

Significant Quality Degradation with q8 Quantization in Small Models

I haven't tried it myself, but maybe it helps to reduce the group size and thus increase the accuracy? It is currently fixed at 64: https://github.com/karpathy/llama2.c/blob/d9862069e7ef665fe6309e3c17398ded2f121bf5/export.py#L182 A group size of...
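The following pure-Python sketch shows why a smaller group size can help. It implements symmetric per-group int8 quantization in the spirit of the q8 export: each group of `group_size` values gets its own scale, chosen so that the largest magnitude maps to 127. It is not export.py's code, and `quantize_q8`, `dequantize_q8` and `quantization_error` are hypothetical names. With one large group, a single outlier sets the scale for every value, and the small weights round coarsely. Smaller groups confine the outlier's effect to fewer neighbours.

```python
def quantize_q8(values, group_size):
    """Symmetric int8 quantization with one float scale per group."""
    if len(values) % group_size != 0:
        raise ValueError("len(values) must be a multiple of group_size")
    q, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        wmax = max(abs(v) for v in group)
        # Map the largest magnitude in the group to 127; guard all-zero groups.
        scale = wmax / 127.0 if wmax > 0 else 1.0
        scales.append(scale)
        q.extend(int(round(v / scale)) for v in group)
    return q, scales


def dequantize_q8(q, scales, group_size):
    return [qi * scales[i // group_size] for i, qi in enumerate(q)]


def quantization_error(values, group_size):
    """Mean squared reconstruction error for a given group size."""
    q, scales = quantize_q8(values, group_size)
    restored = dequantize_q8(q, scales, group_size)
    return sum((a - b) ** 2 for a, b in zip(values, restored)) / len(values)


# Small synthetic weights with a single outlier at index 5.
weights = [(i % 7) * 0.013 - 0.04 for i in range(64)]
weights[5] = 10.0

for gs in (64, 32, 16, 8):
    print(gs, quantization_error(weights, gs))
```

The trade-off is storage. Each group carries one 4-byte float scale, so the overhead on top of the int8 payload is 4/64 = 6.25% at a group size of 64 and doubles to 12.5% at 32.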

runomp on Mac M1 Max is slower than runfast

I recently incorporated multithreading into my [Zig port](https://github.com/clebert/llama2.zig) of this project and made some relevant findings. Essentially, the overhead associated with initializing and terminating multiple threads per matrix-vector multiplication can...
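A common way to avoid paying thread start-up and teardown costs on every multiplication is to create the workers once and reuse them for all calls. The following Python sketch shows that structure using the standard library's `ThreadPoolExecutor`. It is not taken from llama2.zig. `MatVecPool` and `_matvec_rows` are hypothetical names. Because of CPython's GIL, the pure-Python arithmetic here does not actually run in parallel, so the sketch illustrates the reuse pattern only, not the speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def _matvec_rows(w, x, start, end):
    # Compute output rows [start, end) of w @ x.
    return [sum(wi * xi for wi, xi in zip(w[r], x)) for r in range(start, end)]


class MatVecPool:
    """Hypothetical helper: worker threads are created once in __init__ and
    reused by every matvec() call instead of being spawned per call."""

    def __init__(self, n_threads):
        self.n_threads = n_threads
        self._pool = ThreadPoolExecutor(max_workers=n_threads)

    def matvec(self, w, x):
        n = len(w)
        if n == 0:
            return []
        chunk = -(-n // self.n_threads)  # ceiling division: rows per task
        futures = [
            self._pool.submit(_matvec_rows, w, x, s, min(s + chunk, n))
            for s in range(0, n, chunk)
        ]
        out = []
        for f in futures:  # futures are collected in row order
            out.extend(f.result())
        return out

    def close(self):
        self._pool.shutdown()


pool = MatVecPool(4)
w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [1.0, 1.0]
for _ in range(3):  # the same worker threads serve every call
    y = pool.matvec(w, x)
pool.close()
print(y)
```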

runomp on Mac M1 Max is slower than runfast

Yes, in single-threaded mode. But this was the best run I have ever measured. Normally, it fluctuates between 680 and 700 tokens per second. Why there is such a large variance, I don't...

runomp on Mac M1 Max is slower than runfast

The use of `@Vector` (SIMD) had the biggest effect. Without SIMD, you couldn't get anywhere near these results. Aligning the vectors to the cache line, on the other hand, did...

runomp on Mac M1 Max is slower than runfast

I forgot to mention one important optimization: `@setFloatMode(.Optimized)`. It has about the same effect as setting `-ffast-math` in the C version.
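Among other relaxations, fast-math style modes let the compiler treat floating-point addition as if it were associative. The following minimal Python illustration shows why that permission matters. Reordering a sum can change the result in the last bits, so a compiler in strict mode typically will not split a reduction such as a dot product into several SIMD partial sums by itself. `lane_sum` is a hypothetical helper that mimics such a split with four accumulators.

```python
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # evaluated strictly left to right
right = a + (b + c)  # the same terms, reassociated
print(left, right, left == right)


def sequential_sum(xs):
    total = 0.0
    for x in xs:
        total += x
    return total


def lane_sum(xs, lanes=4):
    """Hypothetical helper: accumulate into `lanes` partial sums, the way a
    vectorized reduction loop would, then combine them at the end."""
    partial = [0.0] * lanes
    for i, x in enumerate(xs):
        partial[i % lanes] += x
    return sum(partial)


values = [0.1] * 10
print(sequential_sum(values), lane_sum(values))
```

The two orders agree to within rounding error but are not guaranteed to be bit-identical. Accepting that difference is what allows the faster, vectorized evaluation order.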

runomp on Mac M1 Max is slower than runfast

@tairov I have conducted extensive benchmarks with my improved Zig implementation, using an Apple M2 Pro equipped with 12 cores and an Apple M1 Pro equipped with 10 cores. [**Benchmark...

runomp on Mac M1 Max is slower than runfast

> Hey @clebert , I really appreciate you taking the time to improve `llama2.zig`, I think the ziglang community & maintainers might get valuable insights from it. Thank you 👍🏻...

Support for one pin mode

Thanks for the suggestion, I will try it out.