Sasha Rush
yeah, I agree with all this. The names and coding conventions came directly from https://github.com/karpathy/llama2.c I just copied them over. I'll do another pass to make the names better and...
Oh weird, for some reason they added 2 additional word tokens (2 * 5120 * 2 * 4 bytes). I'll take them out for now, and think about a way...
Thanks! This is all really helpful. The string stuff tripped me up a bit. If you have a minute, can you explain 6 to me? How do I check that it...
Amazing, that's really helpful to know. Thanks for pointing it out. Do you plan on continuing to work on this? Was planning on moving on, but now I'm kind of...
Nice. This bumped me up from 0.92 t/s to 1.02 t/s on Llama 2 7B.
Nice, I will try to catch up on your code. Some of the HF people recommended trying to do GPTQ inference (quant-full mat-vec). Which version are you doing?
hi! I saw that you are also a maintainer of Triton and worked on the AoT compiler. I'm playing around with trying to set this project up to use Triton...
Thanks, once I got it running it was fast, but then when I tried to further optimize the Triton code, the rust version went out of sync with the python...
It's using Rayon for data parallel matrix vector mult, but no other libraries. See the rust library `Candle` which has a full implementation with matrix mults. Was thinking I would...
1. Should work fine with ARM, but currently it is f32 only. (Note though, this is CPU-only, no GPU support.) Have to think about how to add f16. 2....