Eric Buehler

543 comments by Eric Buehler

@Jeadie I reran the check and it passed. Please let me know when this is ready for review!

@Jeadie I noticed there are some outstanding merge conflicts. Could you please resolve those? Thanks.

@JRRudy1 is there still active work on this PR?

Sounds good, @JRRudy1! We're looking to use 0.22 in HF Tokenizers, so this would be super helpful. @davidhewitt, would you possibly be able to review this?

> So have I (new baby), so there's a few things blocked on me which I need to unstack first, however if we don't hear from @adamreichold within maybe a...

@hiive I think I may have a solution for your case. On Metal, our preallocation for a large PagedAttention KV cache can cause slowdowns for some reason. I would recommend...

> Do you have any idea on what the cause is? Even a general rough idea would be good enough here.

Yeah, if you allocate over the ["recommended max working...
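A rough way to check whether a preallocated PagedAttention KV cache stays under a memory budget is to compute its size directly. On Metal, the budget could come from the device's `recommendedMaxWorkingSetSize`. The sketch below is a hypothetical illustration: `KvCacheConfig`, its fields, and both helpers are made-up names, not the project's actual API. It assumes one key cache and one value cache per layer, each holding `num_blocks * block_size` tokens.

```python
from dataclasses import dataclass, replace


@dataclass
class KvCacheConfig:
    # Hypothetical fields for illustration only.
    num_layers: int
    num_kv_heads: int
    head_dim: int
    block_size: int   # tokens per PagedAttention block
    num_blocks: int   # blocks preallocated up front
    dtype_bytes: int  # e.g. 2 for f16/bf16


def kv_cache_bytes(cfg: KvCacheConfig) -> int:
    """Bytes needed to preallocate the K and V caches for every layer."""
    per_token_per_layer = cfg.num_kv_heads * cfg.head_dim * cfg.dtype_bytes
    tokens = cfg.num_blocks * cfg.block_size
    # Factor of 2: one cache for keys, one for values.
    return 2 * cfg.num_layers * tokens * per_token_per_layer


def max_blocks_within_budget(cfg: KvCacheConfig, budget_bytes: int, other_bytes: int) -> int:
    """Largest block count whose KV cache, plus other allocations
    (e.g. model weights), still fits inside budget_bytes."""
    free = budget_bytes - other_bytes
    if free <= 0:
        return 0
    bytes_per_block = kv_cache_bytes(replace(cfg, num_blocks=1))
    return free // bytes_per_block


# Example: a Llama-style shape (32 layers, 8 KV heads, head_dim 128) in f16.
cfg = KvCacheConfig(num_layers=32, num_kv_heads=8, head_dim=128,
                    block_size=32, num_blocks=1024, dtype_bytes=2)
print(kv_cache_bytes(cfg) / 1024**3, "GiB")  # 4.0 GiB for 32768 cached tokens
```

Capping the block count this way keeps the preallocation below the budget instead of sizing it purely from the requested context length.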

Hi @dinerburger! After some recent work on the KV cache, I think we now have the infrastructure for this! I'll take a look again and will probably merge some initial...

I'm considering two options. The 8-bit cache using FP8 might be easier to implement.

- 4-bit cache: something similar to what exllamav2 does [here](https://github.com/turboderp/exllamav2/commit/324404ebe4e3c4dd0447ffc1290c312de1df02be#diff-144d2aca644ed440b9a057a45d082eb7015da25396352f76b2e84e66f6e0a57b), where we apply a Hadamard...
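The general idea behind rotating before quantizing is that an orthonormal Hadamard transform spreads a single large outlier across all coordinates, which shrinks the min/max range a 4-bit scale has to cover. The sketch below is a generic, pure-Python illustration of that technique under simple assumptions: one scale and offset per whole head vector, asymmetric min/max quantization to 16 levels, and no random sign flips. It is not exllamav2's implementation, and the helper names are hypothetical.

```python
import math


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform. len(x) must be a power of two.
    The normalized Hadamard matrix is symmetric and orthogonal, so applying
    this twice returns the input (up to float rounding)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]


def quantize_4bit(x):
    """Asymmetric min/max quantization of one group to codes 0..15."""
    lo, hi = min(x), max(x)
    scale = (hi - lo) / 15 if hi > lo else 1.0
    codes = [min(15, max(0, round((v - lo) / scale))) for v in x]
    return codes, scale, lo


def dequantize_4bit(codes, scale, lo):
    return [lo + c * scale for c in codes]


def compress_head_vector(v):
    """Rotate, then quantize: what a 4-bit cache would store for one vector."""
    return quantize_4bit(fwht(v))


def decompress_head_vector(codes, scale, lo):
    """Dequantize, then undo the rotation (the transform is its own inverse)."""
    return fwht(dequantize_4bit(codes, scale, lo))
```

Because the rotation is orthonormal, the reconstruction error in the original space has the same L2 norm as the quantization error in the rotated space, which is at most `sqrt(n) * scale / 2`. A real kernel would quantize in smaller groups and pack two 4-bit codes per byte.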