Philip Turner

340 comments of Philip Turner

That 1024 is just to fool the compiler. It doesn't allocate any blocks. You can increase the accumulator size by trying 64x64, 80x80, or 96x96 instead of 48x48.

> I'm...
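For context, a minimal host-side sketch of how a block size like 64x64 could be selected at pipeline-creation time through Metal function constants, without touching the shader source. The kernel name `gemm` and the constant indices are hypothetical; match them to whatever the actual shader declares.

```swift
import Metal

// Hypothetical kernel: assumes the shader declares its accumulator block
// edges as [[function_constant(0)]] and [[function_constant(1)]].
func makeGEMMPipeline(device: MTLDevice,
                      blockM: UInt16,
                      blockN: UInt16) throws -> MTLComputePipelineState {
  var m = blockM   // e.g. 64, 80, or 96 instead of 48
  var n = blockN
  let constants = MTLFunctionConstantValues()
  constants.setConstantValue(&m, type: .ushort, index: 0)
  constants.setConstantValue(&n, type: .ushort, index: 1)

  let library = try device.makeDefaultLibrary(bundle: .main)
  let function = try library.makeFunction(name: "gemm",
                                          constantValues: constants)
  return try device.makeComputePipelineState(function: function)
}
```

If the kernel takes its threadgroup block as a runtime-sized argument, the real allocation comes from the encoder's `setThreadgroupMemoryLength(_:index:)` call, not from any size written in the shader.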

Out of curiosity, what is MPS performance for Float16?

Also, if it's just 1 split, you could fine-tune (40x40, 56x56, 72x72). I think I at least know how to get performance equal to MPS, if you use a very...
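A sketch of what that fine-tuning could look like: time one dispatch per candidate using the command buffer's GPU timestamps and keep the fastest. The `encode` closure is an assumption here; the caller supplies it, and it encodes and commits one GEMM at the given block edge.

```swift
import Metal

// Sweep candidate block edges (e.g. 40, 56, 72) and return the fastest.
func fastestBlockSize(candidates: [UInt16],
                      encode: (UInt16) -> MTLCommandBuffer) -> UInt16 {
  precondition(!candidates.isEmpty)
  var best = (size: candidates[0], seconds: Double.infinity)
  for size in candidates {
    let commandBuffer = encode(size)  // caller commits the buffer
    commandBuffer.waitUntilCompleted()
    // GPU-side timestamps exclude CPU encoding overhead.
    let seconds = commandBuffer.gpuEndTime - commandBuffer.gpuStartTime
    if seconds < best.seconds {
      best = (size, seconds)
    }
  }
  return best.size
}
```

In practice you would run each candidate several times and take the minimum, since single-dispatch timings on Apple GPUs are noisy.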

I doubt it. Also, M3 is not that hard, compared to M1. I just haven't had anything motivating me to patch up MFA yet.

FP32 is the first step before delving into more advanced types, such as truncated FP32 (brain float) and half precision (FP16), which gives up most of FP32's dynamic range.

https://gist.github.com/philipturner/3bda14e876a635e73745c42f2eb240c8

Optimal block sizes: 32x32x32...
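To make the two 16-bit formats concrete, a minimal sketch in Swift; `bfloat16Bits` is an illustrative helper, not part of any library. Brain float is literally the top half of an FP32 bit pattern (full 8-bit exponent, 7 mantissa bits), while FP16 keeps 10 mantissa bits but shrinks the exponent to 5 bits.

```swift
// Truncate an FP32 value to bfloat16 by keeping its top 16 bits.
// (Rounding to nearest even is omitted for brevity.)
func bfloat16Bits(_ x: Float) -> UInt16 {
  UInt16(truncatingIfNeeded: x.bitPattern >> 16)
}

let x: Float = 3.14159265
let bf16 = bfloat16Bits(x)                        // brain float bits
let bf16Value = Float(bitPattern: UInt32(bf16) << 16)
let fp16Value = Float16(x)  // half precision; requires Apple silicon on macOS
print(bf16Value, Float(fp16Value))  // both near 3.14, with different error
```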

Closing because this issue is stale. The entire repository has been overhauled; it now targets a very large combinatorial matrix of hardware + problem size + precision + whatever features...

You would have to call into Metal directly using the PyObjC bindings; check out how Tinygrad does this. Or use a combination of Swift and PythonKit. However, H3 seems...
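For the second route, a minimal sketch of the Swift + PythonKit combination, assuming the PythonKit package is a dependency and numpy is installed: the Metal calls stay in Swift, and Python supplies the data.

```swift
import Metal
import PythonKit

// Pull a float32 array out of Python and upload it to a Metal buffer.
let np = Python.import("numpy")
let list = np.random.rand(1024).astype(np.float32).tolist()
let values: [Float] = Array(list)!  // bridge Python list -> Swift array

let device = MTLCreateSystemDefaultDevice()!
let buffer = device.makeBuffer(
  bytes: values,
  length: values.count * MemoryLayout<Float>.stride,
  options: .storageModeShared)!
print("Uploaded \(values.count) floats to \(device.name)")
```

The PyObjC route inverts this: Python owns the program and reaches the same Metal objects through bridged Objective-C selectors, which is the approach the Tinygrad reference above takes.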

> Do you think FlashConv will be faster in Metal compared to CUDA?

There is no direct way to compare "speed" between Metal and CUDA, as they run on...

Some first attempts to remedy this:

1) What OS version are you on? I compiled this successfully on later macOS 13 minor versions and the macOS 14 beta. Are you directing the...