Daniel Yeh

4 comments by Daniel Yeh

Hi @awni, I’ve finished the [cblas_cgemm integration](https://github.com/ml-explore/mlx/commit/ee1243d5c6a9232c9e4fd287d1550e2b383dbf0d) and am currently stuck on the Metal GPU part. I implemented a [complex64_t-specialized BlockMMA](https://github.com/ml-explore/mlx/blob/afa60e4f9f76f93b9c5a324fe9a908e58e67d844/mlx/backend/metal/kernels/steel/gemm/mma.h#L745-L887) (with four MMAs inside) to make it easy to...
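For readers unfamiliar with the "four MMAs" idea, here is a minimal sketch of the underlying arithmetic: a complex matrix product can be written as four real matrix products, since (Ar + i·Ai)(Br + i·Bi) = (Ar·Br − Ai·Bi) + i·(Ar·Bi + Ai·Br). This is plain Python for illustration only; the helper names `real_matmul` and `complex_matmul_via_four_real` are hypothetical and this is not the code from the linked Metal kernel.

```python
def real_matmul(a, b):
    """Naive real-valued matrix product of two nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def complex_matmul_via_four_real(a, b):
    """Complex matrix product assembled from four real matrix products."""
    # Split each operand into its real and imaginary planes.
    ar = [[z.real for z in row] for row in a]
    ai = [[z.imag for z in row] for row in a]
    br = [[z.real for z in row] for row in b]
    bi = [[z.imag for z in row] for row in b]

    # The four real products (the analogue of four real MMAs).
    rr = real_matmul(ar, br)
    ii = real_matmul(ai, bi)
    ri = real_matmul(ar, bi)
    ir = real_matmul(ai, br)

    # Real part: Ar*Br - Ai*Bi. Imaginary part: Ar*Bi + Ai*Br.
    return [
        [complex(rr[i][j] - ii[i][j], ri[i][j] + ir[i][j])
         for j in range(len(rr[0]))]
        for i in range(len(rr))
    ]


a = [[1 + 2j, 3 - 1j], [0 + 1j, 2 + 0j]]
b = [[2 - 1j, 0 + 1j], [1 + 1j, 4 + 0j]]
product = complex_matmul_via_four_real(a, b)
```

The result can be checked against a direct complex multiply-accumulate over the same inputs.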

Thanks for your reply. Yes, `_check_dsk` has the side effect of checking for an overlapping key in the task graph, but that is the nature of this computation. Task Graph and Details...

Yes, I think `masked_gather` would be a nice way to support #246.

I can’t reproduce the slowdown locally. I think `get_peak_memory` only gives you the memory usage of the Metal device. A likely cause is Python GC overhead from many short-lived YAML objects. How about...
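One way to test the GC hypothesis is to measure the Python side directly, since a device-level counter would not show it. The sketch below uses only the standard library (`tracemalloc` and `gc`); the function `profile_python_side` and the workload `make_short_lived_objects` are hypothetical stand-ins, not code from the thread.

```python
import gc
import tracemalloc


def profile_python_side(fn):
    """Run fn() and report Python-heap peak bytes and GC collections run."""
    gc.collect()  # start from a clean state
    before = sum(s["collections"] for s in gc.get_stats())
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    after = sum(s["collections"] for s in gc.get_stats())
    return {"peak_bytes": peak, "gc_collections": after - before}


def make_short_lived_objects(n=50_000):
    # Hypothetical stand-in for parsing many small YAML documents.
    for i in range(n):
        _ = {"key": [i, str(i)]}


stats = profile_python_side(make_short_lived_objects)
```

If `gc_collections` is large for the real workload, rerunning it with `gc.disable()` around the hot loop would show how much of the slowdown is collector overhead.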