Awni Hannun
Sounds great! Let us know how we can help on the MLX side.
Yes, please open an issue about it; it should be straightforward to get it working.
> The quantized matmul with int8 is quite strict

Just curious what you mean by that? What flexibility is missing?
Was there ever a command added to control the output buffering behavior? I'm running into the same issue after recently upgrading openmpi.
It worked for me to pass `--output :raw` to `mpirun`
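For example (the process count and script name here are just placeholders):

```
mpirun --output :raw -np 2 python main.py
```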
This is really neat! I haven't looked at the internals yet, but one thing I'm wondering about is why the grid (and a few other parameters) gets specified when you...
I guess maybe a better question is: what do you do if you want to use the same kernel with outputs of different shapes? Do you make a new `MetalKernel`...
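For illustration, here is a rough sketch of the reuse pattern in question, assuming grid, threadgroup, and output shapes are supplied at call time rather than at construction. The call signature below is an assumption modeled on `mx.fast.metal_kernel`, not taken from this PR:

```python
import mlx.core as mx

# Assumed interface (modeled on mx.fast.metal_kernel): the kernel body is
# compiled once, while grid / threadgroup / output shapes are passed per call.
source = """
    uint elem = thread_position_in_grid.x;
    out[elem] = 2.0f * inp[elem];
"""

kernel = mx.fast.metal_kernel(
    name="scale_by_two",
    input_names=["inp"],
    output_names=["out"],
    source=source,
)

# Reuse the same kernel object for outputs of different shapes.
for shape in [(1024,), (4096,)]:
    x = mx.random.normal(shape=shape)
    (y,) = kernel(
        inputs=[x],
        grid=(x.size, 1, 1),
        threadgroup=(256, 1, 1),
        output_shapes=[shape],
        output_dtypes=[x.dtype],
    )
```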
Closes #1025
> I'm using a hash of the source and the template arguments to get the host_name so it won't recompile if you change the output_shapes, grid and/or threadgroup but have...
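For reference, the caching idea described above amounts to deriving the host name only from the pieces that affect compilation, so shape-only changes reuse the cached kernel. A minimal sketch (the helper below is hypothetical):

```python
import hashlib

def kernel_host_name(source: str, template_args: dict) -> str:
    # Hypothetical sketch: hash only the source and the template arguments, so
    # changing output_shapes, grid, or threadgroup does not change the name
    # and therefore does not force a recompile.
    h = hashlib.sha256(source.encode("utf-8"))
    for name, value in sorted(template_args.items()):
        h.update(f"{name}={value}".encode("utf-8"))
    return f"custom_kernel_{h.hexdigest()[:16]}"

# Same source + template args -> same host name, regardless of grid/threadgroup.
print(kernel_host_name("out[i] = inp[i];", {"T": "float"}))
```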
Same benchmark on an M2 Ultra:

```
average time of Pytorch: 7.20261025428772
average time of MLX: 2.34059739112854
```